Video

## Kinovea 0.8.25

I’m happy to announce the general availability of Kinovea 0.8.25.

This article describes some of the changes in version 0.8.25 over version 0.8.24.
This release focuses on usability and polishing of existing features, and introduces one new feature in the Capture module.

## 1. General

Starting with version 0.8.25 a native x64 build is provided. There are now 4 download options. The zip files are the portable versions and will run self-contained in the extraction directory. The exe files are the installer versions.
The minimum requirements have not changed and Kinovea still runs under all Windows versions between Windows XP and Windows 10.

The interface is now translated to Arabic thanks to Dr. Mansour Attaallah from the Faculty of Physical Education, Alexandria University – Egypt.

## 2. File explorer

#### Thumbnails details

The details overlaid on the thumbnails have been extended and made configurable. The framerate and creation time have been added to the fields that can be displayed; the framerate is displayed by default. Right-click the empty space in the explorer to bring up the thumbnails context menu and choose the fields you would like to be shown.

## 3. Playback module

The video now updates immediately when moving the playback cursor. This behavior was previously only activated when the working zone was entirely loaded in memory. It is now enabled by default. The experience should be largely improved but if you are on a less powerful system and navigation is problematic, the behavior of the cursor can be reverted from Preferences > Playback > General > “Update image during time cursor movement”.

#### Video framerate

The internal framerate of the video can be customized from the bottom part of the dialog in Video > Configure video timing. This setting changes the “default” framerate of the video by overriding what is written in the file. This is a different concept than slow motion. What the setting does is redefine the nominal speed of the video, the 100%. This is useful when a video has a wrong framerate embedded in it which can happen sometimes. In general use you would not use this setting very often but it can save an odd file. Note that this setting is also not the same as the Capture framerate that can be set from the same configuration box.

## 4. Annotation tools

#### Named objects

All drawing tool instances (angles, arrows, markers, chronometers, etc.) now have a custom “Name” property. This makes it easier to match drawings with their values when exporting data to a spreadsheet. Regarding spreadsheet export, all lines and point markers are now exported, whether or not they have the “Display measure” option active in Kinovea.

#### Custom length unit

A new custom length unit can be used to cover use cases that are not natively supported by Kinovea. By default Kinovea natively supports Millimeters, Centimeters, Meters, Inches, Feet and Yards. The extra option can be used to define a new unit such as Micrometers or Kilometers depending on the scale of the video being analyzed, or any unit specific to your field. The default value for this option is “Percentage (%)”. The percentage unit makes sense when analyzing dimensions of objects purely relative to one reference object. The mapping between video pixels and real-life dimensions in the custom unit is defined by a calibration line, or a calibration grid for non-orthogonal planes. Any line or grid can be used as the calibration object.

The unit is defined in Preferences > Playback > Units > Custom length unit. It can then be used in any line or grid during calibration.
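As an illustration of the line-based calibration described above, here is a minimal Python sketch. It is not Kinovea's code: the function names `calibration_scale` and `measure` are hypothetical, and it only covers the simple case of a calibration line in a plane parallel to the image (the grid case for non-orthogonal planes is not handled).

```python
import math

def calibration_scale(p1, p2, real_length):
    """Custom units per pixel, from a calibration line drawn between p1 and p2 (in pixels)."""
    pixel_length = math.dist(p1, p2)
    if pixel_length == 0:
        raise ValueError("calibration line has zero length")
    return real_length / pixel_length

def measure(p1, p2, scale):
    """Length of the segment p1-p2 expressed in the custom unit."""
    return math.dist(p1, p2) * scale

# Hypothetical example with the "Percentage (%)" unit:
# the reference object spans 300 px and is declared to be 100 %.
scale = calibration_scale((100, 200), (400, 200), 100.0)
# Another object spanning 150 px is then half the reference size.
width_percent = measure((50, 50), (200, 50), scale)
```

With these made-up coordinates, `width_percent` comes out at about 50, meaning the second object is half as long as the reference.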

#### Default tracking parameters

A default tracking profile can be defined from Preferences > Drawings > Tracking. This profile will be applied by default to newly added tracks and trackable custom tools like the bikefit tool or the goniometer. The parameters can be expressed in percentage of the image size or in actual pixels. Note that in the case of tracks, the tracking profile can also be modified on a per-object basis after addition. This is not currently possible for other objects.

## 5. Capture module

#### File naming automation

The file naming engine has been rewritten from scratch to support a variety of automation scenarios that were not previously well supported. The complete path of captured files is configured from Preferences > Capture > Image naming and Preferences > Capture > video naming.

A complete path is constructed by the concatenation of three top-level values: a root directory, a sub directory and the file name. It is possible to define a different value for these three top-level variables for the left and right screens and for images and videos. The sub directory can stay empty if you do not need this level of customization. Defining root directories on different physical drives for the left and right screens can improve recording performance by parallelizing the writing.

The sub directory and the file name can contain “context variables” that are automatically replaced just in time when saving the file. These variables start with a % sign followed by a keyword. In addition to date and time components you can use the camera alias, the configured framerate and the received framerate in the file name.

The complete list of context variables and their corresponding keywords can be found by clicking the “%” button next to the text boxes.

A few examples:

• Root: "C:\Users\joan\Documents"
• Sub directory: "Kinovea\%year\%year%month\%year%month%day"
• File: "%year%month%day-%hour%minute%second"
• Result: “C:\Users\joan\Documents\Kinovea\2016\201608\20160815\20160815-141127.jpg”

• Root: "D:\videos\training\joan"
• Sub directory: (empty)
• File: "squash - %camalias - %camfps"
• Result: “D:\videos\training\joan\squash - Logitech HD Pro Webcam C920 - 30,00.mp4”
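The substitution of context variables in the examples above can be sketched in Python as follows. This is only an illustration of the idea, not Kinovea's actual implementation: the function names `expand` and `build_path` are hypothetical, only the keywords that appear in the examples are handled, and the camera framerate is passed as an already formatted string.

```python
import re
from datetime import datetime

def expand(pattern, when, cam_alias="", cam_fps=""):
    """Replace %keyword context variables in a sub directory or file name pattern."""
    values = {
        "year": f"{when.year:04d}",
        "month": f"{when.month:02d}",
        "day": f"{when.day:02d}",
        "hour": f"{when.hour:02d}",
        "minute": f"{when.minute:02d}",
        "second": f"{when.second:02d}",
        "camalias": cam_alias,
        "camfps": cam_fps,
    }
    # Try longer keywords first so a short keyword can never shadow a longer one.
    keywords = sorted(values, key=len, reverse=True)
    regex = re.compile("%(" + "|".join(keywords) + ")")
    return regex.sub(lambda m: values[m.group(1)], pattern)

def build_path(root, sub_dir, file_name, extension, when, **camera):
    """Concatenate root, expanded sub directory and expanded file name (Windows separators)."""
    parts = [root, expand(sub_dir, when, **camera), expand(file_name, when, **camera) + extension]
    return "\\".join(part for part in parts if part)  # an empty sub directory is skipped

when = datetime(2016, 8, 15, 14, 11, 27)
image_path = build_path(
    r"C:\Users\joan\Documents",
    r"Kinovea\%year\%year%month\%year%month%day",
    "%year%month%day-%hour%minute%second",
    ".jpg",
    when,
)
video_path = build_path(
    r"D:\videos\training\joan", "", "squash - %camalias - %camfps", ".mp4", when,
    cam_alias="Logitech HD Pro Webcam C920", cam_fps="30,00",
)
```

Here `image_path` and `video_path` reproduce the two results listed above for a capture made on 2016-08-15 at 14:11:27.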

If the file name component does not contain any variable, Kinovea will try to find a number in it and automatically increment it in preparation for the next video, so as not to disrupt the flow during multi-attempt recording sessions.
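A possible way to implement this kind of automatic increment is sketched below in Python. The details are assumptions made for the example, not a description of Kinovea's behavior: the hypothetical `next_name` function increments the last run of digits it finds and preserves zero padding.

```python
import re

def next_name(name):
    """Return the file name to use for the next recording.

    Assumption for this sketch: the last run of digits is the counter,
    and its zero padding is preserved.
    """
    matches = list(re.finditer(r"\d+", name))
    if not matches:
        return name  # no number found, nothing to increment
    last = matches[-1]
    digits = last.group()
    incremented = str(int(digits) + 1).zfill(len(digits))
    return name[:last.start()] + incremented + name[last.end():]

first = next_name("attempt 007")
second = next_name("jump-9")
unchanged = next_name("squash")
```

In this sketch "attempt 007" becomes "attempt 008", "jump-9" becomes "jump-10", and a name without any digit is returned as is.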

#### Capture mosaic

The capture mosaic is a new feature introduced in Kinovea 0.8.25. It uses the buffer of images supporting the delay feature as a source of images and displays several images from this buffer simultaneously on the screen. The result is a collection of video streams coming from the same camera but slightly shifted in time or running at different framerates. The capture mosaic can be configured by clicking the mosaic button in the capture screen.

Modes:

1. The single view mode corresponds to the usual capture mode: a single video stream is presented, shifted in time by the value of the delay slider.

2. The multiple views mode will split the video stream and present the action shifted in time a bit further for each stream. For example if the delay buffer can contain 100 images (this depends on the image size and the memory options) and the mosaic is configured to show 4 images, then it will show:

• the real time image;
• a second image from 33 frames ago;
• another one from 66 frames ago;
• and a fourth one from 100 frames ago.

Each quadrant will continue to update and show its own delayed stream. This can be helpful to get several opportunities to review a fast action.

3. The slow motion mode will split the video stream and present the action in slow motion. Each stream runs at the same speed factor. In order to provide continuous slow motion the streams have to periodically catch up with real time. Having several streams allows you to get continuous slow motion in real time.

4. The time freeze mode will split the video stream and show several still images taken from the buffer. The images are static and the entire collection will synchronize at once, providing a new frozen view of the motion.
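The delays of the multiple views mode in the example above (0, 33, 66 and 100 frames for a 100-image buffer split into 4 views) are consistent with spreading the views evenly over the buffer. The Python sketch below shows that arithmetic; the function name `mosaic_delays` and the rounding rule (integer division) are assumptions chosen to match the example, not necessarily what Kinovea does internally.

```python
def mosaic_delays(buffer_capacity, view_count):
    """Delay, in frames, of each view when the views are spread evenly over the buffer."""
    if view_count < 1:
        raise ValueError("at least one view is required")
    if view_count == 1:
        return [0]  # a single real-time view (assumption for this sketch)
    # The first view is real time, the last one sits at the far end of the buffer.
    return [i * buffer_capacity // (view_count - 1) for i in range(view_count)]

delays = mosaic_delays(100, 4)
```

For a buffer of 100 images and 4 views, `delays` is `[0, 33, 66, 100]`.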

## 6. Feedback

Please post your feedback, bug reports, usability issues, feature suggestions, etc. on the forum in the following thread.

VR

## Work in progress – Light field rendering in VR

This is a work-in-progress update related to my second light field render engine. The first one was described last October in the video Implementing a Light Field Renderer. It was based in part on Aaron Isaksen's paper “Dynamically Reparameterized Light Fields”. The view synthesis was performed by reprojecting carefully selected quads from the source images (see video). It ran on the CPU and projected the result in a desktop window.

## Scope

For this second engine the scope of the project is the following.

#### 1. View synthesis on the GPU, display in VR

The implementation is in CUDA and outputs to the Oculus Rift (DK2 and runtime 0.8 in this version). While I could get away with 30 to 50 ms to render sub-HD images in the previous project, VR requires the total time to render both eyes be under 10 ms, and the resolution is much higher.

#### 2. Quality vs ray budget optimizations

The particular constraint I’m using to drive the project is to try to get the best quality out of a given budget in Megarays. I settled on a 100-Megaray budget for the experimental phase. This is medium size: for comparison, the Lytro Illum physical light field camera clocks 40 Megarays, while one of the examples in my previous project was 900 Megarays. Gigaray-range light fields are often encountered for omnidirectional applications. Note that the examples in the video below are 300 Megarays; I still haven’t matched the quality I would like to reach for the 100 Megaray budget.

#### 3. Close-up with limited depth range

I’m focusing on close-up subjects with shallow depth.

## Demo

There are a few important things that are not reflected in the video, some good, some bad.

1. Resolution: The video shows the mirror window of the application, which uses a much lower resolution than the actual view projected in VR (see the Performances section below).
2. Scale and depth: completely missing from the video, as is always the case for VR, is the good sense of scale and physicality of the subject. The head is rendered in VR at the actual physical size of the person's head, which is a bit unsettling at first, in a good way.
3. Uncanny valley: The quality is there (at a respectable distance) but the fact that the person is completely still is awkward. Due to this, it looks more like a photograph with depth than an actual person being there. A short breathing and blinking animation loop would go a very long way.
4. Eye accommodation failure: the person’s head is well locked in the VR world thanks to positional tracking, and the quality is believable. Due to this, there is an expectation of increased definition when moving closer to the face. The eyes try to refocus progressively when getting closer, but the quality is capped by the source dataset. This gives a strange feeling that your eyes don’t work. I had not experienced this before.

## Dataset details

• 300 Megarays.
• Original model is 768K polygons, with 8K texture.
• Capture: path tracing at 1000 samples/pixel.

#### Crystal

• 300 Megarays.
• Capture: path tracing at 4000 samples/pixel, with a specular depth of 48 because the crystal has many internal surfaces that the light must traverse and bounce on.

Both light fields were captured in Octane 3.0 alpha using a custom Lua script.

## Performances

The rendered view is 2364×2927 per eye. This is a pixel density of 2.0x on the Oculus DK2 with the eye relief I’m using. The images are then warped to the DK2 screen which is 1920×1080. It runs comfortably within the 13 ms (75 fps) budget on a GTX 980Ti.
Additionally, the view is interpolated and blitted to the low-resolution mirror window. This step happens just before posting the render to the Oculus compositor.

Obviously the most interesting thing about rendering light fields is that the render time is independent of the complexity of the scene. This is why I focus on examples that are hard or downright impossible to render in real time. The woman's face is more than 750K polygons, uses a complex material and 8K textures. The crystal has particularly challenging light pathways and caustics (unfortunately not really visible with the settings I used) that can only be rendered realistically using ray tracing.

## Application

The application in the video is called Hypercapsule. It draws some parts from Capsule, the application from this post which renders omnistereo images in the Oculus DK2. The “hyper” comes from the 4D mindset used in core parts of the implementation.
I’m not going to publish this application at this time; it’s a step towards something larger.

VR

## CUDA-based omni-stereo image viewer for Oculus Rift

Project Capsule: a small viewer for omnidirectional stereoscopic images. The target device is the Oculus Rift, the image reprojection is done in CUDA. Download and discussion thread at Oculus forums.

## 1. Introduction

Capsule is a small viewer for omnidirectional stereoscopic images. The target device is the Oculus Rift, the image reprojection is done in CUDA.

This project was started to scratch a triple itch:

1. I wanted to experience the images created for the “Render the Metaverse” contest on my DK2;
2. I wanted to be able to very quickly check for stitching artifacts, binocular rivalry, depth and eye comfort in any omni-stereo image;
3. I wanted to improve my CUDA and GPGPU programming skills.

With that in mind, I set out to build a small omni-stereo renderer in CUDA so that I could project these images on the Rift and learn a few things along the way.

The Render the Metaverse contest was organized by OTOY this summer and gave birth to the most impressive collection of fully spherical stereoscopic images to date. Many of the works are excellent and leverage the ability to create photorealistic images of imaginary or impossible worlds. Impossible-yet-photorealistic is also what I loved to do back in my photo-remix days; it’s a really powerful strategy to awe the viewer.

Here is a screenshot of the desktop-side of the software.

Fig. 1. Capsule omni-stereo image viewer desktop window.

The Headset side of the software is just the full spherical images. There is no in-VR user interface whatsoever.

## 2. Project scope

I have deliberately limited the scope of the project to be able to work everything out in a relatively short period.

1. Static vs Dynamic: The program is limited to static content. This is to focus on the peak quality content without having to deal with frame queues, buffering and other joys of video. Video is currently quite behind in terms of quality because the hardware and file transport levels aren’t ready for VR yet.
2. Stereoscopic vs monoscopic: Although monoscopic content is supported, there is no particular effort put into it. I think stereo is a fundamental part of the VR experience and is where I personally draw the line. Monoscopic 360° content can be very appealing and a VR headset is certainly the best way to experience it, but the added dimension of depth is what changes the game for me.
3. Spherical vs Reduced FOV: I think hemispherical content will definitely have a place in VR, especially for storytelling. For this project however, I’m focusing on the fully immersive experience.

## 3. Oculus/OpenGL/CUDA interop

#### 3.1. Oculus/OpenGL

The interoperability between the Oculus SDK and OpenGL is described in the Oculus SDK documentation at Rendering Setup Outline and in the OculusRoomTinyGL sample.

The basic principle is that we ask the runtime to allocate a set of GL textures for each eye. During rendering we will cycle through the set, drawing into a different texture from one frame to the next. Note that the textures are created by the runtime; it is not possible to provide our own texture IDs from textures we would have created elsewhere.

#### 3.2 OpenGL/CUDA

The interoperability between OpenGL textures and CUDA is described in the CUDA programming guide at 3.2.12.1. OpenGL Interoperability. The basic principle is that an OpenGL texture can be mapped into CUDA in the form of a CUDA Array and still be manipulated by both OpenGL and CUDA (not simultaneously). A CUDA Array is basically an abstraction level above either a CUDA Texture (for read-only content) or a CUDA Surface (for read/write content).

Older graphics cards only support Texture and Surface references, and many tutorials use them. These need to be defined at compile time and make the code somewhat ugly and awkward. Texture and Surface objects are much more natural constructs to use. The relevant part of the programming guide is 3.2.11. Texture and Surface Memory. Surface objects are supported on adapters having CUDA compute capability 3.0 (GTX 600+, Kepler microarchitecture), which is still one entire generation of cards below the recommended specs for the Oculus Rift (GTX 970, Maxwell microarchitecture), so this limitation is fine for the project.

Capsule also uses a CUDA Texture to store the actual image to be projected. There is no OpenGL interop going on here: the image goes straight from main (host) memory to CUDA and is not accessed outside of CUDA code. For the eye buffer, since we need write access, we must use a Surface object rather than a Texture object.

The complete path from an OpenGL texture to something we can draw onto from within a CUDA kernel is something like:

• Create a texture in OpenGL.
• Call cudaGraphicsGLRegisterImage to create the CUDA-side resource for the Texture.
• Call cudaGraphicsSubResourceGetMappedArray to map the resource to a CUDA Array.
• Call cudaCreateSurfaceObject to bind the CUDA array to a CUDA Surface object.
• Call surf2Dwrite to draw onto the Surface object and hence onto the OpenGL texture.

Once the kernel has finished drawing, the resource must be unmapped so that the texture can be used by the OpenGL side.

A simple trick to find working usage examples is to search for these functions and related ones across the whole tree of CUDA code samples.

The final interop plumbing arrangement from Oculus to CUDA:

Fig. 2. Oculus to CUDA interop plumbing.

## 4. CUDA Kernels

#### 4.1. Projection types

There are two CUDA kernels implemented. One for the equirectangular projection and one for the cubemap projection.

For the cubemap projection we need to use a set of conventions for face ordering and orientation. The de facto standard in VR comes from the format used by Oculus's in-house image viewer for the Samsung GearVR and is the following:

• The unfolded cube is stored as a long strip of faces.
• Faces are ordered as +X, -X, +Y, -Y, +Z, -Z.
• Top and Bottom faces are aligned to the Z axis.
• All faces are flipped horizontally.
• For stereo the right eye strip is stored after the left one, creating a 12-face long strip.

The choice of kernel to use is based on the aspect ratio of the image. A stereo equirectangular image has an aspect ratio of 1:1 for the Top-Bottom configuration and of 4:1 in the Left-Right configuration. A stereo cubemap image has an aspect ratio of 12:1. The monoscopic versions are respectively 2:1 and 6:1. If we decide not to support variations within these configurations, like Bottom-Top or other cube faces ordering, the projection can be automatically inferred from the aspect ratio of the image.
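The selection logic described above can be sketched in a few lines of Python. This is an illustration only: the function name `infer_layout`, the returned labels and the tolerance value are choices made for this example, and the actual application code may differ.

```python
def infer_layout(width, height, tolerance=0.01):
    """Guess (projection, layout) from the image aspect ratio, or None if unknown."""
    ratio = width / height
    known_layouts = [
        (1.0, "equirectangular", "stereo top-bottom"),
        (2.0, "equirectangular", "mono"),
        (4.0, "equirectangular", "stereo left-right"),
        (6.0, "cubemap", "mono"),
        (12.0, "cubemap", "stereo"),
    ]
    for expected, projection, layout in known_layouts:
        # Relative tolerance, to accept images that are a few pixels off.
        if abs(ratio - expected) <= tolerance * expected:
            return projection, layout
    return None

equirect_case = infer_layout(8000, 8000)
cubemap_case = infer_layout(24000, 2000)
unknown_case = infer_layout(1920, 1080)
```

An 8000×8000 image is classified as a top-bottom stereo equirectangular image, a 24000×2000 image as a stereo cubemap, and a 16:9 image is rejected.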

#### 4.2. Projections

The Oculus runtime provides the eye camera (off-axis) frustum as a FovPort structure. This is a set of 4 numbers representing the half-FOV in the up, down, left and right directions around the camera axis. Knowing the size of the buffer in pixels, we can compute the camera center and focal distance. Then, using these camera intrinsic parameters we can find the direction of rays starting at the camera projection center and passing through any pixel of the eye buffer. This represents the first two steps of the algorithms and could actually be pre-computed into a kind of normal map for the eye. The full approach is described below.
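To make the first two steps more concrete, here is a hedged Python sketch of how the camera center and focal distance can be derived from the four half-FOV values, and how a pixel is turned into a ray direction. In the Oculus SDK the FovPort stores the tangents of the half-angles, which is what the sketch takes as input. The function names, the pixel convention (origin at the top-left, y pointing down) and the camera convention (looking down -Z, +Y up) are assumptions made for this example.

```python
import math

def intrinsics(up_tan, down_tan, left_tan, right_tan, width, height):
    """Focal lengths and principal point, in pixels, of an off-axis frustum."""
    fx = width / (left_tan + right_tan)
    fy = height / (up_tan + down_tan)
    cx = left_tan * fx   # distance in pixels from the left edge to the camera axis
    cy = up_tan * fy     # distance in pixels from the top edge to the camera axis
    return fx, fy, cx, cy

def pixel_ray(px, py, fx, fy, cx, cy):
    """Unit direction of the ray through pixel (px, py), in camera space."""
    x = (px - cx) / fx
    y = (cy - py) / fy   # image y goes down, camera y goes up
    z = -1.0             # the camera looks down the negative Z axis
    length = math.sqrt(x * x + y * y + z * z)
    return (x / length, y / length, z / length)

# Hypothetical off-axis frustum: wider on the left side than on the right.
fx, fy, cx, cy = intrinsics(1.0, 1.0, 2.0, 1.0, 300, 200)
center_ray = pixel_ray(cx, cy, fx, fy, cx, cy)
```

The ray through the principal point is the camera axis itself. Since these directions do not depend on the head pose, they are the part that could be precomputed once per eye.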

Equirectangular kernel

For each pixel location in the eye buffer:

1. Back project the pixel from 2D to 3D coordinates by converting it to homogeneous coordinates.
2. Normalize the homogeneous coordinates to get a direction vector.
3. Rotate the direction vector to account for headset orientation.
4. Convert the direction vector (equivalently, a point on the unit sphere) to spherical coordinates.
5. Convert the spherical coordinates to image coordinates.
6. Fetch the color at the computed image location (with bilinear interpolation and optional fading).
7. Write the final color at the pixel location.
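Steps 3 to 5 of the list above can be illustrated with the following CPU-side Python sketch (the real kernel is written in CUDA and is not reproduced here). The axis conventions are assumptions for this example: the viewer looks down -Z with +Y up, the center of the image corresponds to the forward direction, and the image is a single monoscopic equirectangular panorama. The function names are hypothetical.

```python
import math

def rotate(matrix, vector):
    """Apply a 3x3 row-major rotation matrix (the headset orientation) to a vector."""
    return tuple(sum(matrix[r][c] * vector[c] for c in range(3)) for r in range(3))

def equirect_coords(direction, image_width, image_height):
    """Map a unit direction vector to (u, v) pixel coordinates in an equirectangular image."""
    x, y, z = direction
    longitude = math.atan2(x, -z)                  # 0 straight ahead, positive to the right
    latitude = math.asin(max(-1.0, min(1.0, y)))   # clamp to guard against rounding errors
    u = (longitude / (2.0 * math.pi) + 0.5) * image_width
    v = (0.5 - latitude / math.pi) * image_height
    return u, v

forward = equirect_coords((0.0, 0.0, -1.0), 8000, 4000)

# Hypothetical head orientation: a quarter turn around the vertical axis.
yaw_90 = [[0.0, 0.0, 1.0],
          [0.0, 1.0, 0.0],
          [-1.0, 0.0, 0.0]]
turned = equirect_coords(rotate(yaw_90, (0.0, 0.0, -1.0)), 8000, 4000)
```

Looking straight ahead samples the center of the panorama, and the quarter turn moves the sampling location by a quarter of the image width. The two `atan2` and `asin` calls are the trigonometry that the cubemap variant avoids.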

Cubemap kernel

For each pixel location in the eye buffer:

1. Back project the pixel from 2D to 3D coordinates by converting it to homogeneous coordinates.
2. Normalize the homogeneous coordinates to get a direction vector.
3. Rotate the direction vector to account for headset orientation.
4. Find the largest component of the direction vector to find the face it is pointing to.
5. Project the direction vector onto the selected face.
6. Adapt the signs of the remaining two components for use as 2D coordinates on that specific face.
7. Shift the x coordinate to account for the face order within the entire cubemap.
8. Fetch the color at the computed image location (with bilinear interpolation and optional fading).
9. Write the final color at the pixel location.
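Steps 4 to 7 can be sketched as follows in Python. Important caveat: the sign table used here is the standard OpenGL cubemap face selection table, chosen because it is well documented. It is not claimed to be the exact orientation convention of the face strip described earlier (which flips faces horizontally), so the signs would need to be adapted for that format. The function name `cubemap_coords` is hypothetical.

```python
def cubemap_coords(direction, face_size):
    """Map a direction vector to (x, y) pixel coordinates in a 6-face horizontal strip.

    Faces are ordered +X, -X, +Y, -Y, +Z, -Z. Signs follow the OpenGL cubemap table.
    """
    x, y, z = direction
    ax, ay, az = abs(x), abs(y), abs(z)
    if ax == 0 and ay == 0 and az == 0:
        raise ValueError("direction must not be the zero vector")
    if ax >= ay and ax >= az:          # the X axis dominates
        face, major = (0, ax) if x > 0 else (1, ax)
        sc, tc = (-z, -y) if x > 0 else (z, -y)
    elif ay >= az:                     # the Y axis dominates
        face, major = (2, ay) if y > 0 else (3, ay)
        sc, tc = (x, z) if y > 0 else (x, -z)
    else:                              # the Z axis dominates
        face, major = (4, az) if z > 0 else (5, az)
        sc, tc = (x, -y) if z > 0 else (-x, -y)
    # Project onto the face and remap from [-1, 1] to [0, 1].
    s = (sc / major + 1.0) / 2.0
    t = (tc / major + 1.0) / 2.0
    # Shift horizontally according to the position of the face in the strip.
    return (face + s) * face_size, t * face_size

plus_x = cubemap_coords((1.0, 0.0, 0.0), 100)
minus_z = cubemap_coords((0.0, 0.0, -1.0), 100)
```

A direction along +X lands at the center of the first face and a direction along -Z at the center of the sixth one. Only comparisons, one division and a few additions are involved.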

Note that the cubemap reprojection only involves simple arithmetic and is slightly faster than the equirectangular one in my implementation.

## 5. Performances

#### 5.1. Perf/Quality threshold in VR

While in a traditional application a low framerate will make the experience less enjoyable, in VR it will make the user physically sick. Consequently, we must pull the performance-quality trade-off cursor to the performance side first. There is a performance threshold we cannot drop below. It’s only once the framerate and latency requirements are met that we can start considering image quality.

In addition to issues caused by poor performance, there is a whole bestiary of visual artifacts that are VR-specific and that regular applications don’t have to bother about: judder, incorrect depth, head tracking latency, object distortion, etc. Each comes with its own cues, but the important thing is that issues hide each other. You can only really understand what it feels like to experience the distortion effect caused by the lack of positional tracking when you no longer have judder caused by poor framerate.

The talk “Elevating Your VR” by Tom Heath at Oculus Connect 1 in 2014 is still what I consider the best talk about VR-specific artifacts, how to become aware of them and how to fix them. It should be required viewing for anyone working in VR (video, 1h05; the slides alone are available as a PDF).

For Capsule, thanks to the image-based rendering approach, the required 75 fps are easily reached on modern GPUs. Unfortunately, distortion issues due to the lack of positional tracking cannot be fixed within the confines of omni-stereo images (Light fields will later save the day).

I found that the worst remaining offender was headlocking during image transitions. In a first test version I was loading the next image in the same thread as the rendering. The display stopped refreshing during the few hundred milliseconds required to load the image from disk to memory and then to the GPU. This caused a headlock: no matter where you turn your head, the entire picture comes with it. It feels like the whole world is spinning around, and it immediately causes motion sickness.

#### 5.3. Profiling

The frame budgets for the DK2 and CV1 are 13.3 ms and 11.1 ms respectively. The Oculus runtime and compositor will eat a part of that budget. My goal was to get under 5 ms per projection to fit both eyes inside 10 ms. CUDA has a useful profiler integrated with Visual Studio. It’s very easy to test various approaches in the kernels and immediately check the impact with actual stats.
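The budget figures follow directly from the refresh rates of the headsets (75 Hz for the DK2, 90 Hz for the CV1). A tiny Python check, with hypothetical names, makes the remaining headroom explicit:

```python
def frame_budget_ms(refresh_rate_hz):
    """Time available to produce one frame, in milliseconds."""
    return 1000.0 / refresh_rate_hz

PER_EYE_TARGET_MS = 5.0  # goal stated above: under 5 ms per projection

for name, refresh_rate in (("DK2", 75), ("CV1", 90)):
    budget = frame_budget_ms(refresh_rate)
    headroom = budget - 2 * PER_EYE_TARGET_MS  # left for the runtime and compositor
    print(f"{name}: budget {budget:.1f} ms, headroom {headroom:.1f} ms")
```

With both eyes at the 5 ms target, about 3.3 ms remain per frame on the DK2 and about 1.1 ms on the CV1.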

Of the two kernels, the equirectangular one is slightly slower than the cubemap one. This is mostly due to the trigonometry involved in going from a 3D location on the unit sphere to spherical coordinates. The cubemap inverse projection can entirely be solved with simple arithmetic.

After a few rounds of profiling and optimization, the performance was better than I expected, running well within the budget for the default-sized eye buffers. I pushed the pixel density to 2x to explore peak quality and made it a user-controlled option. Despite the comment in the SDK code, the effect of 2x pixel density is noticeable and produces far fewer aliasing crawlies when looking around. This is particularly welcome on the DK2 because we tend to constantly move the head slightly to minimize the screen door effect.

The performance is not influenced by the size of the input image (as long as it fits in the dedicated GPU RAM). Fig. 3 shows a profiler run summary where the eye buffer is sized at 2364×2927 px, about 7 million pixels, corresponding to my HMD profile at 2.0 pixel density. The equirect kernel runs in 2.9 ms on an 8000×8000 px source, and the cubemap kernel runs in 1.9 ms on a 24000×2000 px source. This is on an Nvidia GTX 660.

Fig. 3. Profiler run summary for 2364×2927 eye buffer on an Nvidia GTX 660.

There is still room for experimentation. Both kernels start by computing the pixel’s normalized direction, prior to applying the head rotation. This direction never changes and could be stored in a map. It would replace a few arithmetic operations with one memory fetch.

## 6. Future plans

There could be many avenues of improvement for this project: supporting ambient audio, short animated sequences, ultra-high-resolution content, zooming, an in-VR file explorer, or even implementing light field rendering right into it. However, I wanted this project to be self-contained and it will likely continue this way, simply fulfilling its original purpose of quickly experiencing omni-stereo images on the DK2. The more ambitious features will go into more ambitious software.

VR

## Implementing a light field renderer

Demo and discussion about an experimental light field renderer I’ve been working on.

The main reference used is “Dynamically Reparameterized Light Fields”.

Telerobotics

The small T shapes protruding from the Wasp 110 carbon frame are quite handy for many purposes, including balancing the quadcopter center of gravity.

It’s not perfect because the two-point support given by the wires provides a bit of extra stability. It still gives valuable information as to whether the craft is tail or nose heavy.

The featured quadcopter is nothing fancy: it is a transplant of a Hubsan X4 flight controller board, motors and propellers to the Wasp 110 carbon fiber frame from SOSx.

The FPV camera, VTX and antenna are the Spektrum VA1100 bundle. I use a separate 1S LiPo to power the FPV subsystem.

AUW is 53.5 grams and motor to motor length is 110 mm.

The balancing rig is made out of MakerBeam parts.

Video

## Sub-frame accurate synchronization of two USB cameras

Following up on my experiments with rolling shutter calibration.

Here is a method to synchronize two off-the-shelf USB cameras with sub-frame accuracy.

1. Introduction

Off-the-shelf USB cameras (webcams) lack the hardware to synchronize the start of exposure of multiple cameras at once.
Without this hardware support, synchronization has traditionally been limited to frame-level synchronization, where a visual or audio signal is used to temporally align the videos in post-production.
This frame-level synchronization suffers from an error ranging between 0 and half the frame period. That is, for a camera pair running at 30 fps, we may have up to 16.6 ms of misalignment. This is problematic for moving scenes in stereoscopy and other multi-view applications.

A second characteristic of these cameras is the fact that they use a rolling shutter: a given frame is constructed by staggering the exposures of the sensor rows rather than exposing the entire scene at once.

We can combine these characteristics in a simple manner to estimate the synchronization error between a pair of cameras. Manual or automatic disconnection/reconnection of one of the cameras can then be performed until the pair is under an acceptable synchronization error level.

2. Principle

The principle is simple and is a straightforward extension of my previous post about measuring rolling shutter.

A short flash of light will be imaged by both cameras as a narrow band of illuminated rows. In the general case the band will not be located at the same vertical coordinate in both images. The difference in coordinate is the synchronization error.

If we have calibrated the rolling shutter and know the row-delay, we can directly infer the synchronization error by counting the number of rows of difference in the image of the flash between the two cameras.
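As a small worked example of this computation, here is a Python sketch. The function name and the numbers are hypothetical: the row delay of 30 µs and the row coordinates are made-up values, not measurements from my cameras. The top of the band is used as the reference row in each image.

```python
def sync_error_ms(band_top_row_a, band_top_row_b, row_delay_us):
    """Synchronization error between two cameras, from the flash band positions.

    band_top_row_a, band_top_row_b: row of the top of the flash band in each image.
    row_delay_us: calibrated rolling shutter delay between two consecutive rows.
    The sign of the result tells which camera is ahead of the other.
    """
    return (band_top_row_b - band_top_row_a) * row_delay_us / 1000.0

# Hypothetical values: the band starts at row 120 in camera A and row 420 in camera B.
error = sync_error_ms(120, 420, 30.0)
```

With these values the 300 rows of difference correspond to an error of 9 ms.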

3. Factors to consider

3.1 Exposure duration
The exposure duration will change the width of the band. Due to the way the rolling shutter is implemented in sensor drivers, the band will grow from the bottom while the top of the band will stay in place.

This is because the exposure of each row is determined by the staggering of the readout signal (end of exposure), and when we increase the exposure these readout signals are kept attached to their temporal coordinate. It is the reset signals (start of exposure) that are moved earlier in time. Rows that are further down begin their exposure earlier than before and start to capture the flash.

I have not yet found a camera that would drive the row exposure differently.

Fig. 1. Altering the exposure duration without changing anything else makes the flash band grow from the bottom.

3.2 Frame-level delay
For some framerate values, a frame-level delay may be inserted between the end of the readout of the last row and the start of the reset of the first row of the next frame. This may be referred to as the vertical blanking period in camera manufacturer data sheets.
When this is the case, there is a dark space in the frame timing diagram. The flash may land in this space and won’t be visible in one of the cameras.

3.3 Interframe synchronization
This procedure is only concerned with intra-frame synchronization and does not provide any clue with regard to synchronizing the streams at the frame level. In other words, even if the start of the exposure of each image is perfectly aligned in time, the synchronization could still be off by an integer number of frames. The usual frame-level synchronization cues using video or audio signals are still required.

4. Experimental setup

I conducted a simple experiment with two Logitech C920 cameras.

Both cameras were set to 100 µs exposure duration, and an Arduino-based LED stroboscope was used to generate flashes at a given frequency.

After the synchronization procedure was performed, the exposure and gain were set back to suitable values with regard to ambient light, without disconnecting the camera streams. The cameras were then used to film a scene.

After each scene capture, the synchronization procedure was repeated to account for a small framerate drift introduced by the cameras.

A similar scene was filmed twice, the first time at less than one millisecond of synchronization error, and the second time at around 12 ms sync error.

The videos have been filmed with a parallel rig at 65 mm interaxial (Fig. 2.). The cameras were set in portrait mode to maximize the vertical field of view, which means that the rolling shutter was scanning horizontally. The rolling shutter itself may also cause a small amount of spatial distortion in the scene.

Fig. 2. Dual C920 camera rig used.

One of the camera streams has been systematically shifted up by 0.54% to correct for a small vertical disparity at the rig level. No other geometric or radiometric synchronization was performed.

5. Results

The following side-by-side animations present the streams in crossview order (RL): the left camera image is on the right side and the right camera image is on the left. The effect induced by the mis-synchronization is subtle but can be experienced by “freeviewing” the stereo pairs. A tutorial on how to view this type of image stereoscopically can be found here. In the mis-synchronized case, the motion of the balls induces a rivalry between the left and right eyes that can be felt as some sort of blinking.

The animations run at 3 fps.

Fig. 3. Juggling sequence 1 – synchronization error: less than one millisecond.

Fig. 4. Juggling sequence 2 – synchronization error: around 10 milliseconds.

The following animation compares the synchronization artifacts on the balls in motion by overlaying the left and right images at 50% opacity. The horizontal disparity has been minimized at the plane of the face.

Fig. 5. Comparison of synchronization artifacts on objects in motion. Left: less than one millisecond of sync error, right: more than 10 ms sync error.

6. Future work

The method could be extended to any number of cameras. Manually reconnecting the camera stream is cumbersome, though, and an automated procedure could be developed to restart the stream until the synchronization error falls below a configurable threshold.
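As a rough illustration of what such an automated procedure might look like, here is a minimal Python sketch. The callables `restart_stream` and `measure_error_ms` are hypothetical placeholders for whatever camera API and strobe-based measurement would actually be used; the example drives them with a simulated camera whose phase is random after each restart.

```python
import random


def synchronize(restart_stream, measure_error_ms, tolerance_ms=1.0, max_attempts=50):
    """Restart a stream until the measured sync error is within tolerance.

    restart_stream and measure_error_ms are hypothetical callables supplied by
    the caller. Returns (attempts, error_ms) on success, or None if the
    tolerance was not reached within max_attempts.
    """
    for attempt in range(1, max_attempts + 1):
        restart_stream()
        error_ms = abs(measure_error_ms())
        if error_ms <= tolerance_ms:
            return attempt, error_ms
    return None


# Simulated camera (assumption): each restart lands at a random phase
# within half a 33.3 ms frame interval on either side of the reference.
rng = random.Random(0)
state = {"offset_ms": 0.0}


def fake_restart():
    state["offset_ms"] = rng.uniform(-16.7, 16.7)


def fake_measure():
    return state["offset_ms"]


result = synchronize(fake_restart, fake_measure, tolerance_ms=1.0, max_attempts=1000)
print(result)
```

With a uniformly random phase, the expected number of restarts grows as the tolerance shrinks, so the attempt limit matters in practice.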

Another extension to this method is the synchronization of multiple cameras for super-framerate imaging. By staggering the exposures by a known amount, several cameras can be used to capture a high framerate video of the scene, provided we can correct geometric and radiometric disparities between the cameras.
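To make the staggering idea concrete, the sketch below computes evenly spaced exposure start offsets for a set of cameras. The function name and the choice of even spacing are assumptions made for illustration.

```python
def stagger_offsets_ms(camera_count, framerate):
    """Start-of-exposure offset of each camera, in ms, for evenly interleaved frames."""
    frame_interval_ms = 1000.0 / framerate
    step_ms = frame_interval_ms / camera_count
    return [i * step_ms for i in range(camera_count)]


# Four 30 fps cameras interleaved to approximate a 120 fps stream.
offsets = stagger_offsets_ms(camera_count=4, framerate=30)
effective_fps = 4 * 30
print(offsets, effective_fps)
```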

## Estimating a camera’s field of view

In many scenarios it is interesting to estimate the field of view that is covered by images produced by a given camera.
The field of view depends on several factors such as the lens, the physical size of the sensor or the selected image format.

The usual techniques involve measuring an object of known width placed at a known distance, and solving simple trigonometry. This, however, assumes that the camera uses a rectilinear lens, which is not necessarily the case.
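For reference, the trigonometric estimate can be sketched as follows. It assumes the object exactly fills the image width, is centered on the optical axis, and that the lens is rectilinear; the function name is illustrative.

```python
import math


def rectilinear_fov_deg(object_width, distance):
    """Horizontal FOV for an object that exactly fills the image width.

    Both arguments use the same unit. Only valid for a rectilinear lens.
    """
    return math.degrees(2.0 * math.atan(object_width / (2.0 * distance)))


# An object 2 m wide filling the frame at 1 m corresponds to a 90° field of view.
fov = rectilinear_fov_deg(object_width=2.0, distance=1.0)
print(fov)
```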

A very simple technique to estimate the field of view without knowing any intrinsic parameters of the camera has been described by William Steptoe in the following article: AR-Rift: Aligning Tracking and Video Spaces.

The idea is to place the camera in such a position that radiating lines are seen as “vertical parallels” on the camera image, and the center line passes through the center of the lens.
You then read the FOV by looking at the last graduation at the edge of the image.

I generated a pattern to make the reading easier (fig. 1). You can download the full page PDF here:

Fig. 1. FOV estimation test pattern.

The pattern is a protractor with 0.5° granularity. The angle values start at 0 on the vertical center line and increase outward on each side. It sports a second set of graduations closer to the camera lens to help with the measurement.

If you use Kinovea to view the camera stream, you can enable the “test grid” tool to make sure the center vertical line is perfectly aligned with the center of the lens.

Here is an animation of the process in action.

Fig. 2. Estimating camera FOV in Kinovea using a specially crafted protractor pattern and grid overlay.

The camera should be as low as possible relative to the pattern, but obviously not so low that you can’t see the graduations. The camera FOV is read by taking the last visible graduation on the side and multiplying it by two.
You should expect ±1° of error from the measurements.

## Review of the ELP-USBFHD01M camera

I have not found a review of this camera anywhere so I’m going to attempt to fill that void.

This USB 2.0 camera module is sold by Ailipu Technology Inc. (Shenzhen) under the model name ELP-USBFHD01M. There are several variations on the product depending on the stock lens you get it with.

I got the camera for around 35€. Shipping was around 30€. An additional 20% tax was asked at delivery (depends on buyer’s country).

Summary

Pros

• 1280×720 @ 60 fps (MJPEG).
• Interchangeable M12 lens.
• Manual exposure.
• UVC compliant.

Cons

• Image sharpness.
• Poor dynamic range.
• No housing.

Fig. 1. ELP-USBFHD01M camera module with the 140° lens.

Installation

The camera is UVC compliant so there is no special installation procedure. It is instantly recognized by Windows (tested on Windows 7, 8.1 and Windows 10 Tech preview). There is no vendor-specific driver to install on top of the Microsoft-provided UVC driver.

The USB descriptors report the following: VID = 05A3, PID = 9230. However, a different camera module from Ailipu Tech (ELP8MP02G) reports a different vendor ID, so I’m not sure it is a legitimate ID assigned by the USB Implementers Forum.
The camera name is reported as simply “HD USB Camera” and the manufacturer as “HD Camera Manufacturer”.

Hackability & Form factor

The module is 37.5mm × 37.5mm. It does not have any casing.
The back of the board has a 4-pin connector where the USB cable is plugged. The USB cable can be detached from the board and the connector wiring is printed on the board, which is pretty cool in case of a repair or modification. The shipped cable is 1 meter long as advertised.

Fig. 2. USB connector and wiring.

The board sports a standard M12 lens holder with 20mm hole distance. The exact lens holder model will depend on the lens.
The lens holder can easily be unscrewed and swapped with another one. The lens is tightened with a screw.

Having an interchangeable lens is really neat.

Fig. 3. Lens mount and focus screw.

Camera controls

The camera has many of the standard controls, including exposure, gain and sharpness.
Here are the DirectShow property pages for “Video Proc Amp” and “Camera Control” on the filter. Greyed out options are not available.

Note on white balance: it is not possible to reproduce the “Auto” mode with the manual slider. I have found that the best color is achieved when using the “Auto” mode, especially for whites. The manual mode gives a color cast to everything.

As is usually the case, the exposure value will take precedence over the framerate when using long exposures. When this happens the framerate is limited to exactly 1/exposure duration.
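The relation can be written as a one-line rule. This is only a sketch of the behavior described above, with an illustrative function name:

```python
def effective_framerate(nominal_fps, exposure_s):
    """Framerate delivered when the exposure may exceed the nominal frame interval."""
    return min(nominal_fps, 1.0 / exposure_s)


# A 40 ms exposure caps a 60 fps stream at 25 fps; a 5 ms exposure does not limit it.
long_exposure_fps = effective_framerate(60, 0.040)
short_exposure_fps = effective_framerate(60, 0.005)
print(long_exposure_fps, short_exposure_fps)
```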

Exposure values are not documented. Almost no camera manufacturer follows either the UVC spec or the DirectShow spec, and this camera is no exception.

Here is the mapping I found by inferring from 1/framerate on long exposures and comparing with a Logitech C920 for lower values:

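| Exposure value | Exposure duration |
|---|---|
| -1 | 640ms |
| -2 | 320ms |
| -3 | 160ms |
| -4 | 80ms |
| -5 | 40ms |
| -6 | 20ms |
| -7 | 10ms |
| -8 | 5ms |
| -9 | 2.5ms |
| -10 | 1.250ms |
| -11 | 650µs |
| -12 | 312µs |
| -13 | 150µs |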
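For comparison, DirectShow documents the exposure property as log base 2 seconds, so a value of -1 would nominally mean 500ms. The following sketch compares that nominal convention with the measured mapping. The 1280ms base used in `fitted_ms` is an assumption read off the measured values, not a documented constant.

```python
# Measured mapping from the table above, in milliseconds.
MEASURED_MS = {
    -1: 640.0, -2: 320.0, -3: 160.0, -4: 80.0, -5: 40.0, -6: 20.0, -7: 10.0,
    -8: 5.0, -9: 2.5, -10: 1.25, -11: 0.650, -12: 0.312, -13: 0.150,
}


def nominal_ms(value):
    """Duration under the DirectShow convention: 2**value seconds."""
    return 1000.0 * 2.0 ** value


def fitted_ms(value, base_ms=1280.0):
    """Power-of-two fit to the measured values (base_ms is an assumption)."""
    return base_ms * 2.0 ** value


# Ratio between the measured duration and the nominal one at the first setting.
ratio_at_minus_1 = MEASURED_MS[-1] / nominal_ms(-1)

# Largest relative deviation between the fit and the measurements.
max_rel_error = max(
    abs(fitted_ms(v) - measured) / measured for v, measured in MEASURED_MS.items()
)
print(ratio_at_minus_1, max_rel_error)
```

The measured values halve at each step as expected, but sit at about 1.28 times the nominal durations.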

Focus is done manually by screwing/unscrewing the lens inside its mount.

Stream formats

The camera can stream in MJPEG or YUY2 formats.

Each image size has a single framerate associated with it. In my experience the settings do not exactly match the actual frame frequency. I have noted in parentheses the values I measured.

The following combinations are available:

• MJPEG : 320×240 @ 120.1 fps (100 fps), 640×480 @ 120.1 fps (100 fps), 800×600 @ 60 fps (60 fps), 1280×720 @ 60 fps (60 fps), 1280×1024 @ 31 fps (30 fps), 1920×1080 @ 31 fps (30 fps).
• YUY2 : 320×240 @ 31 fps, 640×480 @ 31 fps, 800×600 @ 21 fps, 1280×720 @ 9 fps, 1280×1024 @ 6 fps, 1920×1080 @ 6 fps.

Image quality

Image quality is not a strong point of this camera. Since poor image quality may come from many factors such as the lens, the sensor, the image processing chip or the JPEG encoder, some of which we have partial control over, I will attempt to find out the origin of each issue.

Lens: I tested with the 140° and 180° stock lenses, as well as with a Sunex 955A (rated 5 megapixels). I was not able to quite reproduce the image quality of the Logitech C920 or the Microsoft Lifecam Studio. Image quality is not horrible though, and in bright daylight it may be enough depending on the purpose.

Flares: Since the camera is not in any housing, the lens is protruding and may be more easily polluted by light rays coming from the side. Some sort of hood will be a good addition when strong light is coming at an angle.

JPEG encoding: The compression level is not known. Some JPEG artifacts may be visible in low light conditions.

Dynamic range: The dynamic range is poor and if the scene contains dark and bright areas simultaneously details will be lost.

Rolling shutter: I measured the rolling shutter frame scan time to be 17ms on the 1280×720 @ 60 fps stream and 36 ms on the 1920×1080 @ 30 fps stream. Scan times at other resolutions are similarly mostly dependent on the framerate.

Sensor specs

The vendor reports that the sensor is an Omnivision OV2710. The spec sheet for it can be found here. Here are some highlights:

• Lens format: 1/2.7″.
• Pixel size: 3µm×3µm.
• Image area: 5856µm × 3276µm.

The “720p cropped” image format

The 1280×720 @ 60fps stream uses a crop of the full frame, not a downsampling of the full sensor as in other lower sizes. This is a characteristic of the OV2710 itself. The final field of view of the image will be reduced. Apply a factor of 2/3 (1280/1920) to find the new field of view spanned by the cropped image. If the 1920×1080 image diagonally spans 140° for example, in the 720p crop it will be reduced to 93°.
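The 2/3 rule scales the angle linearly with the crop, which is a reasonable approximation for wide-angle lenses whose projection is close to equidistant. The sketch below contrasts it with what an ideal rectilinear lens would give. Both projection models are assumptions, since the actual projection of the stock lenses is not documented.

```python
import math


def cropped_fov_linear(full_fov_deg, crop_ratio):
    """Equidistant approximation: angle proportional to distance on the sensor."""
    return full_fov_deg * crop_ratio


def cropped_fov_rectilinear(full_fov_deg, crop_ratio):
    """Ideal rectilinear lens: the tangent of the half angle scales with the crop."""
    half_angle = math.radians(full_fov_deg / 2.0)
    return math.degrees(2.0 * math.atan(crop_ratio * math.tan(half_angle)))


crop_ratio = 1280 / 1920
linear_fov = cropped_fov_linear(140.0, crop_ratio)            # about 93°
rectilinear_fov = cropped_fov_rectilinear(140.0, crop_ratio)  # noticeably wider
print(linear_fov, rectilinear_fov)
```

The gap between the two estimates shrinks for narrower lenses, so the choice of model mostly matters at very wide angles.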

The image circle created by a lens made for 1/3″ sensors will cover the full frame (1920×1080) image. When using 1280×720 @ 60fps the image will be better suited to a lens made for 1/4″ sensors.

The 180° stock lens provided with the camera seems to be a 1/4″ format. It is well suited for the 720p cropped stream, but displays hard vignetting on full frame.

Samples

These samples were captured with Kinovea and are direct dumps of the MJPEG frames.

Conclusion

In the end the USBFHD01M is a capable camera module. To my knowledge no other USB 2.0 camera provides 1280×720 @ 60 fps to date.

It can be an interesting tool if you can live with or work around its shortcomings.

## Measuring rolling shutter with a strobing LED

In a rolling shutter camera (most USB cameras), the image rows are exposed sequentially, from top to bottom. It follows that the various parts of the resulting video image do not exactly correspond to the same point in time. This is definitely a problem when using these videos for measuring object velocities or for camera synchronization. Basically the only bright side is that it makes propellers and rotating fans look funny.

Fig. 1. Rolling shutter artifacts when imaging a propeller.

But beside the visual distortion, can we quantify how severe the problem is? Let’s find out with a simple blinking LED.

Principle

The exposure duration is independent from the rolling shutter. Each row is properly exposed as configured; it’s just that the start of the exposure of a given row is slightly shifted in time relative to the previous one. So if we flash a light for a very short time in an otherwise dark environment, only the rows that happen to be exposed at that time are going to capture it.

Fig. 2. Rolling shutter model with a flash of light happening during frame integration.

An Arduino can serve as a cheap stroboscope for our purpose. The principle is simple: we align the LED blinking frequency with the camera framerate so that exactly one flash happens for each captured frame. We limit the flash duration to reduce the stripe of rows catching it, and we measure the height of the stripe in pixels. Since we know how long each row in the stripe has been collecting light and how long the flash lasted, we can compute the total time represented by the rows and derive a per-row delay.

Setup

As we are going to count pixels on the resulting images we want the clearest possible differentiation between lit rows and unlit rows and we want the entire row to be lit. For this I’m using a large LED (10 mm) and a diffusion block. The diffuser is simply a piece of Styrofoam 2cm thick, with the LED head stuffed in about 5mm deep (works better than a ping pong ball).
All timing relevant code in the Arduino is done using microseconds (more about accuracy below).

Fig. 3. Experimental setup.

On the camera side, I manually focused to the closest possible, and set the exposure time to the minimum possible, which is 300µs on the Logitech C920. I used Kinovea to configure the camera.

Note that not all cameras have a control for exposure duration. Furthermore the usual DirectShow property does not always map precisely to the firmware values. Logitech drivers expose a vendor-specific property with the adequate control of exposure.
The LED with the diffuser is crammed close to the camera lens to get as much light as possible from the flash.

First attempt

After configuring the camera to 800×600 @ 30fps, and the Arduino to blink the LED for 500µs every 33333µs, we get a nice Cylon stripe:

Fig. 4. Stripe of illuminated rows, with strobing frequency mismatch.

Interesting. A stripe corresponding to the few rows that were exposed during the LED flash… But slowly moving up. Either the camera is not really capturing frames at 30 fps like it advertises, or the Arduino clock is not ticking exactly 30 times per second, or a bit of both.

Arduino’s micros() has a granularity of 4µs but it is definitely lacking in accuracy. It suffers from drift and cannot be used if long term accuracy is required. For our purposes, the difference between, say, 30.00 fps and 29.97 fps is about 33µs per frame, which is finer than what the board’s clock can reliably resolve. A Real Time Clock would be needed.
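The 33µs figure follows directly from the two frame intervals, as this small check shows (the function name is illustrative):

```python
def frame_interval_us(fps):
    """Duration of one frame in microseconds."""
    return 1_000_000 / fps


# Difference between the frame intervals at 29.97 fps and 30.00 fps.
interval_difference_us = frame_interval_us(29.97) - frame_interval_us(30.0)
print(interval_difference_us)
```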

So our blinking interval is slightly off, but fortunately it does not really matter for our experiment as we are not trying to measure the camera frequency itself. As long as we can somehow tune to it and stabilize the stripe, we do not need to know its exact value.

The stripe is moving upward, meaning that for each frame, we fall a bit short, and the flash is happening at a lower time coordinate in the frame interval, illuminating higher rows.

At that point I added a potentiometer (plus code to average its noisy values) and linked it to the blinking rate, so that I could more easily fine tune to the actual camera framerate. Manually bisecting the values until the stripe is stable is also quite feasible in practice.

The stripe stabilized when I settled on 33376µs. It is a meaningless value in itself and we are not going to convert that back to a framerate.
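The direction and speed of the drift can be estimated with a short sketch. It assumes that 33376µs is the true frame period as seen by the Arduino clock, and borrows the per-row delay of about 56.6µs reported in the Measurements section. The function name is illustrative.

```python
def stripe_drift_rows_per_frame(flash_period_us, frame_period_us, row_delay_us):
    """Rows the stripe moves per frame; negative means upward (toward row 0)."""
    return (flash_period_us - frame_period_us) / row_delay_us


# Flashing every 33333 µs against a frame period of 33376 µs: each flash arrives
# 43 µs earlier within the frame than the previous one, so the stripe climbs.
drift = stripe_drift_rows_per_frame(33333, 33376, 56.6)
frames_to_cross_image = 600 / abs(drift)
print(drift, frames_to_cross_image)
```

Under these assumptions the stripe moves up by a little less than one row per frame, which is consistent with the slow upward motion described above.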

Measurements

We should now have everything needed to compute the rolling shutter time shift.
Here are some captured frames with fixed camera exposures and varying LED flash durations. Each time we restart the camera the stripe will be at a different location, but it stays put.

Fig. 5. Capturing the LED flash. Fixed exposure of 300µs, varying flash duration 500µs, 1000µs, 2000µs.

A few measurements are summarized in the following table:

Table 1. Rolling shutter row delay measurements at various exposure and flash duration.

Row delay is given by $\frac{\text{exposure } + \text{ flash duration}}{\text{row count}}$. We get an average time shift of about 56.6µs per row. Considering the 800×600 image, that gives a full image swipe of 33981µs. In other words, a 34ms lag between the top and bottom rows.
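The computation can be sketched as follows. Because the values of Table 1 are not reproduced here, the stripe heights are derived from the 56.6µs average rather than taken from the measurements; they are illustrative, not measured.

```python
def row_delay_us(exposure_us, flash_us, lit_rows):
    """Per-row delay: (exposure + flash duration) / number of lit rows."""
    return (exposure_us + flash_us) / lit_rows


def expected_stripe_rows(exposure_us, flash_us, delay_us):
    """Stripe height predicted for a given per-row delay (inverse relation)."""
    return (exposure_us + flash_us) / delay_us


def frame_scan_time_us(delay_us, row_count):
    """Time between the start of exposure of the first and last rows."""
    return delay_us * row_count


# Predicted stripe heights for the flash durations of Fig. 5 (derived, not measured).
predicted = {flash: expected_stripe_rows(300, flash, 56.6) for flash in (500, 1000, 2000)}

# Round trip: feeding a predicted height back gives the per-row delay again.
delay = row_delay_us(300, 1000, predicted[1000])

# About 33960 µs with the rounded 56.6 µs figure; the 33981 µs quoted above
# presumably comes from the unrounded average.
scan_time = frame_scan_time_us(56.6, 600)
print(predicted, delay, scan_time)
```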

Other works

Other methods have been designed to compute the rolling shutter without knowing the camera internals. Most notably a rough but simple method is to film a vertical pole while panning the camera left and right. This makes it possible to compare the inter-frame and intra-frame displacement of the pole and retrieve the full image swipe time. If, between the top and bottom rows of a single frame, the pole is displaced by half as many pixels as it is displaced between the top rows of two adjacent frames, we know the vertical swipe time is half the frame interval. See more details here; several measurements are required to average out the imprecisions.

The 34ms value is probably consistent with the type of sensor quality we would expect to find in a webcam compared to a device dedicated to photography. For DSLRs, values of 15 to 25ms are reported using the pole technique, up to more than 30ms as well for some devices.
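The pole technique boils down to a ratio, sketched below. The pixel displacements in the example are illustrative values, not measurements, and a constant panning speed is assumed.

```python
def scan_time_from_pole(intra_frame_shift_px, inter_frame_shift_px, frame_interval_ms):
    """Frame scan time from the pole displacement within a frame and between frames.

    intra_frame_shift_px: horizontal shift of the pole between top and bottom rows.
    inter_frame_shift_px: shift of the pole at the top row between adjacent frames.
    """
    return frame_interval_ms * intra_frame_shift_px / inter_frame_shift_px


# A pole skewed by 10 px within a frame and moving 20 px per frame at 30 fps
# gives a scan time of half the frame interval.
scan_ms = scan_time_from_pole(10, 20, 1000 / 30)
print(scan_ms)
```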

Image size

The frame scan time may or may not change depending on the chosen resolution. It depends on the way image sizes lower than the sensor full frame are implemented in the camera.

Methods to create lower sized images include downscaling (capture the whole image and interpolate at the digital stage), pixel binning (combine the electrical response of two or four pixels into a single one), and cropping (use only a region of the full frame corresponding to the wanted resolution). Pure downscaling will give the same frame scan time between image sizes. Cropping will change the scan time as there really are fewer pixels to read out.

A particular resolution may use a combination of methods. For example, an 800×600 image on a 1920×1080 sensor may first use cropping to retrieve a 4:3 window of 1440×1080, and then downscale this image by 1.8x to get 800×600.
To know if a resolution is downscaled or cropped you can check the horizontal and vertical field of view that this image displays and compare it to the field of view in the full frame image.
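The crop-then-downscale example can be reproduced with a short sketch. It assumes the camera uses the largest window with the output aspect ratio followed by a uniform scale, which is only one of the possible implementations; the function name is illustrative.

```python
from fractions import Fraction


def crop_then_scale(sensor_w, sensor_h, out_w, out_h):
    """Largest window with the output aspect ratio, then the uniform scale factor."""
    target = Fraction(out_w, out_h)
    if Fraction(sensor_w, sensor_h) > target:
        # Sensor is wider than the output: keep full height, crop the width.
        crop_h = sensor_h
        crop_w = int(sensor_h * target)
    else:
        # Sensor is taller than (or matches) the output: keep full width.
        crop_w = sensor_w
        crop_h = int(sensor_w / target)
    scale = float(Fraction(crop_w, out_w))
    return (crop_w, crop_h), scale


window, scale = crop_then_scale(1920, 1080, 800, 600)
# Fraction of the full-frame width that remains visible after the crop.
horizontal_fraction = window[0] / 1920
print(window, scale, horizontal_fraction)
```

In this scenario the vertical field of view is unchanged while the horizontal one shrinks, which is the kind of difference the field of view comparison would reveal.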

Conclusion

34ms of disparity within an image is definitely a lot when filming fast motion like a tennis serve or golf ball trajectory. A mitigation strategy is to orient the camera in such a way that the motion is parallel to sensor rows. For stereoscopy or multiple-camera vision applications, a sub-frame accurate synchronization method might be required (usually through a dedicated hardware cable). A global shutter device is of course always a better option.