Kinect Animated GIFs

By Dave | Project | Published 25 Nov 2013 02:44 | Last Modified 25 Nov 2013 06:12

Animated GIFs seem to be becoming popular again, and I thought it would be fun to create some using depth data captured from Kinect.

Of course, this could also be done using an ordinary (non-depth) camera. However, using Kinect I can capture data from a single viewpoint (i.e. without moving the camera or the subject), then create "Bullet Time"-style animations by moving a virtual camera around the recorded data. Pixels occluded in the original data will be missing, but the flexibility of a virtual camera may outweigh this, depending on the desired effect. The figures below show simple camera animations.
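The code behind these animations isn't shown here, but the core idea can be sketched as follows (a minimal sketch, assuming the depth frame has already been converted to 3D points; Vector3 comes from System.Numerics, and the OrbitFrame helper name is illustrative). Orbiting a virtual camera around the subject is equivalent to rotating the recorded points about a vertical axis through the subject before rendering each frame of the GIF.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Numerics;

static class BulletTime
{
    // Rotate the recorded points about a vertical axis through their centroid.
    // Rendering one frame per angle and writing the frames out as an animated
    // GIF gives the "orbiting camera" effect.
    public static List<Vector3> OrbitFrame(IReadOnlyList<Vector3> points, float angleRadians)
    {
        var centroid = new Vector3(
            points.Average(p => p.X),
            points.Average(p => p.Y),
            points.Average(p => p.Z));

        float cos = (float)Math.Cos(angleRadians);
        float sin = (float)Math.Sin(angleRadians);

        var frame = new List<Vector3>(points.Count);
        foreach (var p in points)
        {
            var d = p - centroid;                  // move the subject to the origin
            var rotated = new Vector3(
                d.X * cos + d.Z * sin,             // rotate about the Y (vertical) axis
                d.Y,
                -d.X * sin + d.Z * cos);
            frame.Add(rotated + centroid);         // move back
        }
        return frame;
    }
}
```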

Figure 1. Princess Leia.1

Figure 2. Wave.

1Many thanks to Mojo Jones for both creating the costume and playing Princess Leia.

Kinect Fusion

By Dave | Project | Published 1 Apr 2013 15:20 | Last Modified 9 Apr 2013 15:43

Kinect Fusion was released along with version 1.7 of the Kinect for Windows SDK, and allows reconstruction of a 3D surface from Kinect data captured from multiple angles. The SDK samples, available in both WPF and Direct2D versions, support saving the scan as either an .STL or .OBJ file.

The scan itself does not currently include color information; however, it is possible to add it by post-processing with additional tools. Figures 1-4 below show a Kinect Fusion scan being edited in MeshLab, "an open source, portable, and extensible system for the processing and editing of unstructured 3D triangular meshes", which can be used to re-project multiple 2D color images onto the model.

Figures 1-4. Kinect Fusion scan, showing raw output, normal mapping, ambient occlusion, and color re-projection.

Further scans will be posted in the gallery.

Kinect Data

By Dave | Project | Published 2 Jan 2013 14:46 | Last Modified 2 Jan 2013 14:48

Kinect generates a lot of data. For example, 1 second of video at a resolution of 640x480 pixels for both depth and color, at 30 frames per second, generates approximately 70 MB of data (640 x 480 pixels x 4 bytes per pixel x 30 frames per second x 2 streams = 73,728,000 bytes), together with a comparatively small amount of audio and skeletal tracking data.

One option is to compress the data using standard image compression. Using lossy JPEG for color and lossless PNG for depth reduces the overall size of a typical recording by approximately 95%. A single frame is shown below in Figure 1.

Figure 1. Single Kinect frame showing color, depth and normal map.
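The encoding step isn't shown in the post, but WPF's built-in encoders are one straightforward way to do it; a minimal sketch, assuming 32-bit BGRA color frames and 16-bit depth frames:

```csharp
using System.IO;
using System.Windows.Media;
using System.Windows.Media.Imaging;

static class FrameCompression
{
    // Lossy JPEG for the 32-bit BGRA color frame.
    public static byte[] EncodeColor(byte[] bgra, int width, int height)
    {
        var source = BitmapSource.Create(
            width, height, 96, 96, PixelFormats.Bgra32, null, bgra, width * 4);
        var encoder = new JpegBitmapEncoder { QualityLevel = 90 };
        encoder.Frames.Add(BitmapFrame.Create(source));
        using (var stream = new MemoryStream())
        {
            encoder.Save(stream);
            return stream.ToArray();
        }
    }

    // Lossless PNG for the 16-bit depth frame, so no depth values are altered.
    public static byte[] EncodeDepth(ushort[] depth, int width, int height)
    {
        var source = BitmapSource.Create(
            width, height, 96, 96, PixelFormats.Gray16, null, depth, width * 2);
        var encoder = new PngBitmapEncoder();
        encoder.Frames.Add(BitmapFrame.Create(source));
        using (var stream = new MemoryStream())
        {
            encoder.Save(stream);
            return stream.ToArray();
        }
    }
}
```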

I also wanted to package the recording files (i.e. color, depth, audio, skeletal tracking etc.) into a single container. The .NET System.IO.Packaging namespace provides a convenient wrapper for packaging files according to the Open Packaging Conventions (OPC). Provided the packages are given a .zip extension, they can be opened using Windows Explorer, and additional files can be added as long as they correspond to MIME types defined in the [Content_Types].xml file in the package.
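A minimal sketch of such packaging with System.IO.Packaging (the part names, content types and compression options here are illustrative, not the actual layout of my recordings):

```csharp
using System;
using System.IO;
using System.IO.Packaging;

static class RecordingPackage
{
    // Add one compressed frame to an OPC package. Naming the package with a
    // .zip extension means it can also be browsed with Windows Explorer.
    public static void AddFrame(string packagePath, int frameIndex,
                                byte[] colorJpeg, byte[] depthPng)
    {
        using (var package = Package.Open(packagePath, FileMode.OpenOrCreate))
        {
            AddPart(package, $"/color/{frameIndex:D6}.jpg", "image/jpeg", colorJpeg);
            AddPart(package, $"/depth/{frameIndex:D6}.png", "image/png", depthPng);
        }
    }

    private static void AddPart(Package package, string path, string contentType, byte[] data)
    {
        var uri = PackUriHelper.CreatePartUri(new Uri(path, UriKind.Relative));
        // CreatePart records the MIME type in [Content_Types].xml for us.
        // The images are already compressed, so store them as-is.
        var part = package.CreatePart(uri, contentType, CompressionOption.NotCompressed);
        using (var stream = part.GetStream(FileMode.Create))
        {
            stream.Write(data, 0, data.Length);
        }
    }
}
```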

Another advantage of using images to encode color and depth data is the ability to browse the data using thumbnail icons in Windows Explorer.

Surface Mapping

By Dave | Project | Published 9 Dec 2012 00:49 | Last Modified 13 Jan 2013 17:11

Now that I can generate a normal map from depth data, I can avoid mapping color pixels to back-facing surfaces (technically, surfaces which are facing away from the sensor), as shown below in Video 1.

Video 1. Back-face removal of color pixels in Kinect data.

This is important to avoid visual confusion when rotating the model, since it is difficult to distinguish between the inside and outside of a textured surface.
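The test itself is simple; a minimal sketch, assuming a per-pixel normal from the previous post and a sensor sitting at the origin looking down the positive Z axis (the type and method names are illustrative):

```csharp
using System.Numerics;

static class BackfaceTest
{
    // The Kinect color camera cannot see surfaces that face away from it,
    // so color should not be mapped onto them. With the sensor at the origin,
    // the direction from a surface point back to the sensor is -position
    // (normalised); the surface is visible when the normal points towards it.
    public static bool IsColorVisible(Vector3 position, Vector3 normal)
    {
        var toSensor = Vector3.Normalize(-position);
        return Vector3.Dot(normal, toSensor) > 0f;
    }
}
```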

Normal Mapping

By Dave | Project | Published 25 Nov 2012 18:40 | Last Modified 13 Jan 2013 17:12

There are some excellent solutions to surface reconstruction using Kinect, such as Kinect Fusion; however, I was still keen to understand the feasibility of extracting a basic normal map from depth data.

In order to determine the normal vector for a given depth pixel, I simply sample surrounding pixels and look at the local surface gradient. However, because depth values are stepped, small sample areas (particularly at larger depth values) result in a lot of forward-facing normals from the surfaces of the discrete depth "planes", as shown below in Figure 1. Using a larger sample area improves things significantly, as shown in the second image.

Figure 1. Normal maps from raw depth data, using smaller and larger sample areas.
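The sampling code isn't reproduced here, but a simplified sketch of gradient-based normal estimation looks something like this (a fixed square neighbourhood, depth in millimetres, and a sampleSize of at least one pixel are assumed):

```csharp
using System;
using System.Numerics;

static class NormalEstimation
{
    // Estimate a normal for pixel (x, y) from the local depth gradient.
    // Larger sample areas average over the discrete depth steps.
    public static Vector3 EstimateNormal(ushort[] depthMm, int width, int height,
                                         int x, int y, int sampleSize)
    {
        int x0 = Math.Max(x - sampleSize, 0), x1 = Math.Min(x + sampleSize, width - 1);
        int y0 = Math.Max(y - sampleSize, 0), y1 = Math.Min(y + sampleSize, height - 1);

        // Central differences of depth in the x and y directions.
        float dzdx = (depthMm[y * width + x1] - depthMm[y * width + x0]) / (float)(x1 - x0);
        float dzdy = (depthMm[y1 * width + x] - depthMm[y0 * width + x]) / (float)(y1 - y0);

        // The surface z = f(x, y) has (un-normalised) normal (-dz/dx, -dz/dy, 1);
        // a flat depth "plane" therefore yields a forward-facing normal.
        return Vector3.Normalize(new Vector3(-dzdx, -dzdy, 1f));
    }
}
```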

The normal map then enables the point cloud to be rendered using directional lighting, as shown below in Figure 2.

Figure 2. Diffuse and specular lighting applied to point cloud.

Note that the images above are still rendered as point clouds, rather than a surface mesh.

Smoothing Depth Data

By Dave | Project | Published 25 Nov 2012 17:38 | Last Modified 13 Jan 2013 17:14

The Kinect for Windows SDK exposes depth data as an array of 16-bit values, with the least-significant 3 bits used for the player index.1 There are therefore 2^13 = 8192 values available to report depth within the supported range. A sample depth image is shown below in Figure 1. Note that the shape is a result of the sensor being angled downwards, and that black areas correspond to pixels where no depth information was reported by the sensor.

Figure 1. Kinect depth image.
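A minimal sketch of unpacking these 16-bit values (the pre-1.6 packed format, with the player index in the lower 3 bits and the depth in millimetres in the upper 13):

```csharp
static class DepthPacking
{
    private const int PlayerIndexBitmask = 0x7;   // lower 3 bits
    private const int PlayerIndexBitmaskWidth = 3;

    // Depth in millimetres: the upper 13 bits (2^13 = 8192 possible values).
    public static int GetDepthMillimetres(short packed)
    {
        return (ushort)packed >> PlayerIndexBitmaskWidth;
    }

    // Player index: 0 means "no player", 1-7 identify tracked players.
    public static int GetPlayerIndex(short packed)
    {
        return packed & PlayerIndexBitmask;
    }
}
```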

If this image is rotated and viewed from above, as shown below in Figure 2, discrete depth bands become visible.

Figure 2. Kinect depth image, rotated to highlight depth-banding.

The intervals between depth values for another sample image are plotted against depth in Figure 3 below. Note that the sample image used did not contain any data around 1.5m in depth, so there are some jumps in the data at this point. The graph shows how the depth intervals increase in size with distance from the sensor, from around a 2mm gap at 1m depth to around a 45mm gap at 4m depth.

Figure 3. Depth step by depth.

My initial attempt at a smoothing algorithm is shown below in Figure 4. This approach looks for horizontal and vertical lines of equal depth, and interpolates data between the discrete depth bands. Since these depth bands increase in size further away from the camera, smoothing is more effective for larger depth values.

Figure 4. Raw and smoothed depth image.
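My implementation isn't reproduced here, but a one-dimensional sketch of the idea (applied along each row, and then again along each column) might look like this; the 100 mm band-difference threshold is an illustrative value used to avoid smoothing across genuine depth edges:

```csharp
using System;

static class DepthSmoothing
{
    // Smooth one scanline of depth values (in millimetres). Runs of equal
    // depth are treated as samples of a smoothly varying surface, so each
    // run is ramped linearly towards the value of the next run rather than
    // stepping. Zero means "no data" and is left untouched.
    public static float[] SmoothScanline(ushort[] depth)
    {
        var result = new float[depth.Length];
        for (int i = 0; i < depth.Length; i++) result[i] = depth[i];

        int runStart = 0;
        for (int i = 1; i <= depth.Length; i++)
        {
            bool runEnds = i == depth.Length || depth[i] != depth[runStart];
            if (!runEnds) continue;

            // Interpolate across the run [runStart, i) towards the next band,
            // but only between bands that plausibly belong to the same surface.
            if (i < depth.Length && depth[runStart] != 0 && depth[i] != 0
                && Math.Abs(depth[i] - depth[runStart]) < 100)
            {
                int runLength = i - runStart;
                for (int j = 0; j < runLength; j++)
                {
                    float t = j / (float)runLength;
                    result[runStart + j] = depth[runStart] + t * (depth[i] - depth[runStart]);
                }
            }
            runStart = i;
        }
        return result;
    }
}
```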

1 As of version 1.6, the Kinect for Windows SDK exposes extended depth information.

Player Extraction

By Dave | Project | Published 2 Nov 2012 23:16 | Last Modified 13 Jan 2013 17:14

I thought it would be of interest to discuss how I obtained the player images shown in the previous post.

Stereo cameras have been around for some time. Kinect automates the process of extracting depth values from digital image frames, and while Kinect only provides information at a resolution of 640x480 pixels, it does so at a very low cost, with relatively low computational resources, and at 30 frames per second. Figure 1 below shows a single Kinect "frame" which has been rotated and rendered as a point cloud. The frame was captured facing the player, hence the shadows and degree of distortion in the rotated image.

Figure 1. Single Kinect frame, rotated to highlight depth.

Amongst other things, the Kinect API also has the ability to identify people, and provide real-time information on joint positions in 3D space. This is shown below in Figure 2, where the skeletal information has been included on the same Kinect frame as in Figure 1.

Figure 2. Single Kinect frame with skeleton overlay, rotated to highlight depth.

When skeletal tracking is enabled, "player" information is included as part of the depth feed, allowing automatic separation of pixels belonging to tracked individuals. This is shown below in Figure 3, where the same frame is rendered with non-player pixels removed.

Figure 3. Single Kinect frame showing player only, rotated to highlight depth.
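A minimal sketch of this masking step, using the packed 16-bit depth format described in the Smoothing Depth Data post (a player index of zero means the pixel was not attributed to a tracked player):

```csharp
static class PlayerExtraction
{
    // Keep only those depth pixels that the Kinect runtime has attributed
    // to a tracked player; everything else becomes "no data" (zero).
    public static short[] ExtractPlayers(short[] packedDepth)
    {
        var result = new short[packedDepth.Length];
        for (int i = 0; i < packedDepth.Length; i++)
        {
            int playerIndex = packedDepth[i] & 0x7;   // player index in the lower 3 bits
            result[i] = playerIndex != 0 ? packedDepth[i] : (short)0;
        }
        return result;
    }
}
```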

Depth Image Rendering

By Dave | Project | Published 24 Oct 2012 19:01 | Last Modified 13 Jan 2013 17:15

There are numerous ways to render depth data captured from Kinect. One option is to use a point-cloud, where each depth value is represented by a pixel positioned in 3D space. In the absence of a 3D display, one of the ways to convey depth for still images is the use of stereograms, as shown below in Figure 1.

Figure 1. Point-cloud stereogram.1

In case you are wondering, I'm holding a wireless keyboard to control the image capture. Next I needed to map the texture from the color camera onto the point-cloud, as shown below in Figure 2.

Figure 2. Point-cloud stereogram1 with color mapping.

Another approach to simulating 3D without special display hardware (though it does require special glasses2), and one which avoids the degree of training involved to "see" images such as stereograms, is the use of anaglyphs, as shown below in Figure 3.

Figure 3. Point-cloud color anaglyph.2

Anaglyphs can be adjusted to move the image plane "forwards" or "backwards" in relation to the screen, as shown by the grayscale anaglyphs in Figures 4-6 below.

Figures 4-6. Point-cloud grayscale anaglyphs2 "behind", "co-planar with", and "in front of" the screen plane.
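A sketch of how such an anaglyph can be composed from two renders of the point cloud, one per eye: the red channel is taken from the left-eye view and the green and blue channels from the right-eye view, and shifting one view horizontally before combining moves the apparent image plane relative to the screen (the BGRA layout and helper name are assumptions):

```csharp
using System;

static class Anaglyph
{
    // Combine left/right renders (32-bit BGRA, same size) into a red/cyan
    // anaglyph: red channel from the left eye, green and blue from the right.
    // 'shift' slides the right view horizontally, moving the perceived image
    // plane in front of or behind the screen.
    public static byte[] Compose(byte[] leftBgra, byte[] rightBgra,
                                 int width, int height, int shift)
    {
        var output = new byte[leftBgra.Length];
        for (int y = 0; y < height; y++)
        {
            for (int x = 0; x < width; x++)
            {
                int i = (y * width + x) * 4;
                int xs = Math.Min(Math.Max(x + shift, 0), width - 1);
                int j = (y * width + xs) * 4;

                output[i + 0] = rightBgra[j + 0];   // blue  from the right eye
                output[i + 1] = rightBgra[j + 1];   // green from the right eye
                output[i + 2] = leftBgra[i + 2];    // red   from the left eye
                output[i + 3] = 255;                // opaque alpha
            }
        }
        return output;
    }
}
```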

1In order to perceive a 3D image the viewer must decouple convergence and focusing of their eyes. Looking "through" the image results in four images. The eyes are correctly converged when the two centre images "overlap". At this point the eyes must be refocussed without changing their convergence.

2In order to perceive a 3D image the viewer must use coloured filters for each eye, in this case red (left) and cyan (right).

Depth Image Capture

By Dave | Project | Published 30 Sep 2012 23:00 | Last Modified 13 Jan 2013 17:16

I previously discussed an approach for visualising 3D on a Microsoft Surface device using autostereograms. This had the advantage of supporting more than a single user, since simultaneous depth-perception is possible from opposite sides of the device. However, it suffered from the disadvantages that there is a degree of training involved to "see" the image (particularly when the image is animated and uses a random dot pattern), and that this type of autostereogram is unable to convey color.

I thought I'd start a new project to explore the use of Microsoft Kinect to work with 3D.

Kinect is a great example of the powerful combination of both hardware (e.g. the depth camera) and software (skeletal tracking). Intriguingly, one way to think about how the depth sensor in Kinect actually works is to compare it to an autostereogram. These images allow depth perception because the human brain has a remarkable ability to infer depth from a random dot pattern when it is shifted in a particular way. The depth sensor in Kinect also uses shifts in the position of a random dot pattern (due to parallax between the emitter and receiver) to infer depth values.

Capturing depth images using Kinect is straightforward, as demonstrated extensively in the Software Development Kit.
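For completeness, a minimal sketch along the lines of the SDK samples (Kinect for Windows SDK v1, Microsoft.Kinect namespace) for capturing depth frames:

```csharp
using System;
using System.Linq;
using Microsoft.Kinect;

class DepthCapture
{
    private short[] pixelData;

    public void Start()
    {
        // Use the first connected sensor, as the SDK samples do.
        KinectSensor sensor = KinectSensor.KinectSensors
            .FirstOrDefault(s => s.Status == KinectStatus.Connected);
        if (sensor == null) return;

        sensor.DepthStream.Enable(DepthImageFormat.Resolution640x480Fps30);
        sensor.DepthFrameReady += OnDepthFrameReady;
        sensor.Start();
    }

    private void OnDepthFrameReady(object sender, DepthImageFrameReadyEventArgs e)
    {
        using (DepthImageFrame frame = e.OpenDepthImageFrame())
        {
            if (frame == null) return;   // frames can be skipped under load

            if (pixelData == null || pixelData.Length != frame.PixelDataLength)
            {
                pixelData = new short[frame.PixelDataLength];
            }
            frame.CopyPixelDataTo(pixelData);
            // pixelData now holds one 640x480 packed depth frame, ready to
            // encode as a PNG or hand to the rendering code.
        }
    }
}
```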