Kinect Data

By Dave | Project | Published 2 Jan 2013 14:46 | Last Modified 2 Jan 2013 14:48

Kinect generates a lot of data. For example, 1 second of video at a resolution of 640x480 pixels for both depth and color, at 30 frames per second, generates approximately 70 MB of data (640 x 480 pixels x 4 bytes per pixel x 30 frames per second x 2 streams = 73,728,000 bytes), together with a comparatively small amount of audio and skeletal tracking data.

One option is to compress the data using standard image compression. Using lossy JPEG for color and lossless PNG for depth reduces the overall size of a typical recording by approximately 95%. A single frame is shown below in Figure 1.

[Images: color frame, depth frame, normal map]

Figure 1. Single Kinect frame showing color, depth and normal map.
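
To illustrate the kind of encoding involved, here's a minimal Python sketch using Pillow (the recorder itself is a .NET application, so the file names and JPEG quality setting here are assumptions, not the project's actual values):

    import numpy as np
    from PIL import Image

    # Synthetic stand-ins for a single 640x480 Kinect frame.
    color = (np.random.rand(480, 640, 3) * 255).astype(np.uint8)   # colour frame
    depth = (np.random.rand(480, 640) * 8191).astype(np.uint16)    # 13 bits of depth

    # Lossy JPEG for colour, lossless 16-bit PNG for depth.
    Image.fromarray(color).save("frame0001_color.jpg", quality=85)
    Image.fromarray(depth).save("frame0001_depth.png")

    raw_bytes = 640 * 480 * 4 * 2    # colour + depth, 4 bytes per pixel per stream
    print(raw_bytes)                 # 2,457,600 bytes per frame before compression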

I also wanted to package the recording files (i.e. color, depth, audio, skeletal tracking etc.) into a single container. The .NET System.IO.Packaging namespace provides a convenient wrapper for packaging files according to the Open Packaging Conventions (OPC). Provided the packages are named with a .zip extension, they can be opened using Windows Explorer, and additional files can be added as long as they correspond to MIME types defined in the [Content_Types].xml file in the package.
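
For a rough idea of the resulting layout, here's a Python sketch of the zip structure only (System.IO.Packaging also maintains relationship parts, which are omitted here; the part names and MIME types are illustrative):

    import zipfile

    content_types = (
        '<?xml version="1.0" encoding="utf-8"?>\n'
        '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">\n'
        '  <Default Extension="jpg" ContentType="image/jpeg" />\n'
        '  <Default Extension="png" ContentType="image/png" />\n'
        '  <Default Extension="wav" ContentType="audio/wav" />\n'
        '</Types>\n'
    )

    # Package the per-frame images (encoded as in the sketch above) together
    # with the content-types manifest into a single zip.
    with zipfile.ZipFile("recording.zip", "w", zipfile.ZIP_DEFLATED) as pkg:
        pkg.writestr("[Content_Types].xml", content_types)
        pkg.write("frame0001_color.jpg", "color/frame0001.jpg")
        pkg.write("frame0001_depth.png", "depth/frame0001.png")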

Another advantage of using images to encode color and depth data is the ability to browse the data using thumbnail icons in Windows Explorer.

Surface Mapping

By Dave | Project | Published 9 Dec 2012 00:49 | Last Modified 13 Jan 2013 17:11

Now that I can generate a normal map from depth data, I can avoid mapping color pixels to back-facing surfaces (technically, surfaces which are facing away from the sensor), as shown below in Video 1.

Video 1. Back-face removal of color pixels in Kinect data.

This is important to avoid visual confusion when rotating the model, since it is difficult to distinguish between the inside and outside of a textured surface.
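
The test itself is just a dot product between each normal and the view direction. Here's a minimal Python sketch (the camera convention is an assumption, not the project's actual one: the sensor looks along +z and front-facing normals point back towards it):

    import numpy as np

    def backface_mask(normals, view_dir=(0.0, 0.0, 1.0)):
        # normals: H x W x 3 unit normals estimated from the depth map.
        # view_dir: direction from the sensor into the scene.
        # A positive dot product means the surface faces away from the sensor.
        return normals @ np.asarray(view_dir, dtype=np.float32) > 0.0

    # Colour is then only mapped onto front-facing points, e.g.:
    # colors[backface_mask(normals)] = 0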

Normal Mapping

By Dave | Project | Published 25 Nov 2012 18:40 | Last Modified 13 Jan 2013 17:12

There are some excellent solutions to surface reconstruction using Kinect, such as Kinect Fusion; however, I was still keen to understand the feasibility of extracting a basic normal map from depth data.

In order to determine the normal vector for a given depth pixel, I simply sample surrounding pixels and look at the local surface gradient. However, because depth values are stepped, small sample areas (particularly at larger depth values) result in a lot of forward-facing normals from the surfaces of the depth "planes", as shown below in Figure 1. Using a larger sample area improves things significantly, as shown in the second image.

[Images: normal map using smaller sample area, normal map using larger sample area]

Figure 1. Normal maps from raw depth data, using smaller and larger sample areas.
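
The gradient-based estimate can be sketched as follows. This is an illustrative numpy version rather than the original code; normals here point back towards the sensor, and `radius` controls the size of the sample area:

    import numpy as np

    def normals_from_depth(depth_mm, radius=1):
        d = depth_mm.astype(np.float32)
        # Central differences over +/- radius pixels approximate the local gradient;
        # a larger radius averages over the depth "steps" and smooths the normals.
        dz_dx = (np.roll(d, -radius, axis=1) - np.roll(d, radius, axis=1)) / (2.0 * radius)
        dz_dy = (np.roll(d, -radius, axis=0) - np.roll(d, radius, axis=0)) / (2.0 * radius)
        # For a surface z = f(x, y), (dz/dx, dz/dy, -1) points back towards the sensor.
        n = np.dstack((dz_dx, dz_dy, -np.ones_like(d)))
        return n / np.linalg.norm(n, axis=2, keepdims=True)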

The normal map then enables the point cloud to be rendered using directional lighting, as shown below in Figure 3.

[Images: point cloud with diffuse and specular lighting]

Figure 3. Diffuse and specular lighting applied to point cloud.

Note that the images above are still rendered as point clouds, rather than a surface mesh.
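
A minimal per-point shading pass along these lines is sketched below (illustrative only: the light direction, specular weight and shininess are assumptions, and the real renderer works in a shader rather than numpy):

    import numpy as np

    def shade_points(normals, base_colors, light_dir=(0.3, -0.5, -0.8), shininess=32.0):
        # normals: N x 3 unit normals (front faces point back towards the sensor, -z).
        # base_colors: N x 3 colours in [0, 1] from the colour camera.
        l = -np.asarray(light_dir, dtype=np.float32)
        l /= np.linalg.norm(l)
        v = np.array([0.0, 0.0, -1.0], dtype=np.float32)   # direction towards the viewer
        h = (l + v) / np.linalg.norm(l + v)                 # Blinn-Phong half vector
        diffuse = np.clip(normals @ l, 0.0, None)[:, None]
        specular = np.clip(normals @ h, 0.0, None)[:, None] ** shininess
        return np.clip(base_colors * diffuse + 0.5 * specular, 0.0, 1.0)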

Smoothing Depth Data

By Dave | Project | Published 25 Nov 2012 17:38 | Last Modified 13 Jan 2013 17:14

The Kinect for Windows SDK exposes depth data as an array of 16-bit values, with the least-significant 3 bits used for the player index.[1] There are therefore 2^13 = 8192 values available to report depth within the supported range. A sample depth image is shown below in Figure 1. Note that the shape is a result of the sensor being angled downwards, and that black areas correspond to those pixels where no depth information was reported by the sensor.

[Image: Kinect depth image]

Figure 1. Kinect depth image.
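
Unpacking a raw value is just a shift and a mask; the upper 13 bits hold the depth in millimetres and the lower 3 bits hold the player index. A small Python example (the packed value below is made up):

    raw = 0x3E81                 # example packed 16-bit value (made up)

    depth_mm = raw >> 3          # upper 13 bits: depth in millimetres -> 2000
    player   = raw & 0x7         # lower 3 bits: player index          -> 1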

If this image is rotated and viewed from above, as shown below in Figure 2, discrete depth bands become visible.

[Image: Kinect depth image, rotated]

Figure 2. Kinect depth image, rotated to highlight depth-banding.

The intervals between depth values for another sample image are plotted against depth in Figure 3 below. Note that the sample image used did not contain any data around 1.5m in depth, so there are some jumps in the data at this point. The graph shows how the depth intervals increase in size with distance from the sensor, from around a 2mm gap at 1m depth to around a 45mm gap at 4m depth.

[Chart: depth step plotted against depth]

Figure 3. Depth step by depth.
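
The data behind a chart like Figure 3 can be extracted from a single frame along these lines (a sketch only; the function and variable names are mine):

    import numpy as np

    def depth_steps(depth_mm):
        # Unique reported depth values (ignoring 'no reading' zeros) and the gap
        # between each value and the next one the sensor is able to report.
        values = np.unique(depth_mm[depth_mm > 0])
        return values[:-1], np.diff(values)

    # depths, steps = depth_steps(frame)   # plot steps against depths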

My initial attempt at a smoothing algorithm is shown below in Figure 4. This approach looks for horizontal and vertical lines of equal depth, and interpolates data between the discrete depth bands. Since these depth bands increase in size further away from the camera, smoothing is more effective for larger depth values.

[Images: raw depth image, smoothed depth image]

Figure 4. Raw and smoothed depth image.
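
One way the row/column interpolation could be sketched is shown below. This is an illustration of the idea rather than the actual algorithm: each run of identical depth values is treated as a single sample at its midpoint, and values in between are linearly interpolated.

    import numpy as np

    def smooth_line(line):
        # line: one row (or column) of depth values in millimetres; 0 = no reading.
        line = line.astype(np.float32)
        change = np.flatnonzero(np.diff(line) != 0) + 1
        starts = np.concatenate(([0], change))           # first index of each run
        ends = np.concatenate((change, [line.size]))     # one past the last index
        centres = (starts + ends - 1) / 2.0              # midpoint of each run
        keep = line[starts] > 0                          # ignore 'no data' runs
        if keep.sum() < 2:
            return line
        smoothed = np.interp(np.arange(line.size), centres[keep], line[starts][keep])
        smoothed[line == 0] = 0.0                        # leave missing pixels alone
        return smoothed

    # Applying smooth_line along rows and then columns approximates the effect
    # shown in Figure 4.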

[1] As of version 1.6, the Kinect for Windows SDK exposes extended depth information.

Player Extraction

By Dave | Project | Published 2 Nov 2012 23:16 | Last Modified 13 Jan 2013 17:14

I thought it would be of interest to discuss how I obtained the player images shown in the previous post.

Stereo cameras have been around for some time. Kinect automates the process of extracting depth values from digital image frames, and while Kinect only provides information at a resolution of 640x480 pixels, it does so at very low cost, with relatively low computational resources, and at 30 frames per second. Figure 1 below shows a single Kinect "frame" which has been rotated and rendered as a point-cloud. The frame was captured facing the player, hence the shadows and degree of distortion in the rotated image.

[Image: rotated Kinect frame rendered as a point cloud]

Figure 1. Single Kinect frame, rotated to highlight depth.
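
Back-projecting a depth frame into a point cloud can be sketched as follows. The field-of-view figures are the commonly quoted approximations for the Kinect depth camera, used here only for illustration; the project itself may use different calibration values:

    import numpy as np

    def depth_to_points(depth_mm, fov_h_deg=57.0, fov_v_deg=43.0):
        h, w = depth_mm.shape
        fx = (w / 2.0) / np.tan(np.radians(fov_h_deg) / 2.0)
        fy = (h / 2.0) / np.tan(np.radians(fov_v_deg) / 2.0)
        u, v = np.meshgrid(np.arange(w), np.arange(h))
        z = depth_mm.astype(np.float32)
        x = (u - w / 2.0) * z / fx
        y = (v - h / 2.0) * z / fy
        return np.dstack((x, y, z))[z > 0]   # N x 3 points in mm; drop empty pixels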

Amongst other things, the Kinect API also has the ability to identify people and provide real-time information on joint positions in 3D space. This is shown below in Figure 2, where the skeletal information has been included on the same Kinect frame as in Figure 1.

[Image: rotated Kinect frame with skeleton overlay]

Figure 2. Single Kinect frame with skeleton overlay, rotated to highlight depth.

When skeletal tracking is enabled, "player" information is included as part of the depth feed, allowing automatic separation of pixels belonging to tracked individuals. This is shown below in Figure 3, where the same frame is rendered with non-player pixels removed.

[Image: rotated Kinect frame, player pixels only]

Figure 3. Single Kinect frame showing player only, rotated to highlight depth.
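
In code, the separation amounts to a test on the player-index bits packed into the depth values. A minimal sketch:

    import numpy as np

    def player_mask(raw_depth):
        # raw_depth: H x W array of packed 16-bit depth values; the low 3 bits
        # hold the player index (0 = background, 1-7 = a tracked player).
        return (raw_depth & 0x7) != 0

    # Keep only player pixels before building the point cloud:
    # depth_mm = (raw_depth >> 3) * player_mask(raw_depth)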

Depth Image Rendering

By Dave | Project | Published 24 Oct 2012 19:01 | Last Modified 13 Jan 2013 17:15

There are numerous ways to render depth data captured from Kinect. One option is to use a point-cloud, where each depth value is represented by a pixel positioned in 3D space. In the absence of a 3D display, one of the ways to convey depth for still images is the use of stereograms, as shown below in Figure 1.

[Image: point-cloud stereogram]

Figure 1. Point-cloud stereogram.[1]

In case you are wondering, I'm holding a wireless keyboard to control the image capture. Next I needed to map the texture from the color camera onto the point-cloud, as shown below in Figure 2.

[Image: point-cloud stereogram with color mapping]

Figure 2. Point-cloud stereogram[1] with color mapping.

Another approach to simulating 3D without special display hardware, and one which avoids the training involved in "seeing" images such as stereograms (although it does require special glasses[2]), is the use of anaglyphs, as shown below in Figure 3.

[Image: point-cloud color anaglyph]

Figure 3. Point-cloud color anaglyph.[2]

Anaglyphs can be adjusted to move the image plane "forwards" or "backwards" in relation to the screen, as shown by the grayscale anaglyphs in Figures 4-6 below.

[Images: three point-cloud grayscale anaglyphs]

Figures 4-6. Point-cloud grayscale anaglyphs[2] "behind", "co-planar" with, and "in front of" the screen plane.

[1] In order to perceive a 3D image, the viewer must decouple the convergence and focusing of their eyes. Looking "through" the image results in four images; the eyes are correctly converged when the two centre images "overlap". At this point the eyes must be refocussed without changing their convergence.

[2] In order to perceive a 3D image, the viewer must use coloured filters for each eye, in this case red (left) and cyan (right).
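
For reference, a red/cyan anaglyph can be assembled from two renders along these lines (a sketch with made-up parameter names; the left-eye image goes into the red channel, the right-eye image into green and blue, and a horizontal shift moves the apparent image plane):

    import numpy as np

    def anaglyph(left_gray, right_gray, shift=0):
        # left_gray, right_gray: H x W uint8 renders from the two eye positions.
        # A positive shift slides the right-eye image, moving the image plane
        # "forwards" or "backwards" relative to the screen.
        right = np.roll(right_gray, shift, axis=1)
        return np.dstack((left_gray, right, right))   # R from left eye, G+B from right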

Depth Image Capture

By Dave | Project | Published 30 Sep 2012 23:00 | Last Modified 13 Jan 2013 17:16

I previously discussed an approach for visualising 3D on a Microsoft Surface device using autostereograms. This had the advantage of supporting more than a single user, since simultaneous depth-perception is possible from opposite sides of the device. However, it suffered from the disadvantages that there is a degree of training involved to "see" the image (particularly when the image is animated and uses a random dot pattern), and that this type of autostereogram is unable to convey color.

I thought I'd start a new project to explore the use of Microsoft Kinect to work with 3D.

Kinect is a great example of the powerful combination of both hardware (e.g. the depth camera) and software (skeletal tracking). Intriguingly, one way to think about how the depth sensor in Kinect actually works is to compare it to an autostereogram. These images allow depth perception since the human brain has a remarkable ability to infer depth from a random dot pattern when shifted in a particular way. The depth sensor in Kinect also uses shifts in position of a random dot pattern (due to parallax between the emitter and receiver) to infer depth values.
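
The underlying relationship is the usual stereo one: depth is proportional to the focal length multiplied by the baseline and divided by the measured shift (disparity). A tiny worked example, using purely illustrative numbers rather than the Kinect's actual calibration:

    focal_length_px = 580.0     # IR camera focal length in pixels (assumed)
    baseline_mm = 75.0          # emitter-to-receiver separation (assumed)
    disparity_px = 14.5         # measured shift of a dot in the pattern

    depth_mm = focal_length_px * baseline_mm / disparity_px   # = 3000.0 mm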

Capturing depth images using Kinect is straightforward, as demonstrated extensively in the Software Development Kit.

NUIverse Download

By Dave | Project | Published 4 Sep 2012 20:41 | Last Modified 13 Jan 2013 18:05

A beta build of NUIverse is now available for download at http://www.nuiverse.com, along with some brief documentation and additional data downloads.

Note that NUIverse is only available for installation on the Samsung SUR40 with Microsoft PixelSense, and that it is still one of my spare-time projects. As such, many features are still unimplemented and bugs remain to be fixed. However, I welcome feedback and will do my best to respond to any questions as soon as possible.

Surface 2 Physics Download

By Dave | Project | Published 27 Jul 2012 14:30 | Last Modified 13 Jan 2013 18:04

I've finally migrated the original Surface Physics v1 library and sample to .NET 4 and the Samsung SUR40 with Microsoft PixelSense.

For many apps, migrating from Surface v1 to the SUR40 is very easy, and simply involves a search & replace of controls in the Surface v1 namespace with their new versions. In my case, because I had to do some lower-level contact-handling, things were a little more complicated.

The sample is broadly similar to the previous version, except that I have removed the "interactions" page, which relied (amongst other things) on the API accurately reporting blob orientation. Blob orientations are now only reported as either 0 or 90°, and I didn't have time to implement the raw-image processing required to replicate the behaviour originally demonstrated on this page.

The following downloads are available:

  1. Surface Physics Sample (install), .msi (zip'd), 860 KB. The sample application for demonstrating the physics library and layout control.
  2. Surface Physics Sample (source code), Visual Studio 2010 Project (zip'd), 730 KB. Source code for the sample application.
  3. Physics Library (binary), .dll (zip'd), 17 KB. The physics library and layout control.

The Readme for the v1 sample application may also prove useful.

You'll need the Microsoft Surface 2 SDK, available from the MSDN site here, and access to a SUR40 or at least the Input Simulator in the SDK.

See the project archive for older posts, and the gallery for screenshots.

NUIverse Video Part 2

By Dave | Project | Published 19 Jul 2012 09:59 | Last Modified 13 Jan 2013 17:17

I had the opportunity to demo NUIverse at the Microsoft Worldwide Partner Conference last week, and I thought I'd share the video which shows some updates since the previous recording.

Video 1. NUIverse on Samsung SUR40 with Microsoft PixelSense.

Key things demonstrated in the video include:

  • Multi-touch to control complex camera motion
  • Multi-direction UI consistent with a horizontal display form-factor and multiple concurrent users
  • Level-of-Detail rendering for planetary bodies and backgrounds
  • Independent control of time and position
  • Control selection using just-in-time chrome
  • Satellite model rendering