I thought it would be of interest to discuss how I obtained the player images shown in the previous post.
Stereo cameras have been around for some time. Kinect automates the process of extracting depth values from digital image frames, and while Kinect only provides information at a resolution of 640x480 pixels, it does so at very low cost, with relatively modest computational resources, and at 30 frames per second. Figure 1 below shows a single Kinect "frame" which has been rotated and rendered as a point-cloud. The frame was captured facing the player, hence the shadows and the degree of distortion in the rotated image.
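To give a sense of what rendering a depth frame as a point-cloud involves, here is a minimal sketch that back-projects a 640x480 depth frame into 3D using a simple pinhole camera model. The focal length, principal point, and the synthetic depth array are illustrative assumptions, not the actual calibration or code used to produce Figure 1.

```python
import numpy as np

# Assumed (uncalibrated) depth-camera intrinsics for the original Kinect,
# in pixels: focal lengths and principal point.
FX, FY = 585.0, 585.0
CX, CY = 320.0, 240.0

def depth_frame_to_point_cloud(depth_mm):
    """Back-project a 640x480 depth frame (millimetres) into an N x 3 point cloud."""
    h, w = depth_mm.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth_mm.astype(np.float32) / 1000.0          # depth in metres
    x = (u - CX) * z / FX
    y = (v - CY) * z / FY
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]                   # drop pixels with no depth reading

# Illustrative use with a synthetic frame in place of a real Kinect capture.
fake_depth = np.full((480, 640), 2000, dtype=np.uint16)   # flat wall 2 m away
cloud = depth_frame_to_point_cloud(fake_depth)
print(cloud.shape)
```

Once the frame is in this form, rotating the view is just a matter of applying a rotation matrix to the point array before rendering.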
Amongst other things, the Kinect API also has the ability to identify people and provide real-time information on joint positions in 3D space. This is shown below in Figure 2, where the skeletal information has been overlaid on the same Kinect frame as in Figure 1.
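As a rough illustration of how that skeletal data can be combined with the depth frame, the sketch below takes a handful of hypothetical joint positions (in camera space, in metres, as the skeleton feed supplies them) and projects them onto the 640x480 depth image so they can be drawn over the point-cloud. The joint names and coordinates here are made up for illustration; the projection simply inverts the pinhole model from the previous sketch.

```python
import numpy as np

FX, FY = 585.0, 585.0   # same assumed intrinsics as the point-cloud sketch
CX, CY = 320.0, 240.0

def project_joints(joints_3d):
    """Project camera-space joint positions (metres) to depth-image pixel coordinates."""
    pixels = {}
    for name, (x, y, z) in joints_3d.items():
        if z <= 0:
            continue                                  # untracked or behind the camera
        u = int(round(FX * x / z + CX))
        v = int(round(FY * y / z + CY))
        pixels[name] = (u, v)
    return pixels

# Hypothetical joints for a player standing roughly 2 m from the sensor.
joints = {
    "Head":           ( 0.00, -0.60, 2.00),
    "ShoulderCenter": ( 0.00, -0.35, 2.00),
    "HandLeft":       (-0.45,  0.00, 1.90),
    "HandRight":      ( 0.45,  0.00, 1.90),
}
print(project_joints(joints))
```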
When skeletal tracking is enabled, "player" information is included as part of the depth feed, allowing automatic separation of pixels belonging to tracked individuals. This is shown below in Figure 3, where the same frame is rendered with non-player pixels removed.
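The masking itself is straightforward. The sketch below assumes the Kinect v1 depth-and-player-index layout, where each 16-bit depth pixel carries a 3-bit player index in its low bits and the depth value in the remaining bits; treat that bit layout as an assumption about that format rather than a guaranteed constant, and the frame data as synthetic.

```python
import numpy as np

def extract_player_depth(packed_depth, player_index):
    """Keep only pixels belonging to the given player.

    Assumes the Kinect v1 depth-and-player-index layout: the low 3 bits of each
    16-bit value hold the player index (0 = background), the remaining bits hold
    the depth in millimetres.
    """
    player_ids = packed_depth & 0x0007            # low 3 bits: player index
    depth_mm = packed_depth >> 3                  # remaining bits: depth
    return np.where(player_ids == player_index, depth_mm, 0)

# Illustrative frame: player 1 occupies a block in the middle of the image.
frame = np.full((480, 640), 2500 << 3, dtype=np.uint16)   # background wall at 2.5 m
frame[100:400, 200:440] = (2000 << 3) | 1                 # player 1 at 2 m
player_only = extract_player_depth(frame, player_index=1)
print(np.count_nonzero(player_only))
```

Applying this mask before the back-projection step is what produces the player-only rendering shown in Figure 3.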