Over the weekend I bought a Kinect and wired it to my Mac. I’m following the O’Reilly/Make book, Making Things See, by Greg Borenstein (who kindly tells me I’m an amateur machine vision researcher). With the book I’ve set up Processing, a Java-lite coding environment, and the open source Kinect libraries, SimpleOpenNI & NITE. I’ve spent a good chunk of the weekend reading a hundred pages into the book and working through the project tutorials, and I’ve generated some interesting interactions and imagery. There’s also a ton of tutorial vids on YouTube, natch, to help cut through the weeds and whatnot.
What makes the Kinect so special is the pairing of its RGB video camera with an IR projector and a dedicated IR camera. So for every pixel in the scene you get positional (x,y) and RGB values, as well as distance (z). A regular camera captures a flat image. Scripts can crawl the x-y coordinates of the image and make RGB comparisons between adjacent pixels. This is how they can kinda pick out edges & faces and make estimates about depth. But it’s not easy with a flat 2D image. The IR projector of the Kinect sends out a map of infrared dots, invisible to the naked eye, covering every surface within about 25 feet of its camera. The IR camera can then read this pattern and extract the depth map, which is a grayscale video image showing nearer pixels as lighter in shade, further ones getting darker. The Kinect can read this depth map with millimeter precision.
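The nearer-is-lighter mapping can be sketched in a few lines of plain Java. This isn’t the Kinect pipeline itself, just the shading logic: the depth range limits below are assumed round numbers (a near limit of half a meter and a far limit of roughly 25 feet), not the device’s exact spec.

```java
public class DepthShade {
    static final int MIN_DEPTH_MM = 500;   // assumed near limit (~0.5 m)
    static final int MAX_DEPTH_MM = 7600;  // assumed far limit (~25 ft)

    // Map a depth reading in millimeters to an 8-bit grayscale value,
    // where 255 = nearest (lightest) and 0 = farthest (darkest).
    static int shade(int depthMm) {
        if (depthMm <= 0) return 0; // the sensor reports 0 for "no reading"
        int clamped = Math.max(MIN_DEPTH_MM, Math.min(MAX_DEPTH_MM, depthMm));
        double t = (double) (clamped - MIN_DEPTH_MM) / (MAX_DEPTH_MM - MIN_DEPTH_MM);
        return (int) Math.round(255 * (1.0 - t));
    }

    public static void main(String[] args) {
        System.out.println(shade(500));   // nearest -> 255 (lightest)
        System.out.println(shade(7600));  // farthest -> 0 (darkest)
        System.out.println(shade(4050));  // midway -> 128 (mid-gray)
    }
}
```

Run this over every pixel of the depth array and you get exactly the grayscale image described above.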
With this rich x,y,z dataset we can render the video stream as a point cloud, which is exactly what it sounds like: a cloud of points showing where the IR projector is hitting surfaces (technically, the point cloud is a visualization of the vectors in the depth map array). The great thing about point clouds is they communicate in terms that standard 3D environments understand. They’re 3D models, so we can import a point cloud into a modeling environment like Maya or 3D Studio Max and then fly around it, apply textures, lighting, deformations, etc… Incidentally, this is the basis for using the Kinect as a 3D scanner, converting point clouds into a usable mesh.
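The step from depth pixel to 3D point is a pinhole-camera back-projection, which the Kinect libraries handle for you. As a rough sketch of the math, here it is in plain Java; the focal length and image center are assumed round numbers for a 640x480 depth map, not the Kinect’s calibrated intrinsics.

```java
public class PointCloud {
    static final double FOCAL_PX = 525.0;  // assumed focal length, in pixels
    static final int CX = 320, CY = 240;   // assumed image center (640x480 map)

    // Back-project pixel (px, py) at depth depthMm into a 3D point
    // {X, Y, Z}, all in millimeters, with +Y pointing up.
    static double[] toWorld(int px, int py, int depthMm) {
        double x = (px - CX) * depthMm / FOCAL_PX;
        double y = (CY - py) * depthMm / FOCAL_PX; // flip so +Y is up
        return new double[] { x, y, depthMm };
    }

    public static void main(String[] args) {
        // The center pixel at 2 m depth lands on the camera's axis:
        double[] p = toWorld(320, 240, 2000);
        System.out.printf("%.1f %.1f %.1f%n", p[0], p[1], p[2]); // 0.0 0.0 2000.0
    }
}
```

Do this for every depth pixel and you have the cloud of points; export those vectors and a modeling package can treat them like any other 3D geometry.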
My goal is to set up an RGB-D Toolkit rig to composite Kinect & DSLR video. Check out the work of James George along these lines, especially the video for Clouds below. To my eye, the Clouds video has such a fascinating aesthetic quality, one that speaks to some deeper sense of our collision and dance with the digital domain as it intersects and merges with the analog world.
I also want to figure out SynthEyes and Blender and try to bring all this stuff together into some fancy composite interactive media. We’ll see if I make it that far but I’m committed to at least finishing the tutorials and trying to get that RGB-D Toolkit up and running. More info as I progress…