Cross-Modal Stereo by Using Kinect
Wei-Chen Chiu, Ulf Blanke, and Mario Fritz
Kinect's active sensing strategy is well suited to producing robust, high-frame-rate depth maps for human pose estimation. The shift to the robotics domain, however, has exposed the sensor to a wider range of operating conditions than it was originally designed for. The sensor fails completely on transparent and specular surfaces, which are very common among everyday household objects.
We complement the Kinect's depth estimate with a cross-modal stereo path obtained by disparity matching between the Kinect's built-in IR and RGB sensors. We investigate how the RGB channels can be combined optimally to mimic the image response of the IR sensor. Our combination method produces depth maps that include sufficient evidence for reflective and transparent objects while preserving textureless objects such as tables or walls. [See our BMVC'11 paper for details.]
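As a rough illustration of combining the RGB channels into a single IR-like image, the following minimal sketch applies a normalised weighted sum per pixel. The function name `combine_rgb` and the example weights are hypothetical and chosen only for illustration; they are not the optimised weights from the BMVC'11 paper, and plain Python lists are used to keep the example dependency-free.

```python
def combine_rgb(pixels, weights):
    """Collapse rows of (r, g, b) tuples into one channel with a weighted sum.

    `weights` holds one relative weight per channel; they are normalised
    so that the output stays in the same value range as the input.
    """
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    wr, wg, wb = (w / total for w in weights)
    return [[wr * r + wg * g + wb * b for (r, g, b) in row] for row in pixels]


# Hypothetical weights for illustration only (not values from the paper).
image = [[(30, 60, 90), (0, 0, 0)],
         [(255, 255, 255), (10, 20, 30)]]
mimic = combine_rgb(image, weights=(2, 1, 1))
```

The resulting single-channel image would then take the place of a plain grayscale conversion as the RGB-side input to stereo matching against the IR image.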
However, the method suffers from interference from the IR projector, which the active depth sensing method requires. We investigate these issues and conduct a more detailed study of the physical characteristics of the sensors. Adapting RGB in the frequency domain to mimic an IR image did not improve performance. We further propose a more general method that learns optimal filters for cross-modal stereo under projected patterns. From the experimental results we conclude that our pre-filtered, cross-modal, SAD-based stereo vision algorithm profits most from combination in the spatial domain rather than in the frequency domain.
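For readers unfamiliar with SAD-based stereo, the sketch below shows the matching core only: for each pixel, it picks the horizontal shift that minimises the sum of absolute differences (SAD) over a small window. It assumes two rectified single-channel images given as nested lists, with a left pixel at column x matching the right pixel at column x - d. The function name `sad_disparity` and its parameters are hypothetical; the learned pre-filters and the released C/Matlab and ROS implementations are not reproduced here.

```python
def sad_disparity(left, right, max_disp, radius=1):
    """Per-pixel disparity by exhaustive SAD block matching.

    Assumes rectified images of equal size. Border pixels, where the
    window does not fit, keep a disparity of 0.
    """
    h, w = len(left), len(left[0])
    disparity = [[0] * w for _ in range(h)]
    for y in range(radius, h - radius):
        for x in range(radius, w - radius):
            best_cost, best_d = float("inf"), 0
            # Only test shifts that keep the whole window inside the right image.
            for d in range(min(max_disp, x - radius) + 1):
                cost = 0.0
                for dy in range(-radius, radius + 1):
                    for dx in range(-radius, radius + 1):
                        cost += abs(left[y + dy][x + dx]
                                    - right[y + dy][x + dx - d])
                if cost < best_cost:
                    best_cost, best_d = cost, d
            disparity[y][x] = best_d
    return disparity
```

In a cross-modal setting, one input would be the IR image and the other the combined (and possibly pre-filtered) RGB image; which sensor plays the role of `left` depends on the camera geometry and is left open here.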
Source Code, Dataset
Source code: the BMVC'11 version is written in C/Matlab; the CDC4CV'11 version is written for ROS.
 Improving the Kinect by Cross-Modal Stereo, Wei-Chen Chiu, Ulf Blanke, and Mario Fritz, 22nd British Machine Vision Conference (BMVC), (2011)
 I spy with my little eye: Learning Optimal Filters for Cross-Modal Stereo under Projected Patterns, Wei-Chen Chiu, Ulf Blanke, and Mario Fritz, 1st IEEE Workshop on Consumer Depth Cameras for Computer Vision (CDC4CV) in conjunction with International Conference on Computer Vision (ICCV), (2011)