Abstract
The problem of ambiguous matches, or false targets, can greatly reduce the accuracy of a stereo vision system. A common approach to alleviating the problem is a coarse-to-fine refinement strategy, but we show that this approach imposes some (perhaps overly) strong requirements on the stereo images. Our phase-based method relaxes those requirements, and is therefore able to handle a wider variety of otherwise ambiguous images. Sometimes, however, ambiguity is inherent in the images, so we propose a generalized disparity model that explicitly represents multiple candidate matches.
Perspective foreshortening, an effect that occurs when a surface is viewed at an oblique angle, can reduce the precision of stereo methods. Many methods tacitly assume that the projection of an object will have the same area in both images, but this condition is violated by perspective foreshortening. We show how to overcome this problem using a local spatial frequency representation. A simple geometric analysis leads to an elegant solution in the frequency domain which, when applied to our phase-based system, increases the system's maximum matchable surface angle from 30 degrees to over 75 degrees.
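As a rough illustration of why a frequency-domain view suits this problem, the sketch below assumes that foreshortening acts locally as a uniform horizontal compression of one image relative to the other. That model, the factor `scale`, and the helper `dominant_frequency` are assumptions made for the example, not details taken from the method summarized here. Under that assumption, a pattern compressed by a factor `scale` appears at a spatial frequency `1 / scale` times higher, so corresponding structure could be sought at rescaled frequencies rather than at equal ones.

```python
import numpy as np

def dominant_frequency(signal):
    """Return the strongest spatial frequency of a 1-D signal, in cycles per pixel."""
    # Remove the mean so the DC bin cannot win the argmax.
    spectrum = np.abs(np.fft.rfft(signal - signal.mean()))
    freqs = np.fft.rfftfreq(len(signal))
    return freqs[np.argmax(spectrum)]

n = 1024                  # number of pixels along one scanline (assumed)
x = np.arange(n)
base_freq = 32 / n        # texture frequency seen in the left image (assumed)
scale = 0.5               # hypothetical foreshortening (compression) factor

# The left view sees the texture directly; the right view sees the same
# texture compressed horizontally, i.e. right(x) = left(x / scale).
left = np.cos(2 * np.pi * base_freq * x)
right = np.cos(2 * np.pi * base_freq * (x / scale))

f_left = dominant_frequency(left)
f_right = dominant_frequency(right)

# Compression by `scale` multiplies the local spatial frequency by 1 / scale.
ratio = f_right / f_left
print(f_left, f_right, ratio)
```

In this toy setting, comparing the two views at the same frequency would pair unrelated structure, whereas comparing frequency `f_left` in one view with `f_left / scale` in the other pairs the same surface texture.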
The analysis of stereo vision algorithms can be greatly enhanced through the use of datasets with ground truth. We outline a taxonomy of datasets with ground truth that use varying degrees of realism to characterize particular aspects of stereo vision systems, and show that each component of this taxonomy can be effectively realized with current technology. We propose that datasets generated in this way be used as the foundation for a suite of statistical analyses that characterize the performance of stereo vision systems.