Abstract
The simultaneous recovery of 3D shape and motion from image sequences is one
of the more difficult problems in computer vision. Classical approaches to the
problem rely on algebraic techniques to solve for these unknowns given
two or more images. More recently, a batch analysis of image streams
(the temporal tracks of distinguishable image features) under orthography has
resulted in highly accurate reconstructions. We generalize this batch approach by using a
nonlinear least squares technique. While our approach requires iteration, it
quickly converges to the desired optimal solution, even in the absence of a
priori knowledge about the shape or motion. Important features of the
algorithm include its ability to handle partial point tracks, to use line
segment matches and point matches simultaneously, and to use an
object-centered representation for faster and more accurate structure and
motion recovery. We also discuss how a projective (as opposed to scaled rigid)
structure can be recovered when the camera calibration parameters are unknown.