Automatic modeling

Automatic 3D modeling from reality
Overview
Thesis Research	Robotics Institute
Daniel Huber	Carnegie Mellon University

Overview
	Evaluating hypotheses Finding the solution
Results
	Large DB tests Generality Accuracy
Publications
Video

The problem

Have you ever snapped a series of photos of some scenic landscape and then stitched the pictures together to create a panorama? Imagine if you could do the same thing in 3D. The technology to take 3D pictures is readily available in the form of laser range scanners and stereo camera systems, and the price for such sensors is falling. Unfortunately, the process of creating 3D models from a set of 3D pictures (a.k.a. range images) is still rather difficult, requiring mechanical measurement of the camera positions or tedious manual alignment of the 3D pictures. This research introduces an alternative - a fully automated approach that we call automatic modeling from reality. Using our algorithms, it is possible to take a collection of range images of a scene and automatically produce a realistic, geometrically accurate digital 3D model. One key benefit of our approach is that you do not need to record the position of the camera when taking the pictures. Also, it is not necessary to ensure that each picture overlaps with the previous one. The input images are assumed to be taken from arbitrary, unknown viewpoints and in an arbitrary, unknown order.

The largest barrier to automation of the modeling-from-reality process is the alignment (registration) of the individual 3D pictures (which we call views). The registration problem is like solving a 3D jigsaw puzzle. The 3D views are the pieces, and the problem is to assemble them to recreate the original scene without even knowing what the scene is supposed to look like.

Our approach

We solve the problem by first registering pairs of views using a pair-wise surface matching algorithm. The goal of this step is to find potential pair-wise matches (overlapping pairs of views and the transforms that register them). If two views contain overlapping scene regions, pair-wise surface matching often finds the correct registration. But sometimes the algorithm fails, and sometimes it is impossible to tell whether a pair-wise registration is correct just by looking at the two views. However, by assembling several views, the incorrect pair-wise matches can often be identified by the interaction between multiple overlapping surfaces. Using the jigsaw puzzle analogy, sometimes a piece can be inserted incorrectly in a puzzle because it fits well with another piece. Later, when the remaining surrounding pieces are added, the mistake is discovered because the piece does not fit on all sides.

Theoretically, we formulate the problem as a graph optimization. We use a graph called the model graph, in which the nodes represent the input views and the edges are the candidate matches produced by pair-wise surface matching. If we can connect all the views in this model graph using only correct matches, then it is straightforward to concatenate the underlying transforms to register all the views in one coordinate system, thereby solving our problem. We call this set of matches that connects the views a model hypothesis. For several reasons (which are described in the dissertation), we restrict our model hypotheses to be spanning trees of the model graph (i.e., they have no loops).

Example model graph		Example model hypothesis (with 3 parts)

One potential problem is that there may be no set of correct matches that connects all the views. This can happen if a view is corrupted or if one subset of views does not match correctly with any views in another subset (e.g., the front and back views of an object). We address this problem by explicitly representing the possibility that the best model hypothesis might not actually connect all of the views. In our hypothesis representation, an edge can be marked "hidden" (dashed line), in which case the corresponding 3D model is separated into two parts with the views on each side of the split going into different parts.

Using this framework, our problem has been reduced to two fundamental questions: 1) What constitutes a "good'' model hypothesis? and 2) How do we robustly and efficiently search for the best model hypothesis in an extremely large hypothesis space?

Next: What constitutes a good model hypothesis?