We are developing a new approach called Voxel Coloring that reconstructs the "color" (radiance) at surface points in an unknown scene. Initially, we assume a static scene containing Lambertian surfaces under fixed illumination so the radiance from a scene point can be described simply by a scalar value, which we call color.
Coping with large visibility changes between images means solving the correspondence problem between images that are very different in appearance--a very difficult problem. Rather than use traditional methods such as stereo, we use a scene-based approach. That is, we represent the environment as a discretized set of voxels, and use an algorithm that traverses these voxels and colors those that are part of a surface in the scene. The advantage of this approach is that simple voxel projection determines the candidate corresponding image pixels. A difficulty is that a given image pixel may not correspond to a particular projecting voxel if there is a closer voxel occluding the projecting voxel. For example, in the figure below the red voxel in the scene projects into the first and third images but not the second, because the blue voxel occludes it.
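The voxel projection step can be sketched in a few lines. This is a minimal illustration, assuming each camera is given as a 3x4 projection matrix `P`; the function and variable names are our own, not part of the original system.

```python
import numpy as np

def project_voxel(P, voxel_center):
    """Project a 3-D voxel center into an image; returns (u, v) pixel coordinates."""
    X = np.append(voxel_center, 1.0)   # homogeneous coordinates
    u, v, w = P @ X
    return u / w, v / w                # perspective divide

# Example: a canonical camera at the origin looking down +z.
P = np.hstack([np.eye(3), np.zeros((3, 1))])
print(project_voxel(P, np.array([2.0, 1.0, 4.0])))  # (0.5, 0.25)
```

Given a voxel and the set of input cameras, projecting its center (or its eight corners, for a footprint) into each image yields the candidate pixels whose colors are compared.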
To solve this visibility problem we introduce a novel geometric constraint on the input camera positions that enables a single visibility ordering of the voxels to hold for every input viewpoint. This ordinal visibility constraint is satisfied whenever no scene point is contained within the convex hull of the input camera centers. Below are shown two simple camera configurations that satisfy this constraint. The left configuration shows a downward-facing camera that is moved 360 degrees around an object. The right configuration shows a rig of outward-facing cameras that are distributed around a sphere.
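The ordinal visibility constraint can be checked directly: verify that no scene point lies inside the convex hull of the camera centers. The sketch below does this in 2-D for cameras placed on a circle around the scene (a cross-section of the left configuration above); the function names and the strict-containment convention are illustrative assumptions.

```python
import numpy as np

def inside_convex_polygon(point, vertices):
    """True if `point` lies strictly inside the convex polygon whose
    `vertices` are given in counter-clockwise order."""
    n = len(vertices)
    for i in range(n):
        a, b = vertices[i], vertices[(i + 1) % n]
        edge, to_p = b - a, point - a
        if edge[0] * to_p[1] - edge[1] * to_p[0] <= 0:  # point on/right of this edge
            return False
    return True

def satisfies_constraint(scene_points, camera_centers):
    """Ordinal visibility constraint: no scene point inside the cameras' hull."""
    return not any(inside_convex_polygon(p, camera_centers) for p in scene_points)

# Eight cameras on a unit circle, counter-clockwise.
cams = np.array([[np.cos(t), np.sin(t)]
                 for t in np.linspace(0, 2 * np.pi, 8, endpoint=False)])
print(satisfies_constraint(np.array([[2.0, 0.0]]), cams))  # True: point outside hull
print(satisfies_constraint(np.array([[0.0, 0.0]]), cams))  # False: point inside hull
```

A point at the center of the camera ring violates the constraint, while any point outside the ring (e.g., below a downward-facing ring of cameras) satisfies it.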
As a result of this camera constraint we can traverse voxels in increasing distance from the set of cameras. That is, we can first visit all voxels in the "layer" immediately adjacent to the convex hull of the cameras, then visit all voxels in the next layer immediately adjacent to the first layer, and so on. In this way, when a voxel is visited all other voxels that could possibly occlude the current one have already been visited and colored. Hence it is easy to check for each image whether or not the current voxel projects into (i.e., is visible in) the image. The following figure illustrates the voxel layers for one simple camera configuration.
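The layered traversal can be sketched by binning voxels by their distance from the cameras and visiting the bins near-to-far. In this sketch the distance to the nearest camera center stands in for distance to the camera volume, which is an assumption for illustration, not the exact ordering used in the original algorithm.

```python
import numpy as np

def layer_voxels(voxel_centers, camera_centers, layer_width):
    """Group voxel indices into layers of increasing distance from the cameras."""
    # Distance from each voxel to its nearest camera center.
    d = np.min(np.linalg.norm(
        voxel_centers[:, None, :] - camera_centers[None, :, :], axis=2), axis=1)
    layer_ids = (d // layer_width).astype(int)
    layers = {}
    for idx, lid in enumerate(layer_ids):
        layers.setdefault(lid, []).append(idx)
    return [layers[k] for k in sorted(layers)]   # near-to-far order

cams = np.array([[0.0, 0.0, 0.0]])
voxels = np.array([[0, 0, 1.0], [0, 0, 2.5], [0, 0, 4.2]])
print(layer_voxels(voxels, cams, 1.0))  # [[0], [1], [2]]
```

Visiting the returned layers in order guarantees that every voxel that could occlude the current one has already been processed, which is exactly what makes the per-image visibility check easy.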
Scene reconstruction is complicated by the fact that a set of images can be consistent with more than one rigid scene. Determining a scene's spatial occupancy is therefore an ill-posed problem, because a voxel contained in one consistent scene may not be contained in another. Conversely, a voxel may be part of two different consistent scenes but have a different color in each. To cope with this ambiguity, we say a voxel V is color invariant with respect to a set of images if, for every pair of voxelizations S and T that contain V and that are consistent with the images, we have color(V, S) = color(V, T). Using this invariant, we define the voxel coloring of a set of images to be the set of all color-invariant voxels together with their colors--the maximally consistent coloring shared by every scene that agrees with the images.
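The practical test behind consistency is that, for a Lambertian surface under fixed illumination, a true surface voxel should project to nearly the same color in every image that sees it. A minimal sketch of such a test, with an illustrative variance threshold that is not taken from the original work:

```python
import numpy as np

def consistent(pixel_colors, threshold=0.05):
    """Return True if the per-channel spread of the observed pixel colors
    is small enough to accept the voxel as a surface voxel.
    The threshold value here is an illustrative assumption."""
    colors = np.asarray(pixel_colors, dtype=float)
    return bool(colors.std(axis=0).max() <= threshold)

print(consistent([[0.8, 0.2, 0.2], [0.82, 0.21, 0.19]]))  # True
print(consistent([[0.8, 0.2, 0.2], [0.1, 0.9, 0.3]]))     # False
```

A voxel that passes the test in all images where it is unoccluded is accepted and assigned (for example) the mean of the observed colors; one that fails is carved away.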
We can now define the complete voxel coloring algorithm as:
    S = {}                                        /* initial set of colored voxels is empty */
    for i = 1 to r do                             /* traverse each of r layers */
        foreach V in the i-th layer of voxels do
            project V into all images where V is visible
            if sufficient correlation of the pixel colors then
                add V to S
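The pseudocode above can be turned into a runnable sketch. The geometry is abstracted away: `layers` lists voxel indices ordered near-to-far from the cameras, and `pixels_for` returns the pixel colors a voxel projects to in the images where it is currently unoccluded (given the voxels colored so far). The consistency threshold and all names are illustrative assumptions, not the original implementation.

```python
import numpy as np

def voxel_coloring(layers, pixels_for, threshold=0.05):
    """Layered voxel coloring: visit voxels near-to-far and keep those whose
    observed pixel colors agree across the images that see them."""
    colored = {}                                   # voxel index -> assigned color
    for layer in layers:                           # traverse each of r layers
        for v in layer:
            colors = np.asarray(pixels_for(v, colored), dtype=float)
            if colors.size and colors.std(axis=0).max() <= threshold:
                colored[v] = tuple(colors.mean(axis=0))   # sufficient correlation
    return colored

# Toy scene: voxel 0 (nearer layer) looks consistently red across views,
# voxel 1 does not, so only voxel 0 is colored.
views = {0: [[0.9, 0.1, 0.1], [0.88, 0.12, 0.1]],
         1: [[0.9, 0.1, 0.1], [0.1, 0.1, 0.9]]}
result = voxel_coloring([[0], [1]], lambda v, colored: views[v])
print(sorted(result))  # [0]
```

In a full implementation, `pixels_for` would perform the voxel projection and mark pixels claimed by already-colored (nearer) voxels as occluded, which is exactly why the near-to-far layer order matters.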
[Figure: input view and synthesized output views for the first experiment]

[Figure: input view and synthesized output views for the second experiment]
The last experiment consisted of a synthetic room scene containing three textured walls, a bust of Beethoven, and a human figure, illuminated diffusely from above. Twenty-four images were synthesized from camera positions inside the room, with the optical axes of all the cameras parallel to the floor. From these 24 input images, the voxel coloring algorithm reconstructed this highly concave scene. The algorithm partitioned the scene volume into 2.3 million voxels (a 150 x 125 x 125 resolution partitioning) and produced a coloring of 78,158 voxels representing the scene. The images below on the left are views synthesized directly from the room scene model (but do not correspond to any of the 24 input images); the images on the right are the same views synthesized from the voxel reconstruction produced by our algorithm.
[Figure: views synthesized directly from the model (left) and corresponding views produced by our algorithm (right)]
Last modified: August 20, 1998