These online papers and abstracts are listed in chronological order (most recent first).
Papers may be downloaded in Adobe Acrobat (.pdf), PostScript (.ps), or gzip-compressed
PostScript (.gz). PDF is generally the quickest to download.
Interactive Manipulation of Rigid Body Simulations
Jovan Popovic, S. M. Seitz, M. Erdmann, Z. Popovic, and A. Witkin,
Proc. SIGGRAPH, 2000, to appear.
(823K pdf).
Physical simulation of dynamic objects has become commonplace in computer graphics because it produces highly realistic animations. In this paradigm the animator provides a few physical parameters, such as the objects' initial positions and velocities, and the simulator automatically generates realistic motions. The resulting motion, however, is difficult to control because even a small adjustment of the input parameters can drastically affect the subsequent motion. Furthermore, the animator often wishes to change the end result of the motion instead of the initial physical parameters. We describe a novel interactive technique for intuitive manipulation of rigid multi-body simulations. Using our system, the animator can select bodies at any time and simply drag them to desired locations. In response, the system computes the required physical parameters and simulates the resulting motion. Surface characteristics such as normals and elasticity coefficients can also be automatically adjusted to provide a greater range of feasible motions, if the animator so desires. Because the entire simulation editing process runs at interactive speeds, the animator can rapidly design complex physical animations that would be difficult to achieve with existing rigid body simulators.
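As a rough illustration of the editing loop described above (not the paper's algorithm), the sketch below adjusts the initial velocity of a toy ballistic simulation with Newton steps and a finite-difference Jacobian so the body ends at a dragged target position; the simulator, step counts, and tolerances are all illustrative assumptions.

    import numpy as np

    def simulate(v0, steps=60, dt=1.0 / 30.0):
        """Toy ballistic simulator: final position of a particle launched
        from the origin with initial velocity v0 under gravity."""
        g = np.array([0.0, -9.8])
        x, v = np.zeros(2), np.asarray(v0, float)
        for _ in range(steps):
            v = v + g * dt
            x = x + v * dt
        return x

    def solve_initial_velocity(target, v0=np.zeros(2), iters=20, eps=1e-4):
        """Newton-style update of the initial velocity so the simulated end
        position lands on the dragged target (finite-difference Jacobian)."""
        v = np.asarray(v0, float)
        for _ in range(iters):
            r = simulate(v) - target              # residual at the current guess
            if np.linalg.norm(r) < 1e-6:
                break
            J = np.zeros((2, 2))
            for j in range(2):                    # numerical Jacobian d(end position)/d(v0)
                dv = np.zeros(2)
                dv[j] = eps
                J[:, j] = (simulate(v + dv) - simulate(v)) / eps
            v = v - np.linalg.solve(J, r)         # Newton step toward the target
        return v

    v_star = solve_initial_velocity(np.array([5.0, 0.0]))
    print("initial velocity:", v_star, "end position:", simulate(v_star))

A real system would differentiate through a full rigid multi-body simulator with contacts rather than this closed-form toy, but the control structure (simulate, compare with the dragged target, update the physical parameters) is the same.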
Structure from Motion Without Correspondences
F. Dellaert, S. M. Seitz, C. E. Thorpe, and S. Thrun, Proc.
Computer Vision and Pattern Recognition Conf. (CVPR), 2000, to appear.
(500K pdf).
A method is presented to recover 3D scene structure and camera motion from multiple images without the need for correspondence information. The problem is framed as finding the maximum likelihood structure and motion given only the 2D measurements, integrating over all possible assignments of 3D features to 2D measurements. This goal is achieved by means of an algorithm which iteratively refines a probability distribution over the set of all correspondence assignments. At each iteration a new structure from motion problem is solved, using as input a set of virtual measurements derived from this probability distribution. The distribution needed can be efficiently obtained by Markov Chain Monte Carlo sampling. The approach is cast within the framework of Expectation-Maximization, which guarantees convergence to a local maximizer of the likelihood. The algorithm works well in practice, as will be demonstrated using results on several real image sequences.
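A minimal numerical sketch of the expectation-maximization idea (with independent soft assignments standing in for the paper's MCMC sampling over joint assignments): unknown 1D feature positions are recovered from unordered measurements in several images with known shifts. All names and numbers are illustrative.

    import numpy as np

    rng = np.random.default_rng(0)
    X_true = np.array([0.0, 2.0, 5.0])              # unknown 1D "structure"
    shifts = np.array([0.0, 0.5, 1.0, 1.5])         # known per-image "motion"
    # Each image observes the features in a random, unknown order.
    images = [rng.permutation(X_true + s) for s in shifts]

    X = np.array([0.5, 2.5, 4.0])                   # coarse initial structure estimate
    sigma = 1.0
    for _ in range(50):
        X_new = np.zeros_like(X)
        for meas, s in zip(images, shifts):
            pred = X + s                            # predicted measurements
            # E-step: soft assignment of measurements to features
            d2 = (meas[:, None] - pred[None, :]) ** 2
            W = np.exp(-d2 / (2 * sigma ** 2))
            W /= W.sum(axis=0, keepdims=True)       # a distribution over measurements per feature
            virtual = W.T @ meas                    # one "virtual measurement" per feature
            X_new += virtual - s
        X = X_new / len(images)                     # M-step: re-estimate the structure
        sigma = max(0.9 * sigma, 0.05)              # anneal the assignment uncertainty
    print("recovered structure:", np.sort(X))       # approaches [0, 2, 5]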
Shape and Motion Carving in 6D
S. Vedula, S. Baker, S. Seitz, and T. Kanade, Proc. Computer Vision and
Pattern Recognition Conf. (CVPR), 2000, to appear.
(560K pdf).
The motion of a non-rigid scene over time imposes more constraints on its structure than those derived from images at a single time instant alone. An algorithm is presented for simultaneously recovering dense scene shape and scene flow (i.e., the instantaneous 3D motion at every point in the scene). The algorithm operates by carving away hexels, or points in the 6D space of all possible shapes and flows, that are inconsistent with the images captured at either time instant, or across time. The recovered shape is demonstrated to be more accurate than that recovered using images at a single time instant. Applications of the combined scene shape and flow include motion capture for animation, re-timing of videos, and non-rigid motion analysis.
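The carving test itself can be stated compactly. The toy function below is an assumption about the flavor of the consistency criterion, not the paper's exact formulation: it keeps a hexel, i.e., a candidate 3D point plus a 3D flow vector, only if its projected colors agree at time t and, after applying the flow, at time t+1.

    import numpy as np

    def hexel_consistent(X, flow, cams_t0, imgs_t0, cams_t1, imgs_t1, tau=20.0):
        """Keep the hexel (X, flow) only if the colors it projects to agree
        within the first time instant, the second, and across time."""
        def sample(P, img, point):
            x = P @ np.append(point, 1.0)           # pinhole projection, P is 3x4
            u, v = (x[:2] / x[2]).astype(int)
            h, w = img.shape[:2]
            return img[v, u].astype(float) if 0 <= v < h and 0 <= u < w else None
        c0 = [c for c in (sample(P, I, X) for P, I in zip(cams_t0, imgs_t0)) if c is not None]
        c1 = [c for c in (sample(P, I, X + flow) for P, I in zip(cams_t1, imgs_t1)) if c is not None]
        if not c0 or not c1:
            return False
        return np.array(c0 + c1).std(axis=0).max() < tau   # carve the hexel otherwise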
A Theory of Shape by Space Carving
K. N. Kutulakos and S. M. Seitz, International Journal of Computer Vision, Marr
Prize Special Issue, 2000, to appear. Earlier version appeared in Proc.
Seventh International Conference on Computer Vision (ICCV), 1999, pp. 307-314.
(1M pdf).
In this paper we consider the problem of computing the 3D shape of an unknown, arbitrarily-shaped scene from multiple photographs taken at known but arbitrarily-distributed viewpoints. By studying the equivalence class of all 3D shapes that reproduce the input photographs, we prove the existence of a special member of this class, the photo hull, that (1) can be computed directly from photographs of the scene, and (2) subsumes all other members of this class. We then give a provably-correct algorithm, called Space Carving, for computing this shape and present experimental results on complex real-world scenes. The approach is designed to (1) capture photorealistic shapes that accurately model scene appearance from a wide range of viewpoints, and (2) account for the complex interactions between occlusion, parallax, shading, and their view-dependent effects on scene appearance.
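The sketch below shows the basic carving loop in the single-sweep case (close to the voxel coloring special case; the full Space Carving algorithm alternates sweeps along several directions with more careful visibility bookkeeping). Function names, the color-variance test, and the threshold are illustrative assumptions.

    import numpy as np

    def project(P, X):
        """Pinhole projection of a 3D point through a 3x4 camera matrix."""
        x = P @ np.append(X, 1.0)
        return x[:2] / x[2]

    def carve(voxels, images, cameras, tau=15.0):
        """One near-to-far sweep: keep a voxel only if the colors of its
        unoccluded projections agree, then mark those pixels as explained."""
        masks = [np.zeros(img.shape[:2], bool) for img in images]
        kept = []
        for X in voxels:                  # assumed ordered near-to-far for all cameras
            samples, hits = [], []
            for img, P, m in zip(images, cameras, masks):
                u, v = np.round(project(P, X)).astype(int)
                if 0 <= v < img.shape[0] and 0 <= u < img.shape[1] and not m[v, u]:
                    samples.append(img[v, u].astype(float))
                    hits.append((m, v, u))
            if len(samples) >= 2 and np.array(samples).std(axis=0).max() < tau:
                kept.append(X)            # photo-consistent: the voxel survives
                for m, v, u in hits:
                    m[v, u] = True        # and now occludes these pixels
        return kept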
Omnivergent Stereo
H.Y. Shum, A. Kalai, and S. M. Seitz, Proc. Seventh International Conference on
Computer Vision (ICCV), 1999, to appear.
(1.2M pdf).
The notion of a virtual sensor for optimal 3D reconstruction is introduced. Instead of planar perspective images that collect many rays at a fixed viewpoint, omnivergent cameras collect a small number of rays at many different viewpoints. The resulting 2D manifold of rays is arranged into two multiple-perspective images for stereo reconstruction. We call such images omnivergent images and the process of reconstructing the scene from such images omnivergent stereo. This procedure is shown to produce 3D scene models with minimal reconstruction error, due to the fact that for any point in the 3D scene, two rays with maximum vergence angle can be found in the omnivergent images. Furthermore, omnivergent images are shown to have horizontal epipolar lines, enabling the application of traditional stereo matching algorithms without modification. Three types of omnivergent virtual sensors are presented: spherical omnivergent cameras, center-strip cameras, and dual-strip cameras.
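A small illustration of the maximum-vergence property (a 2D toy, not the paper's sensor geometry): for a scene point outside a circular rig of viewpoints, the pair of viewpoints whose rays to the point subtend the largest angle can be found by brute force.

    import numpy as np

    def max_vergence_pair(X, radius=1.0, n=360):
        thetas = np.linspace(0, 2 * np.pi, n, endpoint=False)
        centers = radius * np.c_[np.cos(thetas), np.sin(thetas)]
        rays = X - centers
        rays /= np.linalg.norm(rays, axis=1, keepdims=True)
        ang = np.arccos(np.clip(rays @ rays.T, -1.0, 1.0))   # vergence of every pair
        i, j = np.unravel_index(np.argmax(ang), ang.shape)
        return centers[i], centers[j], np.degrees(ang[i, j])

    c1, c2, a = max_vergence_pair(np.array([3.0, 0.0]))
    print("best viewpoints:", c1, c2, "vergence angle (deg):", a)

As expected, the two tangent viewpoints are selected; an omnivergent image effectively stores such best-vergence rays for every scene point.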
Implicit Representation and Scene Reconstruction from
Probability Density Functions
S. M. Seitz and P. Anandan, Proc. Computer Vision and Pattern Recognition Conf.,
1999, pp. 28-34. Earlier version appeared in Proc. DARPA Image Understanding
Workshop, Monterey, CA, 1998.
(120K pdf).
A technique is presented for representing linear features as probability density functions in two or three dimensions. Three chief advantages of this approach are (1) a unified representation and algebra for manipulating points, lines, and planes, (2) seamless incorporation of uncertainty information, and (3) a very simple recursive solution for maximum likelihood shape estimation. Applications to uncalibrated affine scene reconstruction are presented, with results on images of an outdoor environment.
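The "very simple recursive solution for maximum likelihood shape estimation" mentioned above has the flavor of an information-form fusion of Gaussian measurements. The sketch below illustrates that flavor rather than the paper's algebra: accumulating precision matrices is what lets point-like and line-like measurements share one update rule.

    import numpy as np

    def fuse(measurements):
        """Recursive ML fusion of Gaussian features given as (mean, covariance)."""
        Lam = np.zeros((2, 2))               # accumulated precision (information matrix)
        eta = np.zeros(2)                    # accumulated information vector
        for mu, Sigma in measurements:
            P = np.linalg.pinv(Sigma)        # pseudo-inverse tolerates degenerate, line-like covariances
            Lam += P
            eta += P @ mu
        return np.linalg.solve(Lam, eta), np.linalg.inv(Lam)

    point_like = (np.array([1.0, 2.0]), np.diag([0.1, 0.1]))
    line_like = (np.array([0.0, 2.5]), np.diag([100.0, 0.05]))   # nearly unconstrained along x
    mean, cov = fuse([point_like, line_like])
    print("ML estimate:", mean)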
What Do N Photographs Tell Us about 3D Shape?
K. N. Kutulakos and S. M. Seitz, TR680, Computer Science Dept., U. Rochester, January
1998.
(2.7M pdf).
In this paper we consider the problem of computing the 3D shape of an unknown, arbitrarily-shaped scene from multiple color photographs taken at known but arbitrarily-distributed viewpoints. By studying the equivalence class of all 3D shapes that reproduce the input photographs, we prove the existence of a special member of this class, the maximal photo-consistent shape, that (1) can be computed from an arbitrary volume that contains the scene, and (2) subsumes all other members of this class. We then give a provably-correct algorithm for computing this shape and present experimental results from applying it to the reconstruction of a real 3D scene from several photographs. The approach is specifically designed to (1) build 3D shapes that allow faithful reproduction of all input photographs, (2) resolve the complex interactions between occlusion, parallax, shading, and their effects on arbitrary collections of photographs of a scene, and (3) follow a "least commitment" approach to 3D shape recovery.
Plenoptic Image Editing
S. M. Seitz and K. N. Kutulakos, Proc. 6th Int. Conf. Computer Vision, 1998,
pp. 17-24.
(550K pdf, postscript, or 3.7M gzip'ed postscript).
Earlier version available as Technical Report 647, Computer Science Department, University
of Rochester, Rochester, NY, January 1997.
(postscript or 3.7M gzip'ed postscript)
This paper presents a new class of interactive image editing operations designed to maintain physical consistency between multiple images of a physical 3D object. The distinguishing feature of these operations is that edits to any one image propagate automatically to all other images as if the (unknown) 3D object had itself been modified. The approach is useful first as a power-assist that enables a user to quickly modify many images by editing just a few, and second as a means for constructing and editing image-based scene representations by manipulating a set of photographs. The approach works by extending operations like image painting, scissoring, and morphing so that they alter an object's plenoptic function in a physically-consistent way, thereby affecting object appearance from all viewpoints simultaneously. A key element in realizing these operations is a new volumetric decomposition technique for reconstructing an object's plenoptic function from an incomplete set of camera viewpoints.
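A toy sketch of the propagation mechanism: painting pixels in one view recolors the voxels they see, and every other view is updated wherever those voxels are visible. The per-view voxel-id maps below are an assumed output of the volumetric decomposition, and the names are illustrative.

    import numpy as np

    def propagate_paint(edit_view, edit_pixels, color, voxel_maps, images):
        """voxel_maps[i][v, u] is the id of the voxel visible at pixel (v, u)
        of image i, or -1 where no voxel was reconstructed."""
        painted = {voxel_maps[edit_view][v, u] for v, u in edit_pixels} - {-1}
        for img, vmap in zip(images, voxel_maps):
            img[np.isin(vmap, list(painted))] = color   # recolor every projection of the painted voxels

    # Two 4x4 views of a one-voxel "object": voxel 7 is visible in both.
    img0 = np.zeros((4, 4, 3), np.uint8)
    img1 = np.zeros((4, 4, 3), np.uint8)
    vmap0 = -np.ones((4, 4), int)
    vmap1 = -np.ones((4, 4), int)
    vmap0[1, 1], vmap1[2, 3] = 7, 7
    propagate_paint(0, [(1, 1)], np.array([255, 0, 0]), [vmap0, vmap1], [img0, img1])
    print(img1[2, 3])        # the paint stroke reappears in the second view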
Interactive walkthrough applications require rendering an observed scene from a continuous range of target viewpoints. Toward this end, a novel approach is introduced that processes a set of input images to produce photorealistic scene reprojections over a wide range of viewpoints. This is achieved by (1) acquiring calibrated input images that are distributed throughout a target range of viewpoints to be modeled, and (2) computing a 3D reconstruction that is consistent in projection with all of the input images. The method avoids image correspondence problems by working in a discretized scene space whose voxels are traversed in a fixed visibility ordering. This strategy takes full account of occlusions and enables reconstructions of panoramic scenes. Promising initial results are presented for a room walkthrough.
This thesis addresses the problem of synthesizing images of real scenes under three-dimensional transformations in viewpoint and appearance. Solving this problem enables interactive viewing of remote scenes on a computer, in which a user can move a virtual camera through the environment and virtually paint or sculpt objects in the scene. It is demonstrated that a variety of three-dimensional scene transformations can be rendered on a video display device by applying simple transformations to a set of basis images of the scene. The virtue of these transformations is that they operate directly on images and recover only the scene information that is required in order to accomplish the desired effect. Consequently, they are applicable in situations where accurate three-dimensional models are difficult or impossible to obtain.
A central topic is the problem of view synthesis, i.e., rendering images of a real scene from different camera viewpoints by processing a set of basis images. Towards this end, two algorithms are described that warp and resample pixels in a set of basis images to produce new images that are physically-valid, i.e., they correspond to what a real camera would see from the specified viewpoints. Techniques for synthesizing other types of transformations, e.g., non-rigid shape and color transformations, are also discussed. The techniques are found to perform well on a wide variety of real and synthetic images.
A basic question is uniqueness, i.e., for which views is the appearance of the scene uniquely determined from the information present in the basis views. An important contribution is a uniqueness result for the no-occlusion case, which proves that all views on the line segment between the two camera centers are uniquely determined from two uncalibrated views of a scene. Importantly, neither dense pixel correspondence nor camera information is needed. From this result, a view morphing algorithm is derived that produces high quality viewpoint and shape transformations from two uncalibrated images.
To treat the general case of many views, a novel voxel coloring framework is introduced that facilitates the analysis of ambiguities in correspondence and scene reconstruction. Using this framework, a new type of scene invariant, called color invariant, is derived, which provides intrinsic scene information useful for correspondence and view synthesis. Based on this result, an efficient voxel-based algorithm is introduced to compute reconstructions and dense correspondence from a set of basis views. This algorithm has several advantages, most notably its ability to easily handle occlusion and views that are arbitrarily far apart, and its usefulness for panoramic visualization of scenes. These factors also make the voxel coloring approach attractive as a means for obtaining high-quality three-dimensional reconstructions from photographs.
A novel scene reconstruction technique is presented, different from previous approaches in its ability to cope with large changes in visibility and its modeling of intrinsic scene color and texture information. The method avoids image correspondence problems by working in a discretized scene space whose voxels are traversed in a fixed visibility ordering. This strategy takes full account of occlusions and allows the input cameras to be far apart and widely distributed about the environment. The algorithm identifies a special set of invariant voxels which together form a spatial and photometric reconstruction of the scene, fully consistent with the input images. The approach is evaluated with images from both inward-facing and outward-facing cameras.
This paper presents a general framework for image-based analysis of 3D repeating motions that addresses two limitations in the state of the art. First, the assumption that a motion be perfectly even from one cycle to the next is relaxed. Real repeating motions tend not to be perfectly even, i.e., the length of a cycle varies through time because of physically important changes in the scene. A generalization of period is defined for repeating motions that makes this temporal variation explicit. This representation, called the period trace, is compact and purely temporal, describing the evolution of an object or scene without reference to spatial quantities such as position or velocity. Second, the requirement that the observer be stationary is removed. Observer motion complicates image analysis because an object that undergoes a 3D repeating motion will generally not produce a repeating sequence of images. Using principles of affine invariance, we derive necessary and sufficient conditions for an image sequence to be the projection of a 3D repeating motion, accounting for changes in viewpoint and other camera parameters. Unlike previous work in visual invariance, however, our approach is applicable to objects and scenes whose motion is highly non-rigid. Experiments on real image sequences demonstrate how the approach may be used to detect several types of purely temporal motion features, relating to motion trends and irregularities. Applications to athletic and medical motion analysis are discussed.
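A one-dimensional toy (not the paper's affine-invariant formulation) showing what a period-trace-like quantity looks like: for each frame, the lag at which a short window of the signal best repeats is recorded, so cycle-to-cycle variation in period becomes explicit. Function names and parameters are illustrative.

    import numpy as np

    def period_trace_1d(frames, p_min, p_max, win=5):
        """For each frame, the lag in [p_min, p_max] at which a window of
        `win` frames best matches a window one (local) cycle later."""
        frames = np.asarray(frames, float)
        lags = np.arange(p_min, p_max + 1)
        trace = []
        for t in range(len(frames) - p_max - win):
            errs = [np.linalg.norm(frames[t:t + win] - frames[t + p:t + p + win])
                    for p in lags]
            trace.append(lags[int(np.argmin(errs))])
        return np.array(trace)

    # An uneven repeating signal whose cycle length drifts from 20 to 26 samples.
    t = np.arange(300, dtype=float)
    phase = np.cumsum(2 * np.pi / (20 + 6 * t / 300))
    trace = period_trace_1d(np.sin(phase), p_min=15, p_max=30)
    print(trace[:5], "...", trace[-5:])      # roughly 20 ... 25, tracking the drift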
Photographs and paintings are limited in the amount of information they can convey due to their inherent lack of motion and depth. Using image morphing methods, it is now possible to add 2D motion to photographs by moving and blending image pixels in creative ways. We have taken this concept a step further by adding the ability to convey three-dimensional motions, such as scene rotations and viewpoint changes, by manipulating one or more photographs of a scene. The effect transforms a photograph or painting into an interactive visualization of the underlying object or scene in which the world may be rotated in 3D. Several potential applications of this technology are discussed, in areas such as virtual reality, image databases, and special effects.
This paper analyzes the conditions when a discrete set of images implicitly describes scene appearance for a continuous range of viewpoints. It is shown that two basis views of a static scene uniquely determine the set of all views on the line between their optical centers when a visibility constraint is satisfied. Additional basis views extend the range of predictable views to 2D or 3D regions of viewpoints. A simple scanline algorithm called view morphing is presented for generating these views from a set of basis images. The technique is applicable to both calibrated and uncalibrated images.
The question of which views may be inferred from a set of basis images is addressed. Under certain conditions, a discrete set of images implicitly describes scene appearance for a continuous range of viewpoints. In particular, it is demonstrated that two basis views of a static scene determine the set of all views on the line between their optical centers. Additional basis views further extend the range of predictable views to a two- or three-dimensional region of viewspace. These results are shown to apply under perspective projection subject to a generic visibility constraint called monotonicity. In addition, a simple scanline algorithm is presented for actually generating these views from a set of basis images. The technique, called view morphing, may be applied to both calibrated and uncalibrated images. At a minimum, two basis views and their fundamental matrix are needed. Experimental results are presented on real images. This work provides a theoretical foundation for image-based representations of 3D scenes by demonstrating that perspective view synthesis is a theoretically well-posed problem.
Image morphing techniques can generate compelling 2D transitions between images. However, differences in object pose or viewpoint often cause unnatural distortions in image morphs that are difficult to correct manually. Using basic principles of projective geometry, this paper introduces a simple extension to image morphing that correctly handles 3D projective camera and scene transformations. The technique, called view morphing, works by prewarping two images prior to computing a morph and then postwarping the interpolated images. Because no knowledge of 3D shape is required, the technique may be applied to photographs and drawings, as well as rendered scenes. The ability to synthesize changes both in viewpoint and image structure affords a wide variety of interesting 3D effects via simple image transformations.
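A point-level sketch of the prewarp / interpolate / postwarp structure described above. Whole-image warps, and the derivation of the prewarp homographies from the fundamental matrix, are omitted; the function names and the identity homographies in the example are illustrative.

    import numpy as np

    def apply_h(H, pts):
        """Apply a 3x3 homography to an (N, 2) array of image points."""
        q = np.c_[pts, np.ones(len(pts))] @ H.T
        return q[:, :2] / q[:, 2:]

    def view_morph_points(p0, p1, H0, H1, Hs, s):
        """Prewarp both views so their image planes are parallel, linearly
        interpolate corresponding points, then postwarp to the output plane."""
        q0, q1 = apply_h(H0, p0), apply_h(H1, p1)
        qs = (1.0 - s) * q0 + s * q1          # the physically valid in-between view
        return apply_h(Hs, qs)

    I3 = np.eye(3)
    print(view_morph_points(np.array([[0.0, 0.0], [10.0, 4.0]]),
                            np.array([[2.0, 0.0], [12.0, 4.0]]), I3, I3, I3, 0.5))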
Image warping is a popular tool for smoothly transforming one image to another. "Morphing" techniques based on geometric image interpolation create compelling visual effects, but the validity of such transformations has not been established. In particular, does 2D interpolation of two views of the same scene produce a sequence of physically valid in-between views of that scene? In this paper, we describe a simple image rectification procedure which guarantees that interpolation does in fact produce valid views, under generic assumptions about visibility and the projection process. Towards this end, it is first shown that two basis views are sufficient to predict the appearance of the scene within a specific range of new viewpoints. Second, it is demonstrated that interpolation of the rectified basis images produces exactly this range of views. Finally, it is shown that generating this range of views is a theoretically well-posed problem, requiring neither knowledge of camera positions nor 3D scene reconstruction. A scanline algorithm for view interpolation is presented that requires only four user-provided feature correspondences to produce valid orthographic views. The quality of the resulting images is demonstrated with interpolations of real imagery.
A new technique is presented for computing 3D scene structure from point and line features in monocular image sequences. Unlike previous methods, the technique guarantees the completeness of the recovered scene, ensuring that every scene feature that is detected in each image is reconstructed. The approach relies on the presence of four or more reference features whose correspondences are known in all the images. Under an orthographic or affine camera model, the parallax of the reference features provides constraints that simplify the recovery of the rest of the visible scene. An efficient recursive algorithm is described that uses a unified framework for point and line features. The algorithm integrates the tasks of feature correspondence and structure recovery, ensuring that all reconstructible features are tracked. In addition, the algorithm is immune to outliers and feature-drift, two weaknesses of existing structure-from-motion techniques. Experimental results are presented for real images.
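The role of the reference features can be illustrated with a small affine-camera example (an illustration of the parallax constraint only; the paper's recursive algorithm and its handling of lines, correspondence, and outliers are not reproduced here). Given four tracked reference points, any other point's affine coordinates in the reference basis follow from a linear least-squares problem over all frames.

    import numpy as np

    def affine_coords(ref_proj, point_proj):
        """ref_proj: (F, 4, 2) projections of the four reference points in F frames;
        point_proj: (F, 2) projections of the query point. Returns (a, b, c) such
        that the 3D point is ref0 + a*e1 + b*e2 + c*e3 in the reference basis."""
        rows, rhs = [], []
        for refs, x in zip(ref_proj, point_proj):
            basis = refs[1:] - refs[0]        # images of the three basis vectors
            rows.append(basis.T)              # two linear equations per frame
            rhs.append(x - refs[0])
        A, b = np.vstack(rows), np.concatenate(rhs)
        coords, *_ = np.linalg.lstsq(A, b, rcond=None)
        return coords

    rng = np.random.default_rng(1)
    refs3d = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], float)
    X = np.array([0.3, -0.2, 0.7])
    cams = [(rng.normal(size=(2, 3)), rng.normal(size=2)) for _ in range(5)]
    ref_proj = np.array([[A @ p + t for p in refs3d] for A, t in cams])
    point_proj = np.array([A @ X + t for A, t in cams])
    print(affine_coords(ref_proj, point_proj))   # recovers ~[0.3, -0.2, 0.7]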
Real cyclic motions tend not to be perfectly even, i.e., the period varies slightly from one cycle to the next, because of physically important changes in the scene. A generalization of period is defined for cyclic motions that makes periodic variation explicit. This representation, called the period trace, is compact and purely temporal, describing the evolution of an object or scene without reference to spatial quantities such as position or velocity. By delimiting cycles and identifying correspondences across cycles, the period trace provides a means of temporally registering a cyclic motion. In addition, several purely temporal motion features are derived, relating to the nature and location of irregularities. Results are presented using real image sequences and applications to athletic and medical motion analysis are discussed.
Current approaches for detecting periodic motion assume a stationary camera and place limits on an object's motion. These approaches rely on the assumption that a periodic motion projects to a set of periodic image curves, an assumption that is invalid in general. Using affine invariance, we derive necessary and sufficient conditions for an image sequence to be the projection of a periodic motion. No restrictions are placed on either the motion of the camera or the object. Our algorithm is shown to be provably-correct for noise-free data and is extended to be robust with respect to occlusions and noise. The extended algorithm is evaluated with real and synthetic image sequences.
Last Changed: July 9, 1999