Research

My primary research interests are computer vision and machine learning, with an emphasis on video analysis and its applications. I believe a working computer vision system should exploit the three most fundamental constraints present in a video sequence: the geometric constraint among video frames and the three-dimensional (3D) scene, the coherency in apparent image motions due to scene regularities, and the spatial-temporal statistical redundancy in visual appearance. To that end, my recent research has focused on the following three topics:

Geometric reconstruction: exploiting the geometric constraint among video frames and the 3D scene;

Layered video representation/analysis: exploiting the image motion coherency due to scene regularities;

Subspace learning: exploiting the spatial-temporal statistical redundancy in a video sequence.

The convergence of these research activities is my long-term research goal: a computer vision system capable of reconstructing and interpreting the 3D world "seen" in the video. Such a vision system has rich applications in computer graphics and visualization, human-computer interaction, robotics, navigation, smart cars, video retrieval/compression, and wearable visual assistants for humans.

Below are some descriptions of my research projects.

Geometric Reconstruction

Vision for SMAV

Layered Video Analysis

Subspace Learning

Efficient BA for SFM

Emulator (network)

Geometric Reconstruction

Videos showing feature tracking, camera motion estimation, and 3D estimation (format: Windows AVI):

Office
Road
Corridor
Flight (taken from a flying MAV)

Some more results on object tracking:

Van tracking
City flag tracking (non-rigid and occluded, WMV file)

Given measurements in 2D images, the goal of geometric reconstruction is to estimate 3D information about the underlying scene and/or the camera motion. To achieve optimal geometric reconstruction, one needs to minimize the reconstruction errors measured in the image domain. Conventional approaches to geometric reconstruction suffer from being trapped in local optima; the reconstruction result is unpredictable and may be undesirable.

We find that for many geometric reconstruction problems (e.g., multi-view triangulation, camera resectioning, multi-view reconstruction with known rotations, planar homography estimation, etc.) the reprojection error for a pin-hole camera is a quasiconvex function of the unknown parameters. This quasiconvexity allows us to formulate these reconstruction problems as small-scale convex programs, such as Linear Programs (LP) or Second Order Cone Programs (SOCP). In contrast to existing approaches, the quasiconvex optimization approach is deterministic and guarantees a globally optimal solution that minimizes the maximum reconstruction error, so one is assured of the quality of the reconstruction. The quasiconvexity also enables an intuitive approach that can effectively handle directional uncertainties and outliers in image measurements.
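As a toy illustration of the bisection idea behind minimizing the maximum reprojection error, the sketch below triangulates a 2D point from three 1D cameras. It is a simplified stand-in, not the implementation from the papers: the camera model (rows a and b of a 2x3 projection matrix, with image coordinate u = a.X / b.X), the function names, the min_depth bound, and the brute-force vertex search used in place of an LP solver are all assumptions made for this example. For a fixed bound gamma, the condition |u - a.X / b.X| <= gamma with b.X > 0 becomes two linear inequalities in X, so checking feasibility is a linear program, and bisection on gamma approaches the smallest feasible bound.

```python
from itertools import combinations


def constraints(cams, obs, gamma, min_depth=1e-6):
    """Half-planes c0*x + c1*y + c2 <= 0 that encode |u - a.X/b.X| <= gamma."""
    rows = []
    for (a, b), u in zip(cams, obs):
        d = [u * bj - aj for aj, bj in zip(a, b)]            # d.X = u*(b.X) - a.X
        rows.append([dj - gamma * bj for dj, bj in zip(d, b)])   #  d.X <= gamma*b.X
        rows.append([-dj - gamma * bj for dj, bj in zip(d, b)])  # -d.X <= gamma*b.X
        rows.append([-b[0], -b[1], -b[2] + min_depth])           #  b.X >= min_depth
    return rows


def feasible_point(rows, tol=1e-9):
    """Return a vertex of the feasible polygon, or None if no vertex is feasible."""
    for p, q in combinations(rows, 2):
        det = p[0] * q[1] - p[1] * q[0]
        if abs(det) < 1e-12:          # parallel boundaries: no unique intersection
            continue
        x = (-p[2] * q[1] + p[1] * q[2]) / det
        y = (-p[0] * q[2] + p[2] * q[0]) / det
        if all(c[0] * x + c[1] * y + c[2] <= tol for c in rows):
            return (x, y)
    return None


def triangulate_linf(cams, obs, hi=1.0, iters=30):
    """Bisection on the error bound; returns (point, bound reached)."""
    lo, best = 0.0, feasible_point(constraints(cams, obs, hi))
    if best is None:
        raise ValueError("no point within the initial error bound")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        pt = feasible_point(constraints(cams, obs, mid))
        if pt is None:
            lo = mid                  # bound too tight
        else:
            hi, best = mid, pt        # bound achievable: tighten it
    return best, hi


def max_error(cams, obs, pt):
    """Largest reprojection error of point pt over all cameras."""
    xh = (pt[0], pt[1], 1.0)
    dot = lambda r: sum(ri * xi for ri, xi in zip(r, xh))
    return max(abs(u - dot(a) / dot(b)) for (a, b), u in zip(cams, obs))


# Hypothetical setup: three cameras on the x-axis, all looking along +y.
cams = [((1, 0, 0), (0, 1, 0)),     # camera at (0, 0):  u = x / y
        ((1, 0, -1), (0, 1, 0)),    # camera at (1, 0):  u = (x - 1) / y
        ((1, 0, 1), (0, 1, 0))]     # camera at (-1, 0): u = (x + 1) / y
obs = [0.25, -0.25, 0.80]           # third measurement is slightly perturbed
point, gamma = triangulate_linf(cams, obs)
print(point, gamma, max_error(cams, obs, point))
```

Because every bound is tested by a convex feasibility problem, the result does not depend on an initial guess, which is the property the paragraph above contrasts with local-search methods. A practical version would replace the vertex search with an LP or SOCP solver.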

Application: 3D vision for guidance and navigation of small and micro air vehicles (SMAV). Autonomous control of SMAV requires precise estimation of both the vehicle state and the surrounding environment. Small cameras, which are available today at very low cost, are attractive sensors for SMAV. The main challenges are low-quality input video, mostly forward SMAV motion (a near-degenerate case for traditional structure-from-motion algorithms), and the need for on-line estimation of motion and 3D structure. I have designed a robust feature tracker that handles difficult cases typical of SMAV video, including large motions, illumination changes, and image noise and distortion. Geometric reconstruction using convex optimization is then applied to estimate structure and motion from the tracked features.

Publications:

"Handling Uncertainties in Geometric Reconstruction Using Quasiconvex Optimization",
Qifa Ke and Takeo Kanade,
IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2006), to appear, New York City, June 2006.
"Quasiconvex Optimization for Robust Geometric Reconstruction",
Qifa Ke and Takeo Kanade,
Tenth IEEE International Conference on Computer Vision (ICCV 2005), Beijing, China, October 2005.

3D Vision for Autonomous Small and Micro Air Vehicles

24-inch MAV

6-inch MAV

Autonomous control of small and micro air vehicles (SMAV) requires precise estimation of both the vehicle state and the surrounding environment. Small cameras, which are available today at very low cost, are attractive sensors for SMAV. 3D vision by video and laser scanning has a distinct advantage: it provides positional information relative to the objects and environment in which the vehicle operates, which is critical to obstacle avoidance and environment mapping. We are working on real-time 3D vision algorithms for recovering motion and structure from a video sequence, 3D terrain mapping from a laser range finder onboard a small autonomous helicopter, and sensor fusion of visual and commoditized GPS/INS sensors.

Publication:

"Real-Time and 3D Vision for Autonomous Small and Micro Air Vehicles",
Takeo Kanade, Omead Amidi, and Qifa Ke,
invited paper, IEEE Conf. on Decision and Control (CDC 2004), Dec. 2004.

Layer-Based Video Representation and Analysis

Layer extraction

Sample videos of layer extraction:

Garden sequence: input, result.
Mobile sequence: input, result.

Layer extraction exploits scene regularities to segment a video sequence into layers; within each layer, pixels share a common (but unknown) motion model. Such a layered representation explicitly represents occlusions and depth discontinuities, two of the most difficult issues in video analysis. It also provides one of the strongest cues for further segmenting the video sequence into separate objects. However, conventional methods for layer extraction exploit only the constraints from scene regularities; they either make strong assumptions about the scene or require a good initial solution that is hard to obtain.

By exploiting a second constraint in addition to scene regularities, namely the geometric constraint among video frames and the scene, we showed that the high-dimensional image motions (e.g., the local affine/projective transformations collected from small image patches across multiple frames) must lie in a low-dimensional subspace. By projecting 2D image motions into this low-dimensional subspace, layers can be identified simply as compact clusters in the subspace. Moreover, the existence of the subspace also enables us to detect outliers in local image motion measurements.
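The projection-then-cluster idea can be sketched on synthetic data as follows. This is only an illustration under strong assumptions, not the published algorithm: each patch is described by a hypothetical 6-vector of affine motion parameters, the subspace is taken to be one-dimensional and found by power iteration on the centered data, exactly two layers are assumed, and the clusters are separated at the largest gap in the projected coordinates. All names and numbers are invented for the example.

```python
import random


def principal_direction(rows, iters=50):
    """Dominant principal direction of centered data, by power iteration."""
    dim = len(rows[0])
    v = [1.0 / dim ** 0.5] * dim
    for _ in range(iters):
        w = [0.0] * dim
        for r in rows:                       # w = (sum_i r_i r_i^T) v
            s = sum(ri * vi for ri, vi in zip(r, v))
            for j in range(dim):
                w[j] += s * r[j]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v


def extract_two_layers(motions):
    """Project motion vectors onto a 1D subspace and split at the largest gap."""
    n, dim = len(motions), len(motions[0])
    mean = [sum(m[j] for m in motions) / n for j in range(dim)]
    centered = [[m[j] - mean[j] for j in range(dim)] for m in motions]
    v = principal_direction(centered)
    coords = [sum(c * vi for c, vi in zip(row, v)) for row in centered]
    order = sorted(range(n), key=lambda i: coords[i])
    gaps = [(coords[order[k + 1]] - coords[order[k]], k) for k in range(n - 1)]
    _, split = max(gaps)                     # widest gap separates the two clusters
    labels = [0] * n
    for i in order[split + 1:]:
        labels[i] = 1
    return labels


# Synthetic patches: 20 follow motion A, 20 follow motion B, plus small noise.
rng = random.Random(0)
layer_a = [0.0, 0.0, 2.0, 0.0, 0.0, 0.5]     # hypothetical affine parameters
layer_b = [0.01, 0.0, -1.0, 0.0, 0.01, 0.0]
motions = [[p + rng.gauss(0.0, 0.05) for p in layer]
           for layer in [layer_a] * 20 + [layer_b] * 20]
labels = extract_two_layers(motions)
print(labels)
```

In the low-dimensional coordinates the two motion models form compact, well-separated groups, so even a crude split recovers the grouping; real sequences would need a higher-dimensional subspace, more layers, and a clustering method that tolerates outliers.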

Publications:

"A Subspace Approach to Layer Extraction",
Qifa Ke and Takeo Kanade,
IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2001), Volume I, pages 255-262, Hawaii, Dec. 2001.
"A Robust Subspace Approach to Layer Extraction",
Qifa Ke and Takeo Kanade,
IEEE Workshop on Motion and Video Computing (Motion 2002), pages 37-43, Orlando, Florida, Dec. 2002.
Lockheed-Martin Best Paper Award.
"Transforming Camera Geometry to A Virtual Downward-Looking Camera: Robust Ego-Motion Estimation and Ground-Layer Detection",
Qifa Ke and Takeo Kanade,
IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2003), Volume I, pages 390-397, Madison, June 2003.
"Textureless Layers",
Qifa Ke, Simon Baker, and Takeo Kanade,
Tech. Report CMU-RI-TR-04-17, Robotics Institute, Carnegie Mellon University, March 2004.

Robust Subspace Learning and Clustering

(a) L₂-norm (IRWLS)

(b) Robust L₁-norm

Application: low-rank subspace (matrix factorization) for shape recovery.

In many computer vision and machine learning problems, high-dimensional observation data often form clusters and, at the same time, reside in a low-dimensional subspace. Such subspaces and clusters have many important applications, including object representation and recognition, motion estimation, unsupervised pattern classification, and collaborative filtering. Conventional subspace learning methods (e.g., PCA) minimize the L₂-norm error between the input data and the fitted subspace model. They do not account for the outliers or missing data components that are common in real data sets, and they do not use the constraint that the data form clusters.

We have designed a novel robust subspace learning algorithm that (1) detects outliers by exploiting constraints from both the subspace and the clusters, and (2) minimizes L₁-norm errors, which are robust to outliers. First, we implicitly exploit the clustering structure (without actually clustering the data) to map the input data onto a one-dimensional space, where outliers are clearly distinguishable from inliers. Second, we formulate subspace learning as an L₁-norm minimization problem. Unlike the L₂-norm, the L₁-norm is an error metric that is robust to outliers. We have developed an algorithm that uses alternating linear programming to efficiently minimize the L₁-norm error. The linear program formulation also makes it straightforward to (1) account for constraints given by prior knowledge and (2) handle missing data components in measurements.
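A minimal sketch of alternating L₁ minimization, restricted to the rank-1 case, is given below. It is an illustration under stated assumptions rather than the published algorithm: for rank 1, each one-variable subproblem (minimize the sum of |m_ij - u_i v_j| over one unknown with the other factor fixed) is solved exactly by a weighted median, which stands in here for a general LP solver. Missing entries are marked with None and simply skipped. The function names and the test matrix are invented for the example, and alternating schemes of this kind are not guaranteed to reach a global optimum.

```python
def weighted_median(pairs):
    """Minimizer of sum_k w_k * |x - r_k| for pairs (r_k, w_k) with w_k > 0."""
    pairs = sorted(pairs)
    half = 0.5 * sum(w for _, w in pairs)
    acc = 0.0
    for r, w in pairs:
        acc += w
        if acc >= half:
            return r
    return pairs[-1][0]


def l1_rank1(M, iters=20):
    """Alternately update v (u fixed) and u (v fixed) to reduce the L1 error."""
    rows, cols = len(M), len(M[0])
    u, v = [1.0] * rows, [1.0] * cols
    for _ in range(iters):
        for j in range(cols):
            # |m_ij - u_i*v_j| = |u_i| * |m_ij/u_i - v_j|: a weighted median in v_j
            obs = [(M[i][j] / u[i], abs(u[i])) for i in range(rows)
                   if M[i][j] is not None and u[i] != 0.0]
            if obs:
                v[j] = weighted_median(obs)
        for i in range(rows):
            obs = [(M[i][j] / v[j], abs(v[j])) for j in range(cols)
                   if M[i][j] is not None and v[j] != 0.0]
            if obs:
                u[i] = weighted_median(obs)
    return u, v


# Rank-1 data with one gross outlier and one missing entry.
u_true, v_true = [1, 2, 3, 4, 5], [1, 2, 3, 4]
M = [[float(a * b) for b in v_true] for a in u_true]
M[2][1] = 100.0      # outlier (true value 6)
M[0][3] = None       # missing measurement (true value 4)
u, v = l1_rank1(M)
print(u[2] * v[1], u[0] * v[3])
```

In this toy case the outlier has no influence on the fitted factors, so the product u[i] * v[j] reproduces the underlying rank-1 matrix, including the corrupted and missing entries. An L₂ fit to the same data would be pulled toward the value 100.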

Publications:

"Robust L₁ Norm Factorization in the Presence of Outliers and Missing Data by Alternative Convex Programming",
Qifa Ke and Takeo Kanade,
IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2005), San Diego, CA, June 2005.
"Robust Subspace Clustering by Combined Use of kNND Metric and SVD Algorithm",
Qifa Ke and Takeo Kanade,
IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2004), Washington D.C., June 2004.

Efficient Bundle-Adjustment for Structure from Motion

System overview: from image sequence to 3D reconstruction

We developed an efficient hierarchical approach to structure from motion for long image sequences. Our approach contains two key elements: accurate 3D reconstruction for each segment and efficient bundle adjustment for the whole sequence. The image sequence is first divided into a number of segments so that feature points can be reliably tracked across each segment. Each segment has a long baseline to ensure accurate 3D reconstruction. In order to efficiently bundle adjust 3D structures from all segments, we reduced the number of frames in each segment by introducing “virtual key frames”. The virtual frames encode the 3D structure of each segment along with its uncertainty but they form a small subset of the original frames. Our method achieves significant speedup over conventional bundle adjustment methods.
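The first step, dividing the sequence into segments across which features can be tracked, might be sketched as below. This is a guess at one simple policy and not the rule used in the paper: the track representation (first and last frame of each feature track), the min_shared threshold, the greedy extension, and the choice to let adjacent segments share a boundary frame are all assumptions made for illustration.

```python
def shared_tracks(tracks, start, end):
    """Number of feature tracks visible in every frame from start to end."""
    return sum(1 for first, last in tracks if first <= start and last >= end)


def split_into_segments(tracks, num_frames, min_shared=3):
    """Greedily grow each segment while enough tracks span all of it.

    tracks: list of (first_frame, last_frame) pairs, inclusive.
    Returns a list of (start, end) frame ranges; neighbours share one frame.
    """
    segments = []
    start = 0
    while start < num_frames - 1:
        end = start + 1
        while (end + 1 < num_frames
               and shared_tracks(tracks, start, end + 1) >= min_shared):
            end += 1                  # a longer segment gives a longer baseline
        segments.append((start, end))
        start = end                   # next segment begins at the shared frame
    return segments


# Hypothetical 10-frame sequence: three tracks early on, three different ones later.
tracks = [(0, 4)] * 3 + [(3, 9)] * 3
segments = split_into_segments(tracks, num_frames=10)
print(segments)
```

Growing each segment as far as the tracks allow reflects the stated goal of a long baseline per segment; the per-segment reconstruction and the virtual key frames that summarize it are outside the scope of this sketch.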

Publications:

"Efficient Bundle Adjustment with Virtual Key Frames: A Hierarchical Approach to Multi-Frame Structure from Motion",
Heung-Yeung Shum, Qifa Ke, and Zhengyou Zhang,
IEEE Conference on Computer Vision and Pattern Recognition (CVPR 1999), Colorado, June 1999.

Related publication in SFM:

"Recovering Epipolar Geometry by Reactive Tabu Search",
Qifa Ke, Gang Xu, and Songde Ma,
Sixth IEEE International Conference on Computer Vision (ICCV 1998), January 1998.

Emulator for Ad-Hoc Sensor Networks

Emulator: MNi is a computer (running real applications being evaluated) emulating a node in the ad-hoc sensor network simulated in the ns-2 server.

Sensor networks are often ad hoc, since the sensors, when deployed, form a temporary network without any centralized administration. Evaluating a software system in such networks is challenging, as it requires either (1) building a real test-bed on which to deploy the software system, which is expensive and non-repeatable, or (2) re-implementing the software system inside an existing network simulator, which is error-prone and infeasible for large-scale software systems. I have developed an emulation system capable of evaluating unmodified real software systems in simulated environments (the ns-2 network simulator). The emulation runs in real time and is repeatable, detailed, and realistic. The emulator has been integrated into the widely used ns-2 simulator and is publicly available.

Publication:

"Emulation of Multi-Hop Wireless Ad Hoc Networks",
Qifa Ke, Dave Maltz, and David B. Johnson,
The 7th International Workshop on Mobile Multimedia Communications (MoMuC 2000), Tokyo, Japan, Oct. 2000.
