Research


My primary research interests are computer vision and machine learning, with an emphasis on video analysis and its applications. I believe a working computer vision system should exploit the three most fundamental constraints present in a video sequence: the geometric constraint among video frames and the three-dimensional (3D) scene, the coherence of apparent image motion due to scene regularities, and the spatial-temporal statistical redundancy in visual appearance. Accordingly, my recent research has focused on the following three topics:

  • Geometric reconstruction: exploiting the geometric constraint among video frames and the 3D scene;

  • Layered video representation/analysis: exploiting the image motion coherency due to scene regularities;

  • Subspace learning: exploiting the spatial-temporal statistical redundancy in a video sequence.

The convergence of these research activities is my long-term research goal: a computer vision system capable of reconstructing and interpreting the 3D world "seen" in the video. Such a vision system has rich applications in computer graphics and visualization, human-computer interaction, robotics, navigation, smart cars, video retrieval and compression, and wearable visual assistants for humans.

Below are some descriptions of my research projects.
 


Geometric Reconstruction

 

Videos (Windows AVI format): feature tracking, camera motion estimation, and 3D estimation; additional object tracking results.

 

Given measurements in 2D images, the goal of geometric reconstruction is to estimate 3D information about the underlying scene and/or the camera motion. To achieve optimal geometric reconstruction, one needs to minimize the reconstruction errors measured in the image domain. Conventional approaches to geometric reconstruction suffer from being trapped in local optima; the reconstruction result is unpredictable and may be undesirable.

We find that for many geometric reconstruction problems (e.g., multi-view triangulation, camera resectioning, multi-view reconstruction with known rotations, planar homography estimation, etc.) the reprojection error for a pinhole camera is a quasiconvex function of the unknown parameters. This quasiconvexity allows us to formulate these reconstruction problems as small-scale convex programs, such as Linear Programs (LP) or Second Order Cone Programs (SOCP). In contrast to existing approaches, the quasiconvex optimization approach is deterministic and guarantees a globally optimal solution that minimizes the maximum reconstruction error, so one is assured of the quality of the reconstruction. The quasiconvexity also enables an intuitive approach that can effectively handle directional uncertainties and outliers in image measurements.
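The bisection idea behind this formulation can be sketched on a toy problem. The example below assumes a 2D world observed by 1D pinhole cameras that all look along +z with no rotation. For a fixed error bound gamma, each constraint |u - (x - cx)/(z - cz)| <= gamma becomes two linear inequalities in (x, z), so the feasible set is a convex polygon, and bisecting on gamma finds the smallest bound for which the set is non-empty. The function names, the camera model, and the half-plane clipping routine (which stands in for an LP solver) are illustrative assumptions, not the published implementation.

```python
def clip(poly, a, b, c):
    """Keep the part of a convex polygon where a*x + b*z <= c."""
    out = []
    for i, p in enumerate(poly):
        q = poly[(i + 1) % len(poly)]
        fp = a * p[0] + b * p[1] - c
        fq = a * q[0] + b * q[1] - c
        if fp <= 0:
            out.append(p)
        if fp * fq < 0:  # the edge p->q crosses the boundary line
            t = fp / (fp - fq)
            out.append((p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1])))
    return out


def reproj_error(X, cam, u):
    """Absolute reprojection error of point X=(x, z) in a 1D camera at cam."""
    return abs(u - (X[0] - cam[0]) / (X[1] - cam[1]))


def feasible_region(cams, obs, gamma, box=100.0, eps=1e-6):
    """Polygon of points in front of all cameras with every error <= gamma."""
    zmin = max(cz for _, cz in cams) + eps
    poly = [(-box, zmin), (box, zmin), (box, box), (-box, box)]
    for (cx, cz), u in zip(cams, obs):
        # u*(z - cz) - (x - cx) <= gamma*(z - cz)
        poly = clip(poly, -1.0, u - gamma, (u - gamma) * cz - cx)
        # (x - cx) - u*(z - cz) <= gamma*(z - cz)
        poly = clip(poly, 1.0, -(u + gamma), cx - (u + gamma) * cz)
    return poly


def triangulate_linf(cams, obs, tol=1e-9):
    """Bisect on gamma; return (point, gamma) with max error <= gamma."""
    guess = (0.0, max(cz for _, cz in cams) + 1.0)
    best = guess
    lo, hi = 0.0, max(reproj_error(guess, c, u) for c, u in zip(cams, obs))
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        poly = feasible_region(cams, obs, mid)
        if poly:  # feasible: the optimum is at most mid
            hi = mid
            best = (sum(p[0] for p in poly) / len(poly),
                    sum(p[1] for p in poly) / len(poly))
        else:     # infeasible: the optimum is above mid
            lo = mid
    return best, hi


# Hypothetical data: three cameras on the x-axis, noisy views of a point near (0.5, 4).
cams = [(-1.0, 0.0), (0.0, 0.0), (1.0, 0.0)]
obs = [0.385, 0.115, -0.115]
point, gamma = triangulate_linf(cams, obs)
```

Because every feasibility test is a convex problem, the result does not depend on an initial estimate of the 3D point; the starting guess is used only to obtain an upper bound on gamma.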

Application: 3D vision for guidance and navigation of small and micro air vehicles (SMAV). Autonomous control of SMAV requires precise estimation of both the vehicle state and the surrounding environment. Small cameras, which are available today at very low cost, are attractive sensors for SMAV. The main challenges are low-quality input video, mostly forward vehicle motion (a near-degenerate case for traditional structure-from-motion algorithms), and the requirement for online estimation of motion and 3D structure. I have designed a robust feature tracker that handles the difficult cases typical of SMAV video, including large motions, illumination changes, and image noise and distortion. Geometric reconstruction using convex optimization is then applied to estimate structure and motion from the tracked features.

Publications:

  • "Handling Uncertainties in Geometric Reconstruction Using Quasiconvex Optimization",
     Qifa Ke and Takeo Kanade,
     IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2006), to appear, New York City, June 2006.
     
  • "Quasiconvex Optimization for Robust Geometric Reconstruction",
     Qifa Ke and Takeo Kanade,
     Tenth IEEE International Conference on Computer Vision (ICCV 2005), Beijing, China, October 2005.


 


3D Vision for Autonomous Small and Micro Air Vehicles



Figures: 24-inch MAV; 6-inch MAV.

Autonomous control of small and micro air vehicles (SMAV) requires precise estimation of both the vehicle state and the surrounding environment. Small cameras, which are available today at very low cost, are attractive sensors for SMAV. 3D vision based on video and laser scanning has a distinct advantage: it provides positional information relative to the objects and environment in which the vehicle operates, which is critical to obstacle avoidance and environment mapping. We are working on real-time 3D vision algorithms for recovering motion and structure from a video sequence, 3D terrain mapping from a laser range finder onboard a small autonomous helicopter, and sensor fusion of visual and commodity GPS/INS sensors.


 


 


Layer-Based Video Representation and Analysis

Layer extraction
 

Videos: sample layer extraction results.

Layer extraction exploits scene regularities to segment a video sequence into layers; the pixels in each layer share a common (but unknown) motion model. Such a layered representation explicitly represents occlusions and depth discontinuities, two of the most difficult issues in video analysis. It also provides one of the strongest cues for further segmenting the video sequence into separate objects. However, conventional approaches to layer extraction exploit only the constraints from scene regularities; they either make strong assumptions about the scene or require a good initial solution that is hard to obtain.

By additionally exploiting the geometric constraint among video frames and the scene, we showed that the high-dimensional image motions (e.g., the local affine/projective transformations collected from small image patches across multiple frames) must lie in a low-dimensional subspace. Once the 2D image motions are projected onto this subspace, layers can be identified simply as compact clusters. Moreover, the existence of the subspace also enables us to detect outliers in local image motion measurements.
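A minimal sketch of the projection-and-clustering step, under strong simplifying assumptions: each patch contributes a single 6-parameter affine motion vector, the subspace is taken to be one-dimensional and estimated by power iteration (a stand-in for a full SVD), and the two layers are separated at the largest gap in the projected coordinates. The synthetic motion parameters and all function names are hypothetical.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def principal_direction(rows, iters=100):
    """Mean-centre the rows and find their leading principal direction."""
    d = len(rows[0])
    mean = [sum(r[j] for r in rows) / len(rows) for j in range(d)]
    centred = [[r[j] - mean[j] for j in range(d)] for r in rows]
    v = [1.0 / (j + 1) for j in range(d)]  # arbitrary non-zero start vector
    for _ in range(iters):
        coeffs = [dot(c, v) for c in centred]  # C v
        w = [sum(k * c[j] for k, c in zip(coeffs, centred))
             for j in range(d)]                # C^T (C v)
        norm = dot(w, w) ** 0.5
        v = [x / norm for x in w]
    return centred, v


def split_layers(motions):
    """Project motions onto a 1D subspace and split at the largest gap."""
    centred, v = principal_direction(motions)
    coords = [dot(c, v) for c in centred]
    order = sorted(coords)
    gaps = [(order[i + 1] - order[i], i) for i in range(len(order) - 1)]
    _, i = max(gaps)
    threshold = 0.5 * (order[i] + order[i + 1])
    return [int(c > threshold) for c in coords]


# Hypothetical data: 40 patches on one layer, 30 on another, with small noise.
rng = random.Random(0)
layer_a = [1.0, 0.0, 2.0, 0.0, 1.0, 0.0]
layer_b = [1.02, 0.0, -3.0, 0.0, 0.98, 1.0]
motions = [[p + rng.gauss(0, 0.01) for p in layer_a] for _ in range(40)]
motions += [[p + rng.gauss(0, 0.01) for p in layer_b] for _ in range(30)]
labels = split_layers(motions)
```

With real data the subspace would generally have more than one dimension and more than two layers may be present, so a proper clustering method would replace the largest-gap split.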




 


Robust Subspace Learning and Clustering



Application: low-rank subspace (matrix factorization) for shape recovery. Figure panels: (a) L2-norm (IRWLS); (b) robust L1-norm.

In many computer vision and machine learning problems, high-dimensional observation data often form clusters and, at the same time, reside in a low-dimensional subspace. Such subspaces and clusters have many important applications, including object representation and recognition, motion estimation, unsupervised pattern classification, and collaborative filtering. Conventional subspace learning methods (e.g., PCA) are based on minimizing the L2-norm error between the input data and the fitted subspace model. They do not account for the outliers and missing data components that are common in real data sets, nor do they exploit the fact that the data form clusters.

We have designed a novel robust subspace learning algorithm that (1) detects outliers by exploiting constraints from both the subspace and the clusters, and (2) minimizes the L1-norm error, which is robust to outliers. First, we implicitly exploit the clustering structure (without actually clustering the data) to map the input data onto a one-dimensional space, where outliers are clearly distinguishable from inliers. Second, we formulate subspace learning as an L1-norm minimization problem. Unlike the L2-norm, the L1-norm is an error metric that is robust to outliers. We have developed an algorithm that uses alternative linear programming to efficiently minimize the L1-norm error. The linear program formulation also makes it straightforward to (1) account for constraints given by prior knowledge and (2) handle missing data components in the measurements.
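The alternating scheme can be illustrated in the simplest setting, a rank-1 factorization M ≈ u vᵀ under the L1 norm. With one factor fixed, each entry of the other factor solves a one-dimensional L1 regression, whose exact minimizer is a weighted median; in this sketch the weighted median therefore stands in for the linear program that a higher-rank subspace would need. Missing entries are marked `None` and simply skipped. The function names and the test matrix are hypothetical, and the sketch covers only the factorization step, not the outlier detection step.

```python
def weighted_median(values, weights):
    """Minimiser of sum_k weights[k] * |values[k] - t| over t."""
    pairs = sorted(zip(values, weights))
    half = 0.5 * sum(w for _, w in pairs)
    acc = 0.0
    for val, w in pairs:
        acc += w
        if acc >= half:
            return val


def l1_fit_scalar(targets, basis):
    """argmin_t sum_k |targets[k] - t * basis[k]|, skipping missing entries."""
    vals, wts = [], []
    for m, b in zip(targets, basis):
        if m is not None and b != 0.0:
            vals.append(m / b)
            wts.append(abs(b))
    return weighted_median(vals, wts) if vals else 0.0


def l1_rank1(M, iters=20):
    """Alternately refit u and v so that u[i] * v[j] approximates M[i][j]."""
    rows, cols = len(M), len(M[0])
    u, v = [0.0] * rows, [1.0] * cols
    for _ in range(iters):
        u = [l1_fit_scalar(M[i], v) for i in range(rows)]
        v = [l1_fit_scalar([M[i][j] for i in range(rows)], u)
             for j in range(cols)]
    return u, v


# Hypothetical data: a clean rank-1 matrix with one gross outlier and one gap.
u0 = [1.0, 2.0, 3.0, 4.0, 5.0]
v0 = [1.0, 2.0, 0.5, 1.5, 3.0]
M = [[a * b for b in v0] for a in u0]
M[2][3] = 100.0  # outlier (true value 4.5)
M[0][0] = None   # missing measurement
u, v = l1_rank1(M)
```

On this example the product `u[i] * v[j]` recovers the clean matrix, including the corrupted and missing entries, whereas a least-squares fit would be pulled toward the outlier. Like any alternating scheme, it is not guaranteed to reach a global optimum in general.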
 


 


 


Efficient Bundle-Adjustment for Structure from Motion

Figure: system overview, from image sequence to 3D reconstruction.

We developed an efficient hierarchical approach to structure from motion for long image sequences. Our approach contains two key elements: accurate 3D reconstruction for each segment and efficient bundle adjustment for the whole sequence. The image sequence is first divided into a number of segments so that feature points can be reliably tracked across each segment. Each segment has a long baseline to ensure accurate 3D reconstruction. To bundle adjust the 3D structures from all segments efficiently, we reduce the number of frames in each segment by introducing “virtual key frames”. The virtual key frames encode the 3D structure of each segment along with its uncertainty, yet they are far fewer than the original frames. Our method achieves a significant speedup over conventional bundle adjustment methods.
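The segmentation step might look like the following sketch. The text above does not state the actual splitting criterion, so the rule used here is an assumption: a segment is extended greedily for as long as at least `min_tracks` features remain visible in every one of its frames, and adjacent segments share one frame so that they can be stitched together later. All names are hypothetical.

```python
def count_spanning(tracks, start, end):
    """Number of feature tracks visible in every frame from start to end."""
    return sum(1 for first, last in tracks if first <= start and last >= end)


def split_into_segments(tracks, num_frames, min_tracks):
    """Greedily split frames 0..num_frames-1 into overlapping segments.

    tracks holds one (first_frame, last_frame) pair per feature, inclusive.
    Every segment has at least two frames, even if too few tracks span it.
    """
    segments = []
    start = 0
    while start < num_frames - 1:
        end = start + 1
        while (end + 1 < num_frames
               and count_spanning(tracks, start, end + 1) >= min_tracks):
            end += 1
        segments.append((start, end))
        start = end  # the next segment begins on the shared boundary frame
    return segments


# Hypothetical tracks: 10 features seen in frames 0-4, 10 more in frames 3-9.
tracks = [(0, 4)] * 10 + [(3, 9)] * 10
segments = split_into_segments(tracks, num_frames=10, min_tracks=8)
```

A longer segment gives a longer baseline but fewer surviving tracks, so `min_tracks` is the parameter that trades reconstruction accuracy against tracking reliability in this sketch.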



 


Emulator for Ad-Hoc Sensor Networks

Emulator: each MNi is a computer, running the real applications being evaluated, that emulates a node in the ad hoc sensor network simulated on the ns-2 server.

Sensor networks are often ad hoc: the sensors, when deployed, form a temporary network without any centralized administration. Evaluating a software system in such networks is a challenging task, as it requires either (1) building a real test-bed to deploy the software system, which is expensive and non-repeatable, or (2) re-implementing the software system inside an existing network simulator, which is error-prone and infeasible for large-scale software systems. I have developed an emulation system capable of evaluating unmodified real software systems in environments simulated by the ns-2 network simulator. The emulation runs in real time and is repeatable, detailed, and realistic. The emulator has been integrated into the widely used ns-2 simulator and is publicly available.


