My primary research interests are computer vision and machine
learning, with an emphasis on video analysis and its applications.
I believe a working computer vision system should exploit the three
most fundamental constraints in a video sequence: the geometric
constraint between the video frames and the three-dimensional (3D)
scene, the coherence of apparent image motion due to scene
regularities, and the spatio-temporal statistical redundancy in
visual appearance. Accordingly, my recent research has focused on
the following three topics:
These research activities converge on my long-term research
goal: a computer vision system capable of reconstructing and
interpreting the 3D world "seen" in the video. Such a vision
system has rich applications in computer graphics and
visualization, human-computer interaction, robotics, navigation,
smart cars, video retrieval and compression, and wearable visual
assistants for humans.
Below are some descriptions of my research projects.
Geometric Reconstruction | Vision for SMAV | Layered Video Analysis | Subspace Learning | Efficient BA for SFM | Emulator (network)
Geometric Reconstruction
Videos: feature tracking, camera motion estimation, and 3D estimation (Windows AVI); further results on object tracking.
Given measurements in 2D images, the goal of geometric reconstruction
is to estimate 3D information about the underlying scene and/or the
camera motion. Optimal geometric reconstruction requires minimizing the
reconstruction error measured in the image domain. Conventional
approaches to geometric reconstruction suffer from being trapped in
local minima, so the reconstruction result is unpredictable and may be undesirable. We find that for many geometric reconstruction problems (e.g., multi-view triangulation, camera resectioning, multi-view reconstruction with known rotations, and planar homography estimation), the reprojection error for a pinhole camera is a quasiconvex function of the unknown parameters. This quasiconvexity allows us to formulate these reconstruction problems as small-scale convex programs, such as Linear Programs (LP) or Second Order Cone Programs (SOCP). In contrast to existing approaches, the quasiconvex optimization approach is deterministic and guarantees a globally optimal solution that minimizes the maximum reconstruction error, so the quality of the reconstruction is assured. Quasiconvexity also enables an intuitive approach that effectively handles directional uncertainties and outliers in image measurements.
Application: 3D vision for guidance and navigation of small and micro aerial vehicles (SMAV). Autonomous control of small and micro air vehicles (SMAV) requires precise estimation of both the vehicle state and its surrounding environment. Small cameras, which are available today at very low cost, are attractive sensors for SMAV. The main challenges are low-quality input video, predominantly forward SMAV motion (a near-degenerate case for traditional structure-from-motion algorithms), and the requirement of online estimation of motion and 3D structure. I have designed a robust feature tracker that handles difficult cases typical of SMAV video, including large motions, illumination changes, and image noise and distortion. Structure and motion are then estimated from the tracked features using the convex-optimization approach to geometric reconstruction. Publications:
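The bisection idea behind quasiconvex minimax reconstruction can be illustrated with a deliberately reduced Python sketch. It assumes a single scalar unknown x (for example, a depth along a known viewing ray) and residuals of the form |a*x + b| / (c*x + d) with c*x + d > 0, the shape a pinhole reprojection error takes in one image coordinate. For a fixed bound gamma, each constraint |a*x + b| <= gamma*(c*x + d) is a pair of linear inequalities, so every sublevel set is convex, and bisection on gamma converges to the solution with the smallest maximum error. With one unknown, the convex feasibility test reduces to intersecting intervals; in the real multi-view problems it would be an LP or SOCP handed to a convex solver. The function names are hypothetical.

```python
"""Minimax (L-infinity) fitting of quasiconvex residuals by bisection.

Illustrative sketch only: one scalar unknown x and residuals
|a*x + b| / (c*x + d) with c*x + d > 0.  All names are hypothetical.
"""
from typing import List, Optional, Tuple

Term = Tuple[float, float, float, float]  # (a, b, c, d)
EPS = 1e-12  # keeps denominators strictly positive


def feasible_interval(terms: List[Term], gamma: float,
                      lo: float, hi: float) -> Optional[Tuple[float, float]]:
    """Interval of x where every residual is <= gamma, or None if empty."""
    for a, b, c, d in terms:
        # |a x + b| <= gamma (c x + d) and c x + d >= EPS, each as coef * x <= rhs.
        for coef, rhs in ((a - gamma * c, gamma * d - b),
                          (-a - gamma * c, gamma * d + b),
                          (-c, d - EPS)):
            if abs(coef) < 1e-15:
                if rhs < 0:
                    return None  # 0 <= rhs fails for every x
            elif coef > 0:
                hi = min(hi, rhs / coef)
            else:
                lo = max(lo, rhs / coef)
        if lo > hi:
            return None
    return lo, hi


def max_residual(terms: List[Term], x: float) -> float:
    return max(abs(a * x + b) / (c * x + d) for a, b, c, d in terms)


def minimax_bisection(terms: List[Term], x0: float, lo: float = -1e6,
                      hi: float = 1e6, tol: float = 1e-9) -> Tuple[float, float]:
    """Bisect on gamma; x0 must satisfy c*x0 + d > 0 for every term."""
    g_lo, g_hi = 0.0, max_residual(terms, x0)  # x0 is feasible at g_hi
    best = (x0, x0)
    while g_hi - g_lo > tol:
        gamma = 0.5 * (g_lo + g_hi)
        interval = feasible_interval(terms, gamma, lo, hi)
        if interval is None:
            g_lo = gamma  # empty sublevel set: the optimum lies above gamma
        else:
            g_hi, best = gamma, interval
    x = 0.5 * (best[0] + best[1])
    return x, max_residual(terms, x)
```

For example, the residuals |x - 1| and |x - 3|, encoded as (1, -1, 0, 1) and (1, -3, 0, 1), give x close to 2 with a maximum residual close to 1.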
3D Vision for Autonomous Small and Micro Air Vehicles
Autonomous control of small and micro air vehicles (SMAV)
requires precise estimation of both the vehicle state and its surrounding
environment. Small cameras, which are available today at very low cost,
are attractive sensors for SMAV. 3D vision from video and laser scanning
has a distinct advantage: it provides positional information relative to
the objects and environment in which the vehicle operates, which is
critical for obstacle avoidance and environment mapping.
We are working on real-time 3D vision algorithms for recovering motion
and structure from a video sequence, 3D terrain mapping from a laser
range finder onboard a small autonomous helicopter, and sensor fusion of
visual and commodity GPS/INS sensors. Publication:
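This page does not say how the visual and GPS/INS estimates are combined. As a generic illustration only, not necessarily the method used in this project, the simplest rule for fusing two independent, unbiased estimates of the same quantity is inverse-variance weighting, which coincides with the measurement update of a scalar Kalman filter. The function name and argument order are hypothetical.

```python
from typing import Tuple


def fuse(x_a: float, var_a: float, x_b: float, var_b: float) -> Tuple[float, float]:
    """Inverse-variance weighted fusion of two independent estimates.

    Returns (fused estimate, fused variance).  Equivalent to a scalar Kalman
    update with prior (x_a, var_a) and measurement (x_b, var_b).
    """
    if var_a <= 0 or var_b <= 0:
        raise ValueError("variances must be positive")
    w_a, w_b = 1.0 / var_a, 1.0 / var_b  # more certain estimates weigh more
    fused = (w_a * x_a + w_b * x_b) / (w_a + w_b)
    return fused, 1.0 / (w_a + w_b)
```

For instance, a vision-based position of 10 m (variance 4 m²) and a GPS position of 12 m (variance 4 m²) fuse to 11 m with variance 2 m²; the fused variance is always smaller than either input variance.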
Layer-Based Video Representation and Analysis
Layer extraction
Sample videos of layer extraction.
Layer extraction exploits scene regularities to
segment a video sequence into layers, such that the pixels in each layer
share a common (but unknown) motion model. Such a layered representation
explicitly represents occlusions and depth discontinuities, two of the most
difficult issues in video analysis. It also provides one of the
strongest cues for further segmenting the video sequence into separate
objects. However, conventional methods for layer extraction exploit only
the constraints from scene regularities; they either make strong
assumptions about the scene or require a good initial solution that is
hard to obtain. By exploiting, in addition to scene regularities, another constraint (the geometric constraint among video frames and the scene), we showed that the high-dimensional image motions (e.g., the local affine/projective transformations collected from small image patches across multiple frames) must lie in a low-dimensional subspace. By projecting the 2D image motions into this low-dimensional subspace, layers can be identified simply as compact clusters in the subspace. Moreover, the existence of the subspace also enables us to detect outliers in local image motion measurements.
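The subspace-then-cluster idea can be sketched in plain Python under simplifying assumptions: each input row is one patch's stacked motion parameters (hypothetically, affine parameters concatenated over several frames), the subspace dimension and the number of layers are known, the subspace is estimated with power iteration plus Gram-Schmidt (standing in for an SVD), and layers are recovered with basic k-means in subspace coordinates. Each row's distance from the subspace is also returned, since a large residual is one way to flag an outlier measurement. None of these names come from the original work.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def top_subspace(rows: List[Vector], dim: int, iters: int = 200) -> List[Vector]:
    """Orthonormal basis for the dominant `dim`-dimensional row subspace.

    Power iteration on the Gram matrix M = X^T X; each new direction is kept
    orthogonal to earlier ones (Gram-Schmidt), giving the same span as the
    top right singular vectors of X.
    """
    n = len(rows[0])
    gram = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    rng = random.Random(0)  # fixed seed keeps the sketch deterministic
    basis: List[Vector] = []
    for _ in range(dim):
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        for _ in range(iters):
            w = [dot(row, v) for row in gram]
            for b in basis:  # remove components along directions already found
                c = dot(w, b)
                w = [wi - c * bi for wi, bi in zip(w, b)]
            norm = math.sqrt(dot(w, w))
            if norm == 0.0:
                raise ValueError("dim exceeds the rank of the data")
            v = [wi / norm for wi in w]
        basis.append(v)
    return basis


def project(rows: List[Vector], basis: List[Vector]) -> Tuple[List[Vector], List[float]]:
    """Subspace coordinates of each row and its distance from the subspace."""
    coords, residuals = [], []
    for r in rows:
        y = [dot(b, r) for b in basis]
        recon = [sum(yk * b[i] for yk, b in zip(y, basis)) for i in range(len(r))]
        coords.append(y)
        residuals.append(math.dist(r, recon))
    return coords, residuals


def kmeans(points: List[Vector], k: int, iters: int = 50) -> List[int]:
    """Plain k-means with deterministic farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def extract_layers(motions: List[Vector], dim: int, k: int) -> Tuple[List[int], List[float]]:
    """Cluster motion vectors in the fitted subspace; also return residuals."""
    coords, residuals = project(motions, top_subspace(motions, dim))
    return kmeans(coords, k), residuals
```

Clustering in a few subspace coordinates instead of the full motion vectors is what makes the layers appear as compact, well-separated groups.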
Robust Subspace Learning and Clustering
In many computer vision and machine learning
problems, high-dimensional observation data often form clusters and, at
the same time, reside in a low-dimensional subspace. Such subspaces
and clusters have many important applications, including object
representation and recognition, motion estimation, unsupervised pattern
classification, and collaborative filtering. Conventional subspace
learning methods (e.g., PCA) are based on minimizing the L2-norm
error between the input data and the fitted subspace model. They fail
to account for outliers or missing data components, which are common in
real data sets, and they do not exploit the constraint that the
data form clusters. We have designed a novel robust subspace learning algorithm that (1) detects outliers by exploiting constraints from both the subspace and the clusters, and (2) minimizes an L1-norm error that is robust to outliers. First, we implicitly exploit the clustering structure (without actually clustering the data) to map the input data onto a one-dimensional space, where outliers are clearly distinguishable from inliers. Second, we formulate subspace learning as an L1-norm minimization problem. Unlike the L2 norm, the L1 norm is an error metric that is robust to outliers. We have developed an algorithm that uses alternating linear programming to efficiently minimize the L1-norm error. The linear programming formulation also makes it straightforward to (1) account for constraints given by prior knowledge and (2) handle missing data components in measurements. Publications:
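The alternating idea can be illustrated with a small rank-1 sketch in Python. It is a simplification, not the algorithm described above: with a rank-1 model X ≈ u vᵀ, each alternating subproblem, minimizing Σ_j |x_ij - u_i v_j| over u_i, is the kind of L1 problem a linear program solves in general, but in this special case it has a closed-form solution, the weighted median of x_ij / v_j with weights |v_j|. Missing entries (None) are simply left out of the sums. The clustering-based outlier detection step is not modeled, and all names are hypothetical.

```python
"""Rank-1 L1 factorisation X ~ u v^T by alternating minimisation (sketch)."""
from typing import List, Optional, Sequence, Tuple

Matrix = List[List[Optional[float]]]


def weighted_median(values: Sequence[float], weights: Sequence[float]) -> float:
    """A minimiser of sum_i w_i * |values_i - m| for positive weights."""
    pairs = sorted(zip(values, weights))
    half = 0.5 * sum(weights)
    acc = 0.0
    for value, weight in pairs:
        acc += weight
        if acc >= half:
            return value
    return pairs[-1][0]


def l1_fit_scalar(xs: Sequence[Optional[float]], coeffs: Sequence[float]) -> float:
    """argmin_s sum_j |xs_j - s * coeffs_j| over observed entries.

    |x - s*c| = |c| * |x/c - s|, so the answer is a weighted median of x/c
    with weights |c|.
    """
    vals, wts = [], []
    for x, c in zip(xs, coeffs):
        if x is not None and c != 0.0:  # skip missing data and zero coefficients
            vals.append(x / c)
            wts.append(abs(c))
    return weighted_median(vals, wts) if vals else 0.0


def l1_rank1(X: Matrix, n_iter: int = 20) -> Tuple[List[float], List[float]]:
    """Alternate exact L1 updates of u (rows) and v (columns)."""
    v = [x if x is not None else 1.0 for x in X[0]]  # initialise from first row
    u: List[float] = []
    for _ in range(n_iter):
        u = [l1_fit_scalar(row, v) for row in X]  # fix v, solve for u
        cols = [list(col) for col in zip(*X)]
        v = [l1_fit_scalar(col, u) for col in cols]  # fix u, solve for v
    return u, v
```

Because each entry enters the objective through an absolute value, a single gross outlier moves a weighted median only if its weight exceeds half of the total, so one corrupted entry need not disturb the fit at all.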
Efficient Bundle-Adjustment for Structure from Motion
Figure: System overview.
We developed an efficient hierarchical approach to structure from motion
for long image sequences. Our approach contains two key elements:
accurate 3D reconstruction for each segment and efficient bundle
adjustment for the whole sequence. The image sequence is first divided
into a number of segments so that feature points can be reliably tracked
across each segment. Each segment has a long baseline to ensure accurate
3D reconstruction. To efficiently bundle-adjust the 3D structures
from all segments, we reduce the number of frames in each segment by
introducing “virtual key frames”. The virtual key frames encode the
3D structure of each segment along with its uncertainty, yet they are far
fewer than the original frames. Our method achieves a significant
speedup over conventional bundle adjustment methods. Publications:
Related publication in SFM:
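The segmentation step described above can be sketched as a greedy split. This is a hypothetical illustration, assuming each frame is summarized by the set of feature-track IDs visible in it and that a segment ends once fewer than a fraction `min_shared` of its first frame's features are still tracked; the threshold is an assumption, and neither the long-baseline requirement nor the virtual key frames are modeled.

```python
from typing import List, Sequence, Set, Tuple


def split_into_segments(frame_tracks: Sequence[Set[int]],
                        min_shared: float = 0.5) -> List[Tuple[int, int]]:
    """Greedily cut a sequence into inclusive (first, last) frame ranges."""
    if not frame_tracks:
        return []
    segments = []
    start = 0
    for k in range(1, len(frame_tracks)):
        base = frame_tracks[start]
        # Fraction of the segment's first-frame features still tracked at frame k.
        shared = len(base & frame_tracks[k]) / max(len(base), 1)
        if shared < min_shared:
            segments.append((start, k - 1))
            start = k  # frame k opens the next segment
    segments.append((start, len(frame_tracks) - 1))
    return segments
```

Each resulting segment could then be reconstructed independently before the segments are combined in the sequence-wide bundle adjustment.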
Emulator for Ad-Hoc Sensor Networks
Figure: Emulator. Each MNi is a computer, running the real applications under evaluation, that emulates a node in the ad-hoc sensor network simulated on the ns-2 server.
Sensor networks
are often ad hoc, since the sensors, once deployed, form a temporary
network without any centralized administration. Evaluating a software
system in such networks is challenging, as it requires either (1)
building a real test bed to deploy the software system, which is
expensive and non-repeatable, or (2) re-implementing the software system
inside an existing network simulator, which is error-prone and infeasible
for large-scale software systems. I have developed an emulation system
capable of evaluating unmodified, real software systems in a
simulated network environment (the ns-2 network simulator). The emulation
runs in real time and is repeatable, detailed, and realistic. The
emulator has been integrated into the widely used ns-2 simulator
and is publicly available. Publication:
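Running unmodified applications against a simulated network only works if simulated time never runs ahead of wall-clock time. The Python sketch below is not the emulator's code, only a minimal illustration of that idea under stated assumptions: a discrete-event loop that waits until each event's timestamp before dispatching it. The clock and sleep functions are injectable so the loop can be tested without actually waiting, the class name is hypothetical, and packet capture and injection between the real machines and the simulator are not modeled.

```python
import heapq
import itertools
import time
from typing import Callable, List, Tuple

Callback = Callable[[float], None]


class RealTimeScheduler:
    """Discrete-event loop that never dispatches an event before its time.

    Hypothetical sketch: event times are seconds since run() started.
    """

    def __init__(self, clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self._clock = clock
        self._sleep = sleep
        self._queue: List[Tuple[float, int, Callback]] = []
        self._seq = itertools.count()  # tie-breaker: FIFO order for equal times

    def schedule(self, at: float, callback: Callback) -> None:
        heapq.heappush(self._queue, (at, next(self._seq), callback))

    def run(self) -> None:
        start = self._clock()
        while self._queue:
            at, _, callback = heapq.heappop(self._queue)
            lag = at - (self._clock() - start)
            if lag > 0:
                self._sleep(lag)  # let wall-clock time catch up with simulated time
            callback(at)  # callbacks may schedule further events
```

Events whose time has already passed are dispatched immediately, so a slow callback delays later events but never reorders them.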