Recently, the optimal distance measure for a given object discrimination task under the nearest neighbor framework was derived. For ease of implementation and for efficiency, the optimal distance measure was approximated by combining more elementary distance measures defined on simple feature spaces. In this paper, we address two important issues that arise in practice for such an approach: (a) What form should the elementary distance measure in each feature space take? We motivate the use of optimal distance measures in simple feature spaces as the elementary distance measures; such distance measures have the desirable property of being invariant to distance-respecting transformations. (b) How do we combine the elementary distance measures? We present the precise statistical assumptions under which a linear logistic model holds exactly. We benchmark our model against three other methods on a challenging face discrimination task and show that our approach is competitive with the state of the art.
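As a minimal illustration of the linear logistic combination described above, the sketch below fuses several elementary per-pair distances into a single probability that a pair belongs to different classes. Everything here is a synthetic stand-in (the random pairs, the hypothetical weights `w_true`, and the plain gradient-descent fit), not the paper's actual features or training procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic pairs: each row holds K elementary distances (e.g. color,
# texture, local shape) for one pair of images; label 1 = different class.
K, n = 3, 400
X = rng.random((n, K))
# Hypothetical ground truth: a large combined distance means "different".
w_true = np.array([2.0, -1.0, 3.0])
y = (X @ w_true + 0.3 * rng.standard_normal(n) > 2.0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Fit the linear logistic model by maximum likelihood (gradient descent).
w, b = np.zeros(K), 0.0
for _ in range(2000):
    p = sigmoid(X @ w + b)
    g = p - y
    w -= 0.1 * (X.T @ g) / n
    b -= 0.1 * g.mean()

# The fitted probability of "different class" is then used as the
# combined distance measure for nearest neighbor search.
def combined_distance(d_elem):
    return sigmoid(d_elem @ w + b)

accuracy = ((sigmoid(X @ w + b) > 0.5) == y).mean()
```

Because the model outputs a probability in [0, 1], it can be compared and thresholded consistently across heterogeneous feature spaces.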
Using a saliency measure based on the global property of contour closure, we have developed a segmentation method that identifies smooth closed contours bounding objects of unknown shape in real images. The saliency measure incorporates the Gestalt principles of proximity and good continuity that previous methods have also exploited. Unlike previous methods, we incorporate contour closure by finding the eigenvector with the largest positive real eigenvalue of a transition matrix for a Markov process where edges from the image serve as states. Element (i, j) of the transition matrix is the conditional probability that a contour which contains edge j will also contain edge i. In this paper, we show how the saliency measure, defined for individual edges, can be used to derive a saliency relation, defined for pairs of edges, and further show that strongly-connected components of the graph representing the saliency relation correspond to smooth closed contours in the image. Finally, we report, for the first time, results on large real images for which segmentation takes an average of about 10 seconds per object on a general-purpose workstation.
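The eigenvector computation described above can be sketched on a toy transition matrix. The matrix below is hand-built rather than derived from a real edge image: edges 0-3 link into a closed contour with high transition probability, edge 4 is weakly coupled clutter, and power iteration recovers the dominant eigenvector whose components serve as edge saliencies.

```python
import numpy as np

# Toy transition matrix over 5 edge states.  P[i, j] approximates the
# probability that a contour containing edge j also contains edge i.
# Edges 0-3 chain into a closed contour (0->1->2->3->0); edge 4 is clutter.
n = 5
P = np.full((n, n), 0.02)
for j, i in [(0, 1), (1, 2), (2, 3), (3, 0)]:
    P[i, j] = 0.9

# Power iteration for the eigenvector of the largest positive real
# eigenvalue (the Perron vector, strictly positive for this matrix).
s = np.ones(n) / n
for _ in range(200):
    s = P @ s
    s /= np.linalg.norm(s)

saliency = s / s.sum()   # normalize so the saliencies sum to one
```

Edges participating in the closed cycle receive high saliency, while the clutter edge does not, which is the behavior the closure-based measure is designed to produce.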
We develop a multi-class object detection framework whose core component is a nearest neighbor search over object part classes. The performance of the overall system is critically dependent on the distance measure used in the nearest neighbor search. A distance measure that minimizes the misclassification risk for the 1-nearest neighbor search can be shown to be the probability that a pair of input image measurements belong to different classes. In practice, we model the optimal distance measure using a linear logistic model that combines the discriminative powers of more elementary distance measures associated with a collection of simple-to-construct feature spaces such as color, texture and local shape properties. Furthermore, in order to perform search over large training sets efficiently, we extend the same framework to find Hamming distance measures associated with simple discriminators. By combining this discrete distance model with the continuous model, we obtain a hierarchical distance model that is both fast and accurate. Finally, we integrate the nearest neighbor search over object part classes into a whole-object detection system and evaluate it on an indoor detection task, with good results.
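A minimal sketch of the coarse-to-fine idea above, with hypothetical stand-ins throughout: coordinate-sign tests play the role of the simple discriminators producing binary codes, Hamming distance over the codes prunes the training set cheaply, and plain Euclidean distance stands in for the learned continuous (logistic) distance on the surviving shortlist.

```python
import numpy as np

rng = np.random.default_rng(1)

# Training "part" descriptors and a query in a continuous feature space.
train = rng.standard_normal((1000, 16))
query = train[42] + 0.01 * rng.standard_normal(16)

# Hypothetical simple discriminators: coordinate-sign tests mapping each
# descriptor to a binary code, so candidate pruning is a cheap XOR/popcount.
codes = (train > 0).astype(np.uint8)
qcode = (query > 0).astype(np.uint8)

# Coarse stage: keep the candidates closest in Hamming distance.
hamming = (codes != qcode).sum(axis=1)
candidates = np.argsort(hamming)[:50]

# Fine stage: the expensive continuous distance (Euclidean here, as a
# stand-in for the learned logistic model) runs only on the shortlist.
d = np.linalg.norm(train[candidates] - query, axis=1)
nearest = candidates[np.argmin(d)]
```

The coarse stage touches all training examples but with a very cheap operation, while the expensive continuous model runs on a small fixed-size shortlist; this is the speed/accuracy trade underlying the hierarchical distance model.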
We propose to combine simple discriminators for object discrimination under the maximum entropy framework or, equivalently, under the maximum likelihood framework for the exponential family. The duality between the maximum entropy framework and the maximum likelihood framework allows us to relate two selection criteria for the discriminators that were proposed in the literature. We illustrate our approach by combining nearest prototype discriminators, which are simple to implement and widely applicable, as they can be constructed in any feature space with a distance function. For efficient run-time performance we adapt the work on ``alternating trees'' for multi-class discrimination tasks. We report results on a multi-class discrimination task in which combining discriminators from a variety of easy-to-construct feature spaces under our framework yields significant gains in performance.
We approach the task of object discrimination as that of learning efficient ``codes'' for each object class in terms of responses to a set of chosen discriminants. We formulate this approach in an energy minimization framework. The ``code'' is built incrementally by successively constructing discriminants that focus on pairs of training images of objects that are currently hard to classify. The particular discriminants that we use partition the set of objects of interest into two well-separated groups. We find the optimal discriminant as well as the partition by formulating an objective criterion that measures how well separated the partition is. We derive an iterative solution that alternates between the solutions of two generalized eigenproblems, one for the discriminant parameters and the other for the indicator variables denoting the partition. We show how the optimization can easily be biased to focus on hard-to-classify pairs, which enables us to choose new discriminants one by one in a sequential manner. We validate our approach on a challenging face discrimination task using parts as features and show that it compares favorably with the performance of an eigenspace method.
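For a fixed two-group partition, the discriminant half of the alternation described above is a standard generalized eigenproblem. The sketch below works this single step on synthetic classes; the partition-update half and the actual energy criterion of the paper are omitted, so this is only the "discriminant parameters" sub-problem.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy data: four object classes in the plane; the partition {0,1} vs
# {2,3} is well separated along the first axis.
means = np.array([[0., 0.], [0., 1.], [5., 0.], [5., 1.]])
X = np.vstack([m + 0.1 * rng.standard_normal((20, 2)) for m in means])
labels = np.repeat(np.arange(4), 20)
group = np.array([0, 0, 1, 1])[labels]   # fixed partition indicators

# With the indicators fixed, the optimal linear discriminant solves the
# generalized eigenproblem  Sb w = lambda Sw w.  For a two-group
# partition Sb has rank one, so the top generalized eigenvector has the
# closed form  w = Sw^{-1} (m1 - m0).
m0, m1 = X[group == 0].mean(axis=0), X[group == 1].mean(axis=0)
Sw = np.zeros((2, 2))
for g, mg in ((0, m0), (1, m1)):
    Xg = X[group == g] - mg
    Sw += Xg.T @ Xg                       # within-group scatter
w = np.linalg.solve(Sw, m1 - m0)

proj = X @ w
p0, p1 = proj[group == 0], proj[group == 1]
```

In the full method this eigenproblem alternates with a second one that re-optimizes the indicator variables, biased toward currently hard-to-classify pairs.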
The estimation of the projective structure of a scene from image correspondences can be formulated as the minimization of the mean-squared distance between predicted and observed image points with respect to the projection matrices, the scene point positions, and their depths. Since these unknowns are not independent, constraints must be chosen to ensure that the optimization process is well posed. This paper examines three plausible choices, and shows that the first one leads to the Sturm-Triggs projective factorization algorithm, while the other two lead to new provably-convergent approaches. Experiments with synthetic and real data are used to compare the proposed techniques to the Sturm-Triggs algorithm and bundle adjustment.
We propose an iterative method for the recovery of the projective structure and motion from multiple images. It has recently been noted [sturm96,triggs96] that by scaling the measurement matrix by the true projective depths, recovery of the structure and motion is possible by factorization. The reliable determination of the projective depths is crucial to the success of this approach. The previous approach recovers these projective depths using pairwise constraints among images. We first discuss a few important drawbacks of this approach. We then propose an iterative method that simultaneously recovers both the projective depths and the structure and motion, avoiding some of these drawbacks by utilizing all of the available data uniformly. The new approach makes use of a subspace constraint on the projections of a 3D point onto an arbitrary number of images. The projective depths are readily determined by solving a generalized eigenvalue problem derived from the subspace constraint. We also formulate a dual subspace constraint on all the points in a given image, which can be used for verifying the projective geometry of a scene or object that was modeled. We prove the monotonic convergence of the iterative scheme to a local maximum. We show the robustness of the approach on both synthetic and real data despite large perspective distortions and varying initializations.
Using a saliency measure based on the global property of contour closure, we have developed a method that reliably segments out salient contours bounding unknown objects from real edge images. The measure also incorporates the Gestalt principles of proximity and smooth continuity that previous methods have exploited. Unlike previous measures, we incorporate contour closure by finding the eigen-solution associated with a stochastic process that models the distribution of contours passing through edges in the scene. The segmentation algorithm utilizes the saliency measure to identify multiple closed contours by finding strongly-connected components on an induced graph. The determination of strongly-connected components is a direct consequence of the property of closure. We report, for the first time, results on large real images for which segmentation takes an average of about 10 seconds per object on a general-purpose workstation. The segmentation is made efficient for such large images by exploiting the inherent symmetry in the task.
Many modeling tasks in computer vision, e.g., structure from motion, shape/reflectance from shading, and filter synthesis, have a low-dimensional intrinsic structure even though the dimension of the input data can be relatively large. We propose a simple but surprisingly effective iterative randomized algorithm that drastically reduces the time required for recovering the intrinsic structure. The computational cost depends only on the intrinsic dimension of the structure of the task. It is based on the recently proposed Cascade Basis Reduction (CBR) algorithm that was developed in the context of steerable filters. A key feature of our algorithm compared with CBR is that an arbitrary a priori basis for the task is not required. This allows us to extend the applicability of the algorithm to tasks beyond steerable filters. We prove the convergence of the new algorithm and show that in practice it is much faster than CBR for the same modeling error. We demonstrate this speed-up for the construction of a steerable basis for Gabor filters. We also demonstrate the generality of the new algorithm by applying it to an example from structure from motion without missing data.
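The sketch below illustrates the general idea of iterative randomized recovery of a low-dimensional intrinsic structure, with cost governed by the intrinsic rank r rather than the ambient dimension D. It is a generic randomized-subspace stand-in under stated assumptions, not the CBR-derived algorithm of the paper.

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic task data with low intrinsic dimension: D-dimensional samples
# that actually live in an r-dimensional subspace, plus tiny noise.
D, N, r = 200, 1000, 5
basis = np.linalg.qr(rng.standard_normal((D, r)))[0]
A = basis @ rng.standard_normal((r, N)) + 1e-8 * rng.standard_normal((D, N))

# Randomized iterative recovery: sketch the data with a small random test
# matrix, orthonormalize, and refine with a couple of power iterations.
# Only r + 2 directions are ever maintained, independent of D and N.
Q = np.linalg.qr(A @ rng.standard_normal((N, r + 2)))[0]
for _ in range(2):
    Q = np.linalg.qr(A @ (A.T @ Q))[0]   # power iterations sharpen the basis

# Modeling error: the fraction of the data the recovered basis misses.
residual = np.linalg.norm(A - Q @ (Q.T @ A)) / np.linalg.norm(A)
```

Because every step manipulates only an (r + 2)-column matrix, the per-iteration work scales with the intrinsic dimension rather than with a full decomposition of the D x N data matrix.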