Overview
Early and intermediate-level visual processing
can be modeled as a multistage process. The image is first processed by spatiotemporal
receptive fields tuned to orientation, spatial frequency, opponent color, and short-range
motion. This is followed by a grouping stage that forms regions of
coherent brightness, color and texture, which we call 'proto-surfaces'. We model this as a
process of finding a partition of the image into regions such that there is high
similarity within a region and low similarity across regions. This is made precise as the
normalized cut criterion, which can be optimized by solving a generalized eigenvalue problem. The
resulting eigenvectors provide a hierarchical partitioning of the image into regions
ordered according to salience. Brightness, color, texture, motion similarity, proximity
and good continuation can all be encoded into this framework. We have demonstrated results
using these multiple cues for segmenting arbitrary gray-level images.
We believe these results form a good substrate for work on object recognition as well
as figure-ground processing.
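The eigenvector step described above can be illustrated with a minimal sketch. It assumes a small dense affinity matrix `W` (symmetric, non-negative, positive row sums) and uses NumPy only; the function name, the toy data and the choice to threshold the eigenvector at zero are illustrative assumptions, not the implementation used in the papers below.

```python
import numpy as np

def normalized_cut_bipartition(W):
    """Split a graph in two using the second-smallest generalized eigenvector.

    Solves (D - W) y = lambda * D y through the equivalent symmetric problem
    D^{-1/2} (D - W) D^{-1/2} z = lambda * z, with y = D^{-1/2} z.
    """
    W = np.asarray(W, dtype=float)
    d = W.sum(axis=1)                      # node degrees (row sums of W)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}
    L_sym = np.eye(len(d)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(L_sym)   # eigenvalues in ascending order
    y = d_inv_sqrt * eigvecs[:, 1]             # second-smallest eigenvector
    return y > 0                               # assumption: split at zero

# Toy example: six 1-D "pixels" forming two well-separated groups,
# with a Gaussian affinity on their pairwise distance.
points = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])
W = np.exp(-(points[:, None] - points[None, :]) ** 2 / 2.0)
labels = normalized_cut_bipartition(W)
```

Applying the same split recursively to each part is one way to obtain a hierarchy of regions. For real images a sparse affinity matrix and a sparse eigensolver would be needed, since the dense matrix grows with the square of the pixel count.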
RELEVANT PUBLICATIONS
Textons, Contours and Regions: Cue Combination in Image Segmentation
J. Malik, S. Belongie, J. Shi and T. Leung
[International Conference on Computer Vision, September 1999] Paper.
This paper makes two contributions. It provides (1) an operational definition of textons,
the putative elementary units of texture perception, and (2) an algorithm for partitioning
the image into disjoint regions of coherent brightness and texture, where boundaries of
regions are defined by peaks in contour orientation energy and differences in texton
densities across the contour.
Julesz introduced the term texton, analogous to a phoneme in speech recognition, but
did not provide an operational definition for gray-level images. Here we re-invent textons
as frequently co-occurring combinations of oriented linear filter outputs. These can be
learned using a K-means approach. By mapping each pixel to its nearest texton, the image
can be analyzed into texton channels, each of which is a point set where discrete
techniques such as Voronoi diagrams become applicable.
Local histograms of texton frequencies can be used with a chi-squared test for
significant differences to find texture boundaries. Natural images contain both textured
and untextured regions, so we combine this cue with that of the presence of peaks of
contour energy derived from outputs of odd- and even-symmetric oriented Gaussian
derivative filters. Each of these cues has a domain of applicability, so to facilitate cue
combination we introduce a gating operator based on a statistical test for isotropy of
Delaunay neighbors. Having obtained a local measure of how likely two nearby pixels are to
belong to the same region, we use the spectral graph theoretic framework of normalized
cuts to find partitions of the image into regions of coherent texture and brightness.
Experimental results on a wide range of images are shown.
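The texton-channel and histogram-comparison steps can be sketched as follows. The sketch assumes the texton centers have already been learned (the K-means step is omitted), represents filter outputs as plain arrays, and uses the common symmetric chi-squared histogram distance; all function names and the toy values are hypothetical.

```python
import numpy as np

def assign_textons(responses, textons):
    """Map each pixel's filter-response vector to the index of its nearest texton.

    responses: (n_pixels, n_filters); textons: (K, n_filters).
    """
    d2 = ((responses[:, None, :] - textons[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def texton_histogram(labels, n_textons):
    """Normalized histogram of texton labels inside one local window."""
    counts = np.bincount(labels, minlength=n_textons).astype(float)
    return counts / counts.sum()

def chi_squared_distance(h1, h2):
    """0.5 * sum_k (h1[k] - h2[k])^2 / (h1[k] + h2[k]), skipping empty bins."""
    num = (h1 - h2) ** 2
    den = h1 + h2
    mask = den > 0
    return 0.5 * np.sum(num[mask] / den[mask])

# Toy example: two textons and two windows with different texture.
textons = np.array([[0.0, 0.0], [1.0, 1.0]])
window_a = np.array([[0.1, 0.0], [0.0, 0.2], [0.1, 0.1]])
window_b = np.array([[0.9, 1.0], [1.0, 0.8], [1.1, 1.0]])
h_a = texton_histogram(assign_textons(window_a, textons), len(textons))
h_b = texton_histogram(assign_textons(window_b, textons), len(textons))
d_same = chi_squared_distance(h_a, h_a)   # identical windows
d_diff = chi_squared_distance(h_a, h_b)   # disjoint texton content
```

With normalized histograms this distance lies between 0 (identical texton content) and 1 (no textons in common), which makes it convenient to convert into a pairwise similarity.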
Normalized Cuts and Image Segmentation
J. Shi and J. Malik
[IEEE Conf. Computer Vision and Pattern Recognition, June 1997] Paper.
We propose a novel approach for solving the perceptual grouping problem in vision. Rather
than focusing on local features and their consistencies in the image data, our approach
aims at extracting the global impression of an image. We treat image segmentation as a
graph partitioning problem and propose a novel global criterion, the normalized cut,
for segmenting the graph. The normalized cut criterion measures both the total
dissimilarity between the different groups as well as the total similarity within the
groups. We show that an efficient computational technique based on a generalized
eigenvalue problem can be used to optimize this criterion. We have applied this approach
to segmenting static images and found the results very encouraging.
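For reference, the criterion and the associated eigenvalue system are usually written as below. The symbols are notation assumed here, since this page does not define them: $V$ is the set of graph nodes, $A$ and $B$ are the two groups of a partition of $V$, $w(u,v)$ is the edge weight between nodes $u$ and $v$, $W$ is the matrix of these weights, and $D$ is the diagonal matrix of its row sums.

```latex
\mathrm{Ncut}(A,B) = \frac{\mathrm{cut}(A,B)}{\mathrm{assoc}(A,V)}
                   + \frac{\mathrm{cut}(A,B)}{\mathrm{assoc}(B,V)},
\qquad
\mathrm{cut}(A,B) = \sum_{u \in A,\; v \in B} w(u,v),
\qquad
\mathrm{assoc}(A,V) = \sum_{u \in A,\; t \in V} w(u,t)

(D - W)\, y = \lambda\, D\, y,
\qquad
D_{ii} = \sum_{j} W_{ij}
```

Minimizing Ncut exactly over discrete partitions is intractable in general, so the indicator vector is relaxed to real values; the eigenvector $y$ with the second-smallest eigenvalue then gives an approximate bipartition once it is thresholded.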
Motion Segmentation and Tracking Using Normalized Cuts
J. Shi and J. Malik
[International Conference on Computer Vision, January 1998] Paper.
We propose a motion segmentation algorithm that aims to break a scene into its most
prominent moving groups. A weighted graph is constructed on the image sequence by
connecting pixels that are in the spatiotemporal neighborhood of each other. At each
pixel, we define motion profile vectors which capture the probability distribution of the
image velocity. The distance between motion profiles is used to assign a weight on the
graph edges. Using normalized cuts we find the most salient partitions of the
spatiotemporal graph formed by the image sequence. For segmenting long image sequences, we
have developed a recursive update procedure that incorporates knowledge of segmentation in
previous frames for efficiently finding the group correspondence in the new frame.
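A minimal sketch of how a motion profile and the resulting edge weight could be computed is given below. It assumes that a patch-matching cost (sum of squared differences, SSD) has already been evaluated for each candidate displacement at a pixel. The exponential mappings, the parameters `sigma` and `sigma_m`, and the function names are assumptions made for illustration, not the exact definitions from the paper.

```python
import numpy as np

def motion_profile(ssd, sigma=1.0):
    """Turn per-displacement SSD costs into a probability distribution."""
    ssd = np.asarray(ssd, dtype=float)
    s = np.exp(-(ssd - ssd.min()) / sigma ** 2)   # low cost -> high similarity
    return s / s.sum()

def profile_distance(p_i, p_j):
    """One minus the correlation of two motion profiles (assumed measure)."""
    return 1.0 - float(np.dot(p_i, p_j))

def edge_weight(p_i, p_j, sigma_m=0.5):
    """Graph edge weight: large when two pixels have similar motion profiles."""
    return float(np.exp(-profile_distance(p_i, p_j) / sigma_m ** 2))

# Toy example with five candidate displacements per pixel.
p_a = motion_profile([9, 9, 0, 9, 9])   # best match at displacement index 2
p_b = motion_profile([9, 9, 0, 9, 9])   # same motion as pixel a
p_c = motion_profile([0, 9, 9, 9, 9])   # best match at displacement index 0
w_same = edge_weight(p_a, p_b)
w_diff = edge_weight(p_a, p_c)
```

Keeping a full distribution over displacements, rather than a single flow vector, lets ambiguous pixels (for example in untextured areas) remain weakly connected to several motion groups. Note that with this particular distance two identical but flat profiles still have a nonzero distance, so confident profiles bind more strongly than uncertain ones.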
Contour Continuity in Region Based Image Segmentation
T. Leung and J. Malik
[Fifth European Conference on Computer Vision, June 1998] Paper.
Region-based image segmentation techniques make use of similarity in intensity, color and
texture to determine the partitioning of an image. The powerful cue of contour continuity
is not exploited at all. In this paper, we provide a way of incorporating curvilinear
grouping into region-based image segmentation. Soft contour information is obtained
through orientation energy. Weak contrast gaps and subjective contours are completed by
contour propagation. The normalized cut approach proposed by Shi and Malik is used for the
segmentation. Results on a large variety of images are shown.
Finding Boundaries in Natural Images: A New Method Using Point Descriptors and Area Completion
S. Belongie and J. Malik
[Fifth European Conference on Computer Vision, June 1998] Paper.
There are several reasons why a satisfactory solution to image segmentation for natural
scenes has remained elusive. Perhaps the foremost of these is image texture. One general
methodology which shows promise for solving this problem is to characterize textured
regions via their responses to a set of filters. However, this approach brings with it
many open questions, including how to combine texture and intensity information into a
common descriptor and how to deal with the fact that filter responses inside textured
regions are generally spatially inhomogeneous. Our goal in this work is to introduce two
new ideas which address these open questions and to demonstrate the application of these
ideas to the segmentation of natural images. The first idea consists of a novel means of
describing points in natural images and an associated distance function for comparing
these descriptors. This distance function is aided in textured regions by the use of the
second idea, a new process introduced here which we have termed area completion.
Experimental segmentation results which incorporate our proposed approach are provided for
a variety of natural images.
Self Inducing Relational Distance and its Application to Image Segmentation
J. Shi and J. Malik
[Fifth European Conference on Computer Vision, June 1998] Paper.
We propose a new feature distance which is derived from an optimal relational graph
matching criterion. Instead of defining an arbitrary similarity measure for grouping, we
will use the criterion of reducing instability in the relational graph to induce a
similarity measure. This similarity measure not only improves the stability of the
matching, but more importantly, also captures the relative importance of relational
similarity in the feature space for the purpose of grouping. We will call this similarity
measure the self-induced relational distance. We demonstrate the distance measure
on a brightness-texture feature space and apply it to the segmentation of complex natural
images.