MURI

Landmark Recognition and Environment Identification

Recognizing landmarks is a critical task for mobile robots. Landmarks are used for building maps of unknown environments. In this context, traditional recognition techniques based on strong geometric models cannot be used. Rather, models of landmarks must be built from observations using image-based techniques. This research addresses the issue of building image-based landmark descriptions from sequences of images and of recognizing those landmarks. Beyond its application to mobile robot navigation, this approach addresses the more general problem of identifying groups of images with common attributes in sequences of images. With the appropriate domain constraints and image descriptions, this can be done using efficient algorithms.

In a training stage, the system is given a set of images in sequence. The aim of the training is to organize these images into groups based on the similarity of feature distributions between images. The algorithm tries to find the most relevant groups, taking the global distribution of the images into account. In a second step, the system is given new images, which it tries to assign either to one of the learned groups or to the category of unrecognized images.

Distance matrix for a 145-image training sequence;
darker points correspond to lower distances. The right image shows the distance matrix for the first 50 images.
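As a rough illustration of what such a distance matrix contains, the sketch below fills a symmetric matrix of pairwise distances for a sequence of image descriptors. The function names `distance_matrix` and `l1_distance` are hypothetical, and the L1 distance is only a placeholder assumption standing in for the distribution-comparison tests used by the system.

```python
from typing import Callable, List, Sequence

Descriptor = Sequence[float]


def l1_distance(p: Descriptor, q: Descriptor) -> float:
    """Placeholder distance (assumption): sum of absolute bin differences."""
    return sum(abs(a - b) for a, b in zip(p, q))


def distance_matrix(
    descriptors: Sequence[Descriptor],
    dist: Callable[[Descriptor, Descriptor], float],
) -> List[List[float]]:
    """Pairwise distances between all descriptors of a sequence.

    The matrix is symmetric with a zero diagonal, so only the upper
    triangle is computed and then mirrored.
    """
    n = len(descriptors)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = dist(descriptors[i], descriptors[j])
            matrix[i][j] = d
            matrix[j][i] = d
    return matrix


# Three toy two-bin histograms standing in for three images of a sequence.
toy_sequence = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
toy_matrix = distance_matrix(toy_sequence, l1_distance)
```

Passing the distance as a parameter keeps the sketch independent of whichever comparison test is actually used.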

The basic representation is based on distributions of different feature characteristics. Two standard classes of attributes are used for describing the images: normalized red color and edge distributions. The distributions are computed for the whole image and for a set of sub-images. Because of the potentially wide variation in viewpoint, the images must be registered before their feature distributions are compared. Tests are then used to compare these distributions and define a distance between images. This distance is used to cluster the images into groups, and each group is characterized by a set of feature distributions. When a new image is given to the system, the algorithm evaluates the distances between this image and the groups, and either classifies the image into one of the groups or rejects it.
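The compare-and-classify step described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the system's implementation: the text does not name the test used to compare distributions, so a symmetric chi-square statistic is assumed here, each group is reduced to a single representative histogram, and the names `histogram`, `chi_square_distance`, `classify`, and `reject_threshold` are all hypothetical. Registration and sub-images are omitted.

```python
from typing import Dict, List, Optional, Sequence


def histogram(values: Sequence[float], bins: int = 8) -> List[float]:
    """Normalized histogram of feature values assumed to lie in [0, 1]."""
    counts = [0.0] * bins
    for v in values:
        idx = min(int(v * bins), bins - 1)  # put v == 1.0 in the last bin
        counts[idx] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def chi_square_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Symmetric chi-square statistic (assumed test); 0 for identical inputs."""
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(p, q) if a + b > 0)


def classify(
    image_hist: Sequence[float],
    groups: Dict[str, Sequence[float]],
    reject_threshold: float,
) -> Optional[str]:
    """Return the label of the nearest group, or None to reject the image."""
    best_label: Optional[str] = None
    best_dist = float("inf")
    for label, group_hist in groups.items():
        d = chi_square_distance(image_hist, group_hist)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label if best_dist <= reject_threshold else None


# Two toy groups, each summarized by one histogram of feature values.
groups = {
    "group_a": histogram([0.1, 0.1, 0.2]),
    "group_b": histogram([0.9, 0.8, 0.95]),
}
label = classify(histogram([0.1, 0.15, 0.2]), groups, reject_threshold=0.3)
```

The rejection threshold is what lets the classifier place an image in the category of unrecognized images instead of forcing it into the nearest group.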

Color normalization:
(top) Original images; (bottom) Normalized images
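The page does not give the normalization formula. One common reading of "normalized red" is the red chromaticity r = R / (R + G + B), which discards overall brightness; the sketch below assumes that definition. The names `normalized_red` and `normalized_red_image` are hypothetical, as is the choice to map black pixels to the neutral value 1/3.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B) intensities


def normalized_red(pixel: Pixel) -> float:
    """Red chromaticity r = R / (R + G + B) (assumed definition)."""
    r, g, b = pixel
    total = r + g + b
    if total == 0:
        # Chromaticity is undefined for black; use the neutral value (assumption).
        return 1.0 / 3.0
    return r / total


def normalized_red_image(image: List[List[Pixel]]) -> List[List[float]]:
    """Apply the normalization to every pixel of a row-major image."""
    return [[normalized_red(p) for p in row] for row in image]


# A pixel and a twice-as-bright copy of it map to the same value.
dim = normalized_red((100, 50, 50))
bright = normalized_red((200, 100, 100))
```

Under this definition, scaling all three channels by the same factor leaves the value unchanged, which is one way to reduce sensitivity to lighting intensity.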

Many existing approaches are typically used for building models of single objects observed in isolation. In landmark recognition, there is no practical way to isolate the object in order to build models. Worse, it is often not known in advance which of the objects observed in the environment would make good landmarks. An additional complication is the potentially wide variation in appearance due to changes in imaging configuration or lighting conditions.

Modeling and classifying landmarks in sequences of images is an ideal test case for image-based modeling and recognition techniques. It is critical to the development of capable robot systems that can operate in complex environments (e.g., urban environments). Beyond recognition and classification tasks, the techniques can be used as pre-processing or cueing mechanisms for learning techniques by providing a means of generating initial classification hypotheses.

Our results on image sequences in real environments show that visual learning techniques can be used for building image-based models suitable for recognizing landmarks in complex scenes. The approach performs well, even in the presence of significant photometric and geometric variations.

On-Vehicle Experimentation: Matching performance in 12 sequences:

	Recognized	Rejected	Mis-Classified
Model Images	77.4%	22.6%	0%
Non-Model Images	n/a	99.7%	0.3%

The two rows of the table show statistics separately for the images that belong to the models and for those that do not, according to the manual annotation of the sequences. The "Recognized" column is the percentage of images that are matched to the correct model. The "Rejected" column is the percentage of images that are not matched with any model. Finally, the "Mis-Classified" column is the percentage of images that are either matched to the wrong model (first row) or matched to a model when they should not be (second row); the latter are what would normally be called "false positives".
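The column definitions above can be made concrete with a small sketch that computes the same three rates from annotated results. It assumes each result is a pair (annotated model, matched model), with `None` meaning "no model", and that percentages are taken per row, which is consistent with each row of the table summing to 100%. The function name `matching_rates` and the data layout are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Result = Tuple[Optional[str], Optional[str]]  # (annotated model, matched model)


def matching_rates(results: List[Result]) -> Dict[str, Dict[str, float]]:
    """Per-row percentages of recognized, rejected, and mis-classified images."""
    model = [(t, p) for t, p in results if t is not None]
    non_model = [(t, p) for t, p in results if t is None]

    def pct(count: int, total: int) -> float:
        return 100.0 * count / total if total else 0.0

    return {
        "model": {
            "recognized": pct(sum(1 for t, p in model if p == t), len(model)),
            "rejected": pct(sum(1 for _, p in model if p is None), len(model)),
            "misclassified": pct(
                sum(1 for t, p in model if p is not None and p != t), len(model)
            ),
        },
        "non_model": {
            "rejected": pct(sum(1 for _, p in non_model if p is None), len(non_model)),
            # Non-model images matched to any model are false positives.
            "misclassified": pct(
                sum(1 for _, p in non_model if p is not None), len(non_model)
            ),
        },
    }


# Toy annotations: four model images and two non-model images.
toy_results: List[Result] = [
    ("A", "A"), ("A", None), ("B", "A"), ("B", "B"),
    (None, None), (None, "A"),
]
rates = matching_rates(toy_results)
```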

Papers and Results:

IUW 1997: Visual Learning for Landmark Recognition (Postscript | HTML)

Most Recent Result (Nov. 1997):

Finding Images of Landmarks in Video Sequences (Postscript | HTML)

Recognition Sample Movie (Jan. 1998): N. Craig St (1 MByte) and Fifth Ave (1.7 MByte) in Pittsburgh

Slides (Jan. 1998): (Postscript | HTML)
