The first segmentation is clearly better than the second one. Is there a principled way to compute a score that would capture that fact?

![](segs_files/intro.jpg)
Unsupervised image segmentation is an important component in many image
understanding algorithms and practical vision systems. However,
evaluation of segmentation algorithms thus far has been largely
subjective, leaving a system designer to judge the effectiveness of a
technique based only on intuition and results in the form of a few
example segmented images. This is largely due to image segmentation
being an ill-defined problem—there is no unique ground-truth
segmentation of an image against which the output of an algorithm may
be compared.
In this work, we propose a new measure of similarity, the
Normalized Probabilistic Rand (NPR) index, which can be used to perform
a quantitative comparison between image segmentation algorithms using a
hand-labeled set of ground-truth segmentations. The measure is a
generalization of the Rand index commonly used in statistics, and it has
several desirable properties that fall broadly into four categories
(a small computational sketch follows the list):
- Nondegeneracy: The
measure does not have degenerate cases where input instances that are
not well represented by the ground-truth segmentations give abnormally
high values of similarity.
- No assumptions about data generation: The measure does not assume that the segmentations being compared have equal numbers of regions or equal region sizes.
- Adaptive accommodation of refinement:
The measure of similarity ignores small differences in pixel-level
granularity in regions that humans find ambiguous and penalizes
differences in refinement elsewhere.
- Comparable scores: The
measure gives scores that permit meaningful comparison between
segmentations of different images and between different segmentations
of the same image.
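Since the NPR index builds on the pairwise label agreement counted by the Rand index, a minimal sketch may make the computation concrete. The sketch below assumes the standard pairwise formulation: for every pixel pair, the test segmentation's agreement (same label or not) is compared against the fraction of ground-truth segmentations in which that pair shares a label, and the resulting score is normalized so that its expected value maps to zero. The function names and the `expected_pr` argument are illustrative assumptions rather than our reference implementation.

```python
import numpy as np

def probabilistic_rand(test, ground_truths):
    """Pairwise agreement between a test label map and a set of ground truths.

    test: (H, W) integer label map; ground_truths: list of (H, W) label maps.
    Note: this sketch materializes n x n pair matrices, so it is only
    practical for small images (or subsampled pixel sets).
    """
    t = test.ravel()
    n = t.size
    # c[i, j] = 1 if pixels i and j share a label in the test segmentation.
    c = (t[:, None] == t[None, :]).astype(float)
    # p[i, j] = fraction of ground truths in which pixels i and j share a label.
    p = np.zeros((n, n))
    for gt in ground_truths:
        g = gt.ravel()
        p += (g[:, None] == g[None, :])
    p /= len(ground_truths)
    # A pair contributes c*p + (1 - c)*(1 - p); average over all pairs i < j.
    agree = c * p + (1.0 - c) * (1.0 - p)
    iu = np.triu_indices(n, k=1)
    return agree[iu].mean()

def normalized_pr(test, ground_truths, expected_pr):
    """Normalize so the expected score maps to 0 (scores can then be negative).

    expected_pr would be estimated from segmentations over the whole image set;
    the maximum attainable score is taken to be 1 here as an assumption.
    """
    pr = probabilistic_rand(test, ground_truths)
    return (pr - expected_pr) / (1.0 - expected_pr)
```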
In practice, the measure of similarity allows principled comparisons
between segmentations created by different algorithms, as well as
between segmentations of different images. It is well suited for
comparing segmentations because it matches the "natural" ordering of
segmentations, i.e., the ordering defined by human subjects. For
example, the NPR index is low for both over- and under-segmentation,
and high for "good" segmentations that match the reference
segmentations well:
| | Input | Segmentation (mean shift) | Manual segmentations (from the Berkeley set) |
| --- | --- | --- | --- |
| Under-segmented, NPR = -0.59 | | ![](segs_files/undersegout.jpg) | ![](segs_files/oversegmanual.jpg) |
| Over-segmented, NPR = -0.74 | | ![](segs_files/oversegout.jpg) | ![](segs_files/oversegmanual.jpg) |
| "Good" segmentation, NPR = 0.85 | ![](segs_files/goodin.jpg) | ![](segs_files/goodout.jpg) | ![](segs_files/goodmanual.jpg) |
This can also be seen by comparing the scores obtained by the NPR index
with those obtained by three leading approaches for segmentation
scoring. In general, our measure conforms to the natural ordering of the
segmentations, and it provides a sharper separation between useful
segmentations and massively over- or under-segmented exemplars:
| Input image and 7 possible segmentations | NPR scores for all 7 segmentations, compared with other popular similarity measures |
| --- | --- |
| ![](segs_files/h1.jpg) ![](segs_files/h2.jpg) | ![](segs_files/graph.jpg) |
We also developed a procedure for algorithm evaluation and demonstrate
it on several familiar algorithms: the mean-shift-based algorithm, an
efficient graph-based segmentation algorithm, a hybrid algorithm that
combines the strengths of both methods, and expectation maximization.
We used the 300 images in the publicly available Berkeley Segmentation
Data Set to validate the measure.
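To make the evaluation procedure concrete, here is a minimal sketch of how one might rank algorithms by their mean NPR score over a dataset such as the Berkeley set. The function name `evaluate_algorithms`, the data layout, and the `score` callable (for instance the `normalized_pr` sketch above) are illustrative assumptions, not a description of our actual pipeline.

```python
def evaluate_algorithms(algorithms, images, ground_truth_sets, score):
    """Rank segmentation algorithms by mean score over a dataset.

    algorithms: dict mapping a name -> callable(image) -> label map.
    images: list of image arrays.
    ground_truth_sets: per-image lists of hand-labeled label maps.
    score: callable(test_labels, ground_truths) -> float, e.g. an NPR score.
    """
    mean_scores = {}
    for name, segment in algorithms.items():
        values = [score(segment(img), gts)
                  for img, gts in zip(images, ground_truth_sets)]
        mean_scores[name] = sum(values) / len(values)
    return mean_scores
```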