Searching for and identifying the people who appear in broadcast news
video leads to better understanding of, and access to, the news video
content. This talk addresses two mutually inverse problems related to the
people appearing in news video: (1) person finding, which is about
locating the appearances of named persons in the video, and (2) person
naming, which attempts to label individual persons with their
names. The bottleneck of the first problem is the temporal mismatch
between people's names in the transcript and their visual appearances in
the video, which is addressed by introducing a timing pattern factor into
the text-based information retrieval (IR) method. Adding visual features
such as facial similarity and an anchor classifier further improves
performance. The second problem is formulated as a classification
problem attacked by a machine learning approach, which exploits a
variety of multi-modal features including speaker identification,
transcript clues, temporal video structure, etc. High accuracy on
person naming has been reported on ABC World News Tonight video from
the TRECVID 2004 dataset. Moreover, since our approach does not rely on
face recognition, it is able to name people who have never been seen
before.
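
As a purely illustrative sketch of the timing pattern idea for person
finding: the abstract does not give the form of the timing factor, so
the Gaussian weighting below, the function names timing_weight and
score_shots, and the parameters mu and sigma are all assumptions made
for the example, not the method presented in the talk. Each shot is
scored by how close its midpoint lies to the times at which the name is
mentioned in the transcript.

```python
import math


def timing_weight(offset_s, mu=0.0, sigma=5.0):
    """Assumed Gaussian weight for the offset (in seconds) between a
    shot and a name mention; mu and sigma are hypothetical parameters."""
    return math.exp(-0.5 * ((offset_s - mu) / sigma) ** 2)


def score_shots(shots, mention_times, mu=0.0, sigma=5.0):
    """Score each (start, end) shot by its temporal proximity to the
    transcript times at which the queried name is mentioned."""
    scores = []
    for start, end in shots:
        midpoint = (start + end) / 2.0
        scores.append(
            sum(timing_weight(midpoint - t, mu, sigma) for t in mention_times)
        )
    return scores


# Toy data: three shots and one mention of the name at t = 6 s.
shots = [(0.0, 4.0), (4.0, 10.0), (10.0, 30.0)]
mentions = [6.0]
scores = score_shots(shots, mentions)

# Shot indices ordered from most to least likely to show the person.
ranked = sorted(range(len(shots)), key=lambda i: scores[i], reverse=True)
print(ranked)
```

In this toy setting, a nonzero mu would model a person who tends to
appear on screen some seconds before or after being named; in practice
such parameters would presumably be estimated from data, and the score
would be combined with the text-retrieval and visual evidence.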