Video Summarization
Amongst other things, we are working to produce "re-countings" of YouTube-style videos within reasonably narrow domains; we are mainly responsible for the CMU audio and NLG part of the effort. For re-counting, the goal is to produce a useful textual summary of a video that explains what you see in it (a person, an object, an action) and why it may be important for a certain task. We will also experiment with text that explains differences between videos, etc.
You can find more information about the goals and evaluation(s) at http://www.nist.gov/itl/iad/mig/med11.cfm. Note that this covers only the evaluation of event "classification", not "re-counting". We are early in the project, and are currently trying to extract meaningful information from the audio tracks of the YouTube videos.
Eventually, the goal is to produce a kind of "summary" of a video clip, i.e., a classification of the clip into one of several classes (e.g., "This video explains how to bake a cake"), together with an explanation of why the video belongs to that class (e.g., "We can see ingredients, then hear about a recipe, and finally see a cake while people sing 'Happy Birthday'"). The interesting aspect is that here a machine has to explain why it is making a certain decision.
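The pairing of a class label with evidence-based justification can be sketched as a small template-filling step; the function name, class labels, and cue lists below are invented for illustration, not part of our actual system:

```python
# Sketch: turn a predicted event class plus its supporting evidence
# into a one-sentence "re-counting". All inputs are hypothetical.

def recount(event_class, audio_cues, visual_cues):
    """Fill a simple NLG template from detected audio/visual evidence."""
    parts = []
    if audio_cues:
        parts.append("we hear " + ", ".join(audio_cues))
    if visual_cues:
        parts.append("we see " + ", ".join(visual_cues))
    evidence = " and ".join(parts)
    return f"This video shows {event_class} because {evidence}."

print(recount("how to bake a cake",
              ["a recipe being read"],
              ["ingredients", "a finished cake"]))
```

A template like this only makes the output readable; the hard part, of course, is deciding which cues are salient enough to cite as evidence.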
Below is a screenshot of our current, early re-counting system, explaining why a certain video shows "how to repair an appliance": automatic speech recognition picks up certain salient words (learned automatically), and we observe certain salient visual features. We are actively working on this interface, and will keep you posted on our progress here ...
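One simple way to learn salient words automatically, sketched here under the assumption that ASR transcripts are available per event class, is to score each word by its smoothed log-odds of appearing in the target class's transcripts versus all others; the toy transcripts below are invented:

```python
import math
from collections import Counter

def salient_words(class_docs, other_docs, k=5, alpha=1.0):
    """Rank words by smoothed log-odds ratio: words much more frequent
    in the target class's ASR transcripts than in other transcripts."""
    c_in = Counter(w for d in class_docs for w in d.split())
    c_out = Counter(w for d in other_docs for w in d.split())
    n_in, n_out = sum(c_in.values()), sum(c_out.values())
    v = len(set(c_in) | set(c_out))  # vocabulary size for smoothing

    def score(w):
        p_in = (c_in[w] + alpha) / (n_in + alpha * v)
        p_out = (c_out[w] + alpha) / (n_out + alpha * v)
        return math.log(p_in / p_out)

    return sorted(c_in, key=score, reverse=True)[:k]

# Toy transcripts (invented) for "repair an appliance" vs. other videos
repair = ["unplug the washer and remove the screws",
          "replace the belt then tighten the screws"]
other = ["mix the flour and bake the cake",
         "sing happy birthday and cut the cake"]
print(salient_words(repair, other, k=3))
```

Words like "screws" that occur only in the repair transcripts float to the top, while common function words like "the" score near zero; the top-ranked words can then be cited directly in the re-counting text.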
Tuesday, March 1, 2011
Video Event Detection and Recounting