Experimental Results

The techniques described in this paper have been implemented as a system called LEONARD and tested on a number of video sequences. LEONARD successfully recognizes the events pick up, put down, stack, unstack, move, assemble, and disassemble using the definitions given in Figure 10. Figures 1 and 11 through 15 show the key frames from movies that depict these seven event types. These movies were filmed using a Canon VC-C3 camera and a Matrox Meteor frame grabber at 320×240 resolution and 30 fps. Figures 4 and 16 through 20 show the results of segmentation, tracking, and model reconstruction for those key frames superimposed on the original images. Figures 5 and 21 through 25 show the results of event classification for these movies. These figures show LEONARD correctly recognizing the intended event classes for each movie.

Figure 16: The output of the segmentation-and-tracking and model-reconstruction components applied to the image sequence from Figure 11, an image sequence that depicts a stack event.

Figure 17: The output of the segmentation-and-tracking and model-reconstruction components applied to the image sequence from Figure 12, an image sequence that depicts an unstack event.

Figure 18: The output of the segmentation-and-tracking and model-reconstruction components applied to the image sequence from Figure 13, an image sequence that depicts a move event.

Figure 19: The output of the segmentation-and-tracking and model-reconstruction components applied to the image sequence from Figure 14, an image sequence that depicts an assemble event.

Figure 20: The output of the segmentation-and-tracking and model-reconstruction components applied to the image sequence from Figure 15, an image sequence that depicts a disassemble event.

Figure 21: The output of the event-classification component applied to the model sequence from Figure 16. Note that the stack event is correctly recognized, as well as the constituent put down event.

Figure 22: The output of the event-classification component applied to the model sequence from Figure 17. Note that the unstack event is correctly recognized, as well as the constituent pick up event.

Figure 23: The output of the event-classification component applied to the model sequence from Figure 18. Note that the move event is correctly recognized, as well as the constituent pick up and put down subevents.

Figure 24: The output of the event-classification component applied to the model sequence from Figure 19. Note that the assemble event is correctly recognized, as well as the constituent put down and stack subevents.

Figure 25: The output of the event-classification component applied to the model sequence from Figure 20. Note that the disassemble event is correctly recognized, as well as the constituent pick up and unstack subevents.

In Figure 4(a), Frames 0 through 1 correspond to the first subevent of a pick up event, Frames 2 through 13 correspond to the second subevent, and Frames 14 through 22 correspond to the third subevent. In Figure 4(b), Frames 0 through 13 correspond to the first subevent of a put down event, Frames 14 through 22 correspond to the second subevent, and Frames 23 through 32 correspond to the third subevent. LEONARD correctly recognizes these as instances of pick up and put down respectively. In Figure 16, Frames 0 through 11, 12 through 23, and 24 through 30 correspond to the three subevents of a put down event. LEONARD correctly recognizes this as a put down event and also as a stack event. In Figure 17, Frames 0 through 10, 11 through 24, and 25 through 33 correspond to the three subevents of a pick up event. LEONARD correctly recognizes this as a pick up event and also as an unstack event. In Figure 18, Frames 0 through 8, 9 through 16, and 17 through 45 correspond to the three subevents of a pick up event and Frames 17 through 33, 34 through 45, and 46 through 52 correspond to the three subevents of a put down event. LEONARD correctly recognizes the combination of these two events as a move event. In Figure 19, Frames 18 through 32, 33 through 40, and 41 through 46 correspond to the three subevents of a put down event and Frames 57 through 67 and 68 through 87 correspond to the first and third subevents of a second put down event, with the second subevent being empty. The second put down event is also correctly recognized as a stack event and the combination of these two events is correctly recognized as an assemble event. In Figure 20, Frames 0 through 18, 19 through 22, and 23 through 50 correspond to the three subevents of a pick up event and Frames 23 through 56, 57 through 62, and 63 through 87 correspond to the three subevents of a second pick up event. 
The first pick up event is also correctly recognized as an unstack event and the combination of these two events is correctly recognized as a disassemble event. These examples show that LEONARD correctly recognizes each of the seven event types with no false positives.
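The frame ranges quoted above can be checked mechanically. The following is a minimal sketch, not LEONARD's inference procedure: it assumes each subevent is an inclusive frame interval, and the helper names (`meets`, `is_contiguous`, `span`, `overlaps`) are hypothetical, introduced here only for illustration. It checks that the three subevents of each event follow one another without gaps, and that in Figure 18 the final subevent of the pick up event overlaps the first subevent of the put down event.

```python
from typing import List, Tuple

# Assumption: a subevent is an inclusive (first_frame, last_frame) pair.
Interval = Tuple[int, int]


def meets(a: Interval, b: Interval) -> bool:
    """True when b starts on the frame immediately after a ends."""
    return b[0] == a[1] + 1


def is_contiguous(subevents: List[Interval]) -> bool:
    """True when each subevent meets the next one, leaving no gaps."""
    return all(meets(a, b) for a, b in zip(subevents, subevents[1:]))


def span(subevents: List[Interval]) -> Interval:
    """First frame of the first subevent through last frame of the last."""
    return (subevents[0][0], subevents[-1][1])


def overlaps(a: Interval, b: Interval) -> bool:
    """True when the two inclusive intervals share at least one frame."""
    return a[0] <= b[1] and b[0] <= a[1]


# Frame ranges copied from the text.
pick_up_fig4a = [(0, 1), (2, 13), (14, 22)]
put_down_fig4b = [(0, 13), (14, 22), (23, 32)]
pick_up_fig18 = [(0, 8), (9, 16), (17, 45)]
put_down_fig18 = [(17, 33), (34, 45), (46, 52)]

if __name__ == "__main__":
    for name, ev in [("Fig. 4(a) pick up", pick_up_fig4a),
                     ("Fig. 4(b) put down", put_down_fig4b),
                     ("Fig. 18 pick up", pick_up_fig18),
                     ("Fig. 18 put down", put_down_fig18)]:
        print(name, "span", span(ev), "contiguous:", is_contiguous(ev))
    # The pick up's last subevent and the put down's first subevent share frames.
    print("Fig. 18 overlap:", overlaps(pick_up_fig18[-1], put_down_fig18[0]))
```

Treating the intervals as inclusive matches the way the text states them ("Frames 0 through 1"), which is why `meets` compares against the end frame plus one.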

As discussed in the introduction, using force dynamics and event logic to recognize events offers several advantages over the prior approach of using motion profiles and hidden Markov models:

robustness against variance in motion profile
robustness against presence of extraneous objects in the field of view
ability to perform temporal and spatial segmentation of events
ability to detect non-occurrence of events

Figures 26 through 35 illustrate these advantages. Figure 26 shows a pick up event performed from the left, in contrast to Figure 4(a), where it is performed from the right. Even though these have different motion profiles, Figure 31 shows that LEONARD correctly recognizes that they exhibit the same sequence of changes in force-dynamic relations and constitute the same event type, namely pick up. Figure 27 shows a pick up event with two extraneous blocks in the field of view. Figure 32 shows that LEONARD correctly recognizes that these extraneous blocks do not participate in any events and that, despite their presence, the truth conditions for a pick up event still hold between the other objects. Figure 28 shows a pick up event, followed by a put down event, followed by another pick up event, followed by another put down event. Figure 33 shows that LEONARD correctly recognizes this sequence of four event occurrences. Figure 29 shows two simultaneous pick up events. Figure 34 shows that LEONARD correctly recognizes these two simultaneous event occurrences. Finally, Figure 30 shows two non-events. Figure 35 shows that LEONARD is not fooled into thinking that these constitute pick up or put down events, even though portions of them have motion profiles similar to those of pick up and put down events. LEONARD correctly recognizes that these movies do not match any known event types.

Figure 26: The output of the segmentation-and-tracking and model-reconstruction components on an image sequence depicting a pick up event from the left instead of from the right.

Figure 27: The output of the segmentation-and-tracking and model-reconstruction components on an image sequence depicting a pick up event with extraneous objects in the field of view.

Figure 28: The output of the segmentation-and-tracking and model-reconstruction components on an image sequence depicting a sequence of a pick up event, followed by a put down event, followed by another pick up event, followed by another put down event.

Figure 29: The output of the segmentation-and-tracking and model-reconstruction components on an image sequence depicting two simultaneous pick up events.

Figure 30: The output of the segmentation-and-tracking and model-reconstruction components applied to the image sequences from Figure 2, image sequences that depict non-events.

Figure 31: The output of the event-classification component applied to the model sequence from Figure 26. Note that the pick up event is correctly recognized despite the fact that it was performed from the left instead of from the right.

Figure 32: The output of the event-classification component applied to the model sequence from Figure 27. Note that the pick up event is correctly recognized despite the presence of extraneous objects in the field of view.

Figure 33: The output of the event-classification component applied to the model sequence from Figure 28. Note that LEONARD correctly recognizes a pick up event, followed by a put down event, followed by another pick up event, followed by another put down event.

Figure 34: The output of the event-classification component applied to the model sequence from Figure 29. Note that the two simultaneous pick up events are correctly recognized.

Figure 35: The output of the event-classification component applied to the model sequences from Figure 30. Note that LEONARD correctly recognizes that no events occurred in these sequences.

An approach to event classification is valid and useful only if it is robust. A preliminary evaluation of the robustness of LEONARD was conducted. Thirty-five movies were filmed, five instances of each of the seven event types pick up, put down, stack, unstack, move, assemble, and disassemble. These movies resemble those in Figures 1 and 11 through 15. The same subject performed all thirty-five events. These movies were processed by LEONARD. The results of this preliminary evaluation are summarized in Table 1. A more extensive evaluation of LEONARD will be conducted in the future.

Table 1: An evaluation of the robustness of LEONARD on a test set of five movies of each of seven event types. The rows represent movies of the indicated event types. The columns represent classifications of the indicated event type. The entries x/y indicate x, the number of times that a movie of the indicated event type was classified as the indicated event type, and y, the number of times that the movie should have been classified as the indicated event type. Note that stack entails put down, unstack entails pick up, move entails both a pick up and a put down, assemble entails both a put down and a separate stack, and disassemble entails both a pick up and a separate unstack. Thus off-diagonal entries are expected in these cases. There were six false negatives and no false positives. Four of the false negatives were for the event type assemble. In three of those cases, LEONARD successfully recognized the constituent put down subevent but failed to recognize the constituent stack subevent as well as the associated put down subevent. In one case, LEONARD failed to recognize both the constituent put down and stack subevents along with the associated put down constituent of the stack subevent. One of the false negatives was for the event type move. In this case, LEONARD successfully recognized the constituent put down subevent but failed to recognize the constituent pick up subevent. The remaining false negative was for the event type unstack. In this case, LEONARD successfully recognized the constituent pick up subevent but failed to recognize the aggregate unstack event.
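The entailments and counts stated in the caption can be sketched as follows. This is an illustrative reading of the caption, not a reproduction of Table 1: the `ENTAILS` mapping and the `expected_labels` helper are hypothetical names, the label sets ignore multiplicity (for instance, the two separate put down events inside an assemble), and the summary fraction assumes that each of the six false negatives corresponds to a distinct movie.

```python
from fractions import Fraction

# Entailments as stated in the caption (direct constituents only).
ENTAILS = {
    "pick up": [],
    "put down": [],
    "stack": ["put down"],
    "unstack": ["pick up"],
    "move": ["pick up", "put down"],
    "assemble": ["put down", "stack"],
    "disassemble": ["pick up", "unstack"],
}


def expected_labels(event: str) -> set:
    """Event types a movie of `event` should be classified as.

    Follows entailments transitively; multiplicity is ignored.
    """
    labels = {event}
    for constituent in ENTAILS[event]:
        labels |= expected_labels(constituent)
    return labels


MOVIES_PER_TYPE = 5
total_movies = MOVIES_PER_TYPE * len(ENTAILS)  # 7 event types x 5 movies

# False negatives per intended event type, as reported in the caption.
FALSE_NEGATIVES = {"assemble": 4, "move": 1, "unstack": 1}
missed = sum(FALSE_NEGATIVES.values())

# Assumption: each false negative is a distinct movie.
recognized_fraction = Fraction(total_movies - missed, total_movies)

if __name__ == "__main__":
    for event in ENTAILS:
        print(event, "->", sorted(expected_labels(event)))
    print("movies:", total_movies, "missed:", missed,
          "recognized:", recognized_fraction, float(recognized_fraction))
```

Under these assumptions the intended top-level event was recognized in 29 of the 35 movies, which is why off-diagonal entries in the table are expected for the five composite event types but not for pick up and put down.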

Jeffrey Mark Siskind
Wed Aug 1 19:08:09 EDT 2001