For questions about the annotation process please contact Ekaterina Taralova at etaralova AT cs.cmu.edu.
Subject ID | Annotation files | Starting frame* | Ending frame** |
---|---|---|---|
S06 | S06_Brownie.avi (half-resolution) Annotations (zip) | no offset, if using the half-resolution file | - |
S07 | Annotations (zip) | 508 | 10309 |
S08 | Annotations (zip) | 300 | 9000 |
S09 | Annotations (zip) | 226 | 13334 |
S10 | S10_Brownie.avi (half-resolution) Annotations (zip) | no offset, if using the half-resolution file | - |
S12 | Annotations (zip) | 400 (updated 11/03) | 15233 |
S13 | Annotations (zip) | 290 | 20151 |
S14 | Annotations (zip) | 386 | 11705 |
S16 | Annotations (zip) | 168 | 12338 |
S17 | Annotations (zip) | 236 | 11518 |
S18 | Annotations (zip) | 316 | 12088 |
S19 | Annotations (zip) | 354 | 14970 |
S20 | Annotations (zip) | 212 | 10576 |
S22 | Annotations (zip) | 262 | 17315 |
S23 | S23_Brownie.avi (half-resolution) Annotations (zip) | no offset, if using the half-resolution file | - |
S24 | Annotations (zip) | 360 | 12391 |
About the annotation process
These annotations were made by watching the first-person (wearable camera) videos. The annotators chose from a predefined list of labels, where each label consists of four optional fields: verb, object1, preposition, object2. The annotations provided here are from a single annotator; additional annotations from two other annotators will be released later. A snapshot of the annotation tool can be found here (in collaboration with Moritz Tenorth, TUM). A new annotation tool for Mechanical Turk is being developed by Alex Sorokin, UIUC/CMU, in collaboration with our lab and Moritz Tenorth, TUM. More information will be available soon.
About the data files
In each zip provided, the file "labels.dat" contains three columns: the first is the starting frame of the action, the second is the ending frame of the action, and the third is the action label in the format "verb-object1-preposition-object2". The file "unique_labels.dat" contains a single column with one row per video frame (the video was recorded at 30 fps); each row is the class ID of the action occurring in that frame, where class IDs are shared across all annotated subjects.
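For example, the two files for one subject could be read with a short Python sketch like the one below. The whitespace delimiter, splitting the label on "-" to recover the four fields, and padding missing optional fields with empty strings are assumptions about the file encoding, not part of the released format description:

```python
def load_labels(path="labels.dat"):
    """Read (start_frame, end_frame, (verb, object1, preposition, object2)) tuples.

    Assumes whitespace-separated columns and a label string of the form
    "verb-object1-preposition-object2"; because the four fields are optional,
    short labels are padded with empty strings (an assumption about the encoding).
    """
    actions = []
    with open(path) as f:
        for line in f:
            parts = line.split(None, 2)
            if len(parts) < 3:
                continue  # skip blank or malformed lines
            start, end, label = int(parts[0]), int(parts[1]), parts[2].strip()
            fields = label.split("-")
            fields += [""] * (4 - len(fields))
            actions.append((start, end, tuple(fields[:4])))
    return actions


def load_unique_labels(path="unique_labels.dat"):
    """Read one action class ID per video frame (the video was recorded at 30 fps)."""
    with open(path) as f:
        return [int(line) for line in f if line.strip()]
```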
About synchronization with sensors
The annotations start from the "starting frame" specified in the table below, which is the point in time when the subject turns the light used for synchronization on and off. Thus, the first row/frame in the annotation files corresponds to the "starting frame".
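As an illustration, and assuming the first row of the per-frame annotations falls on the subject's starting frame from the table below, a row index can be converted to a frame number (and timestamp) of the full first-person video roughly as follows; the function names are only illustrative:

```python
FPS = 30  # the videos were recorded at 30 frames per second


def row_to_video_frame(row_index, starting_frame):
    """Map row i (0-based) of a per-frame annotation file to a frame number of
    the full first-person video, assuming row 0 falls on the starting frame."""
    return starting_frame + row_index


def frame_to_seconds(frame_number):
    """Convert a frame number to a timestamp in seconds at 30 fps."""
    return frame_number / FPS


# Example: subject S19 has starting frame 1200 in the table below,
# so row 0 of its annotations lies 40 seconds into the video.
print(frame_to_seconds(row_to_video_frame(0, 1200)))  # 40.0
```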
About the dataset
The first-person videos and the data from the other sensors can be downloaded from http://kitchen.cs.cmu.edu/
Subject ID | Annotation files | Starting frame* | Ending frame** |
---|---|---|---|
S06 | Annotations (zip) | 1192 | 12010 |
S07 | Annotations (zip) | 1936 | 11737 |
S08 | Annotations (zip) | 1232 | 9932 |
S09 | Annotations (zip) | 1877 | 14985 |
S10 | Annotations (zip) | 1001 | 14060 |
S12 | Annotations (zip) | 1707 | 16540 |
S13 | Annotations (zip) | 919 | 20780 |
S14 | Annotations (zip) | 1910 | 13229 |
S16 | Annotations (zip) | 1596 | 13766 |
S17 | Annotations (zip) | 1464 | 12746 |
S18 | Annotations (zip) | 1198 | 12970 |
S19 | Annotations (zip) | 1200 | 15816 |
S20 | Annotations (zip) | 445 | 10809 |
S22 | Annotations (zip) | 1180 | 18233 |
S23 | Annotations (zip) | 1186 | 13964 |
S24 | Annotations (zip) | 841 | 12872 |
* The "starting frame" is relative to the first frame of the first-person video, when the video is decomposed into single frames (30fps). This corresponds the to frame when the subject turns on and off the light switch which is used for synchronization (i.e., the initial setup and calibration frames which contain no actions are skipped). ** The "ending frame" is the last frame for which annotations are available. This corresponds to the last action that the subject performs (i.e., the frames where the subject walks back to the middle of the room are skipped, as they don't contain recipe-related actions). |
For more information, see (note: I am now publishing under Ekaterina H. Taralova):