For questions about the annotation process please contact Ekaterina Taralova at etaralova AT cs.cmu.edu.
Subject ID | Annotation files | Starting frame* | Ending frame** |
---|---|---|---|
S06 | S06_Brownie.avi (half-resolution) Annotations (zip) | no offset, if using the half-resolution file | - |
S07 | Annotations (zip) | 508 | 10309 |
S08 | Annotations (zip) | 300 | 9000 |
S09 | Annotations (zip) | 226 | 13334 |
S10 | S10_Brownie.avi (half-resolution) Annotations (zip) | no offset, if using the half-resolution file | - |
S12 | Annotations (zip) | 400 (updated 11/03) | 15233 |
S13 | Annotations (zip) | 290 | 20151 |
S14 | Annotations (zip) | 386 | 11705 |
S16 | Annotations (zip) | 168 | 12338 |
S17 | Annotations (zip) | 236 | 11518 |
S18 | Annotations (zip) | 316 | 12088 |
S19 | Annotations (zip) | 354 | 14970 |
S20 | Annotations (zip) | 212 | 10576 |
S22 | Annotations (zip) | 262 | 17315 |
S23 | S23_Brownie.avi (half-resolution) Annotations (zip) | no offset, if using the half-resolution file | - |
S24 | Annotations (zip) | 360 | 12391 |
About the annotation process
These annotations were made by watching the first-person (wearable camera) videos. The annotators chose from a predefined list of labels, where each label consists of four optional fields: verb, object1, preposition, object2. The annotations provided here are from a single annotator; additional annotations from two other annotators will be released later. A snapshot of the annotation tool can be found here (in collaboration with Moritz Tenorth, TUM). A new annotation tool for Mechanical Turk is being developed by Alex Sorokin, UIUC/CMU, in collaboration with our lab and Moritz Tenorth, TUM. More information will be available soon.
About the data files
In each zip provided, the file "labels.dat" contains three columns: the first is the starting frame of the action, the second is the ending frame of the action, and the third is the action label in the format "verb-object1-preposition-object2". The file "unique_labels.dat" contains a single column with one row per video frame (the video was recorded at 30 fps); each row is the class ID of the action occurring in that frame, where class IDs are shared across all annotated subjects.
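For example, the two files for one subject could be read with a short Python sketch like the one below. The whitespace delimiter, splitting the label on "-" to recover the four fields, and padding missing optional fields with empty strings are assumptions about the file encoding, not part of the released format description:

```python
def load_labels(path="labels.dat"):
    """Read (start_frame, end_frame, (verb, object1, preposition, object2)) tuples.

    Assumes whitespace-separated columns and a label string of the form
    "verb-object1-preposition-object2"; because the four fields are optional,
    short labels are padded with empty strings (an assumption about the encoding).
    """
    actions = []
    with open(path) as f:
        for line in f:
            parts = line.split(None, 2)
            if len(parts) < 3:
                continue  # skip blank or malformed lines
            start, end, label = int(parts[0]), int(parts[1]), parts[2].strip()
            fields = label.split("-")
            fields += [""] * (4 - len(fields))
            actions.append((start, end, tuple(fields[:4])))
    return actions


def load_unique_labels(path="unique_labels.dat"):
    """Read one action class ID per video frame (the video was recorded at 30 fps)."""
    with open(path) as f:
        return [int(line) for line in f if line.strip()]
```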
About synchronization with sensors
The annotations start from the "starting frame" specified in the table below, which is the point in time when the subject turns the light used for synchronization on and off. Thus, the first row/frame in the annotation files corresponds to the "starting frame".
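As an illustration, and assuming the first row of the per-frame annotations falls on the subject's starting frame from the table below, a row index can be converted to a frame number (and timestamp) of the full first-person video roughly as follows; the function names are only illustrative:

```python
FPS = 30  # the videos were recorded at 30 frames per second


def row_to_video_frame(row_index, starting_frame):
    """Map row i (0-based) of a per-frame annotation file to a frame number of
    the full first-person video, assuming row 0 falls on the starting frame."""
    return starting_frame + row_index


def frame_to_seconds(frame_number):
    """Convert a frame number to a timestamp in seconds at 30 fps."""
    return frame_number / FPS


# Example: subject S19 has starting frame 1200 in the table below,
# so row 0 of its annotations lies 40 seconds into the video.
print(frame_to_seconds(row_to_video_frame(0, 1200)))  # 40.0
```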
About the dataset
The first-person videos and the data from the other sensors can be downloaded from http://kitchen.cs.cmu.edu/
Subject ID | Annotation files | Starting frame* | Ending frame** |
---|---|---|---|
S06 | Annotations (zip) | 1192 | 12010 |
S07 | Annotations (zip) | 1936 | 11737 |
S08 | Annotations (zip) | 1232 | 9932 |
S09 | Annotations (zip) | 1877 | 14985 |
S10 | Annotations (zip) | 1001 | 14060 |
S12 | Annotations (zip) | 1707 | 16540 |
S13 | Annotations (zip) | 919 | 20780 |
S14 | Annotations (zip) | 1910 | 13229 |
S16 | Annotations (zip) | 1596 | 13766 |
S17 | Annotations (zip) | 1464 | 12746 |
S18 | Annotations (zip) | 1198 | 12970 |
S19 | Annotations (zip) | 1200 | 15816 |
S20 | Annotations (zip) | 445 | 10809 |
S22 | Annotations (zip) | 1180 | 18233 |
S23 | Annotations (zip) | 1186 | 13964 |
S24 | Annotations (zip) | 841 | 12872 |
* The "starting frame" is relative to the first frame of the first-person video, when the video is decomposed into single frames (30fps). This corresponds the to frame when the subject turns on and off the light switch which is used for synchronization (i.e., the initial setup and calibration frames which contain no actions are skipped). ** The "ending frame" is the last frame for which annotations are available. This corresponds to the last action that the subject performs (i.e., the frames where the subject walks back to the middle of the room are skipped, as they don't contain recipe-related actions). |
For more information, see (note: I am now publishing under Ekaterina H. Taralova):