KNOWLEDGE TRANSFER FROM WEAKLY LABELED AUDIO USING CONVOLUTIONAL NEURAL NETWORK FOR SOUND EVENTS AND SCENES

Authors: Anurag Kumar, Maksim Khadkevich, Christian Fügen


Area Under ROC Curves (AUC)

Average Precision (AP)

The figure compares the AP of \(\mathcal{N}_S\) and \(\mathcal{N}_S^{slat}\). The index of sound events 1 to 50 is given below. The first column is the index used in the figure above, the second is the event ID (magenta) as used in the AudioSet dataset, and the third (blue) is the sound event name.

1    /m/07s0s5r   Strum
2    /m/042v_gx   Acoustic guitar
3    /m/02hnl   Drum kit
4    /m/02qldy   Narration, monologue
5    /m/02mk9   Engine
6    /m/0l14md   Percussion
7    /m/07gxw   Techno
8    /m/05zppz   Male speech, man speaking
9    /m/07s72n   Dubstep
10    /m/068hy   Domestic animals, pets
11    /m/026t6   Drum
12    /m/015p6   Bird
13    /t/dd00126   Inside, large room or hall
14    /m/07y_7   Violin, fiddle
15    /t/dd00128   Outside, urban or manmade
16    /t/dd00129   Outside, rural or natural
17    /m/02lkt   Electronic music
18    /m/0jbk   Animal
19    /m/0k4j   Car
20    /m/015lz1   Singing
21    /m/0fx80y   Plucked string instrument
22    /m/0342h   Guitar
23    /t/dd00125   Inside, small room
24    /m/04szw   Musical instrument
25    /m/07yv9   Vehicle
26    /m/09x0r   Speech
27    /m/04rlf   Music