KNOWLEDGE TRANSFER FROM WEAKLY LABELED AUDIO USING CONVOLUTIONAL NEURAL NETWORK FOR SOUND EVENTS AND SCENES

Authors: Anurag Kumar, Maksim Khadkevich, Christian Fügen


Area Under ROC Curves (AUC)

Average Precision (AP)

Figure: Comparison of the average precision (AP) of \(\mathcal{N}_S\) and \(\mathcal{N}_S^{slat}\). The index of sound events 1 to 50 is listed below. The first column is the index used in the figure above, the second is the event ID (magenta) as used in the Audioset dataset, and the third (blue) is the sound event name.

Index    Event ID    Sound event name
1    /m/096m7z   Noise
2    /m/0dl9sf8   Throat clearing
3    /m/07m2kt   Chorus effect
4    /m/07pws3f   Bang
5    /m/07hvw1   Field recording
6    /m/024dl   Cash register
7    /m/07pdhp0   Biting
8    /m/07rn7sz   Shatter
9    /m/01s0vc   Zipper
10    /m/07qcx4z   Tearing
11    /m/015jpf   Dial tone
12    /m/07qlwh6   Squish
13    /m/07s12q4   Crunch
14    /m/07st88b   Croak
15    /m/02rhddq   Reversing beeps
16    /m/03wvsk   Hair dryer
17    /m/0242l   Coin
18    /m/07prgkl   Pour
19    /m/05r5wn   Rattle
20    /m/07qb_dv   Scratch
21    /m/025wky1   Air conditioning
22    /m/04rmv   Mouse
23    /t/dd00088   Gush
24    /m/07qv4k0   Scrape
25    /m/02fxyj   Humming
26    /m/0dv5r   Camera
27    /t/dd00065   Light engine
28    /m/07pczhz   Chop
29    /m/07rrh0c   Thunk
30    /m/0k65p   Hands
31    /m/06xkwv   Mains hum
32    /m/01jg1z   Heart murmur
33    /m/04gy_2   Battle cry
34    /m/0dv3j   Boiling
35    /m/07rbp7_   Whip
36    /m/07pjwq1   Buzz
37    /m/01v_m0   Sine wave
38    /m/02c8p   Telephone dialing
39    /m/07pc8lb   Breaking
40    /m/07pyf11   Flap
41    /m/07r67yg   Ding-dong
42    /t/dd00121   Boing
43    /m/07bjf   Single-lens reflex camera
44    /m/0mbct   Gong
45    /m/01swy6   Yodeling
46    /m/07r81j2   Caterwaul
47    /m/07phhsh   Rumble
48    /t/dd00048   Train wheels squealing
49    /m/03l9g   Hammer
50    /m/03v3yw   Keys jangling
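
For readers who want to reproduce per-event comparisons like the one in the figure, the two metrics named above can be sketched as follows. This is a minimal illustration that assumes the standard definitions (AP as the mean of the precision at the rank of each positive clip, AUC as the probability that a positive clip outscores a negative one); it is not the authors' evaluation code, and the function names `average_precision` and `roc_auc` and the toy labels and scores are hypothetical.

```python
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Mean of the precision values taken at the rank of each positive clip."""
    # Rank clips by descending score; ties keep their input order (stable sort).
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            precision_sum += hits / rank
    # Assumption: a class with no positive clips gets AP = 0.
    return precision_sum / hits if hits else 0.0


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative clip")
    # Count pairwise wins; a tie counts as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data for a single sound event (hypothetical values).
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.1]
ap = average_precision(toy_labels, toy_scores)   # (1/1 + 2/3) / 2 = 5/6
auc = roc_auc(toy_labels, toy_scores)            # 3 of 4 pairs ranked correctly = 0.75
```

Running both functions once per event ID, for the outputs of each network, would give the per-event values that such a figure compares. The pairwise AUC loop is quadratic in the number of clips, so a rank-based formulation would be preferable on a large evaluation set.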