KNOWLEDGE TRANSFER FROM WEAKLY LABELED AUDIO USING CONVOLUTIONAL NEURAL NETWORK FOR SOUND EVENTS AND SCENES

Authors: Anurag Kumar, Maksim Khadkevich, Christian Fügen


Localization of Sound Events

The figures below show some examples of sound event localization. Each figure shows how the output activation for the sound event of interest changes with time. Note how the activation (red line) rises when the event of interest occurs. The background is the logmel spectrogram.
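
The activation curve described above can be approximated from per-segment network outputs. The following is only a minimal sketch, not the authors' implementation: it assumes the network emits one score per 128-frame segment, that consecutive segments are 64 frames apart (a hypothetical hop), and that a score above 0.5 marks the event as active (a hypothetical threshold). The frame hop of about 11.7 ms is implied by 128 frames spanning roughly 1.5 s.

```python
import numpy as np

SEG_FRAMES = 128         # segment size in logmel frames (from the Fig 2 caption)
FRAME_HOP_S = 1.5 / 128  # ~11.7 ms per frame, implied by "128 frames (~1.5 s)"
SEG_HOP = 64             # ASSUMPTION: hop between consecutive segments, in frames


def frame_level_activation(seg_scores, n_frames):
    """Spread per-segment scores over the frames each segment covers,
    averaging the scores wherever segments overlap."""
    total = np.zeros(n_frames)
    count = np.zeros(n_frames)
    for i, score in enumerate(seg_scores):
        start = i * SEG_HOP
        stop = min(start + SEG_FRAMES, n_frames)
        total[start:stop] += score
        count[start:stop] += 1
    # Frames not covered by any segment keep an activation of 0.
    return np.divide(total, count, out=np.zeros(n_frames), where=count > 0)


def active_intervals(activation, threshold=0.5):
    """Return (start_s, end_s) pairs where activation exceeds the threshold.
    ASSUMPTION: the 0.5 default is illustrative, not taken from the paper."""
    above = np.concatenate(([False], activation > threshold, [False]))
    edges = np.flatnonzero(np.diff(above.astype(int)))
    starts, ends = edges[::2], edges[1::2]  # rising edges, falling edges
    return [(float(s * FRAME_HOP_S), float(e * FRAME_HOP_S))
            for s, e in zip(starts, ends)]
```

Plotting the array returned by `frame_level_activation` over the logmel spectrogram would give a curve comparable to the red line in the figures, under the assumptions stated above.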
Fig 1. Sound event label, Whoosh-swoosh-swish. The red line shows the localization produced by the network.


Fig 2. Sound event label, Sidetone. Sidetone is a very short event, whereas the segment size is 128 logmel frames (~1.5 s).

Fig 3. Sound event label, Machine gun and Gunfire.

Fig 4. Sound event label, Sneeze.

Fig 5. Sound event label, Horse, Neigh-whinny sound.

Fig 6. Sound event label, Bird.