This webpage contains additional information for our ICASSP 2017 paper titled "KNOWLEDGE TRANSFER FROM WEAKLY LABELED AUDIO USING CONVOLUTIONAL NEURAL NETWORK FOR SOUND EVENTS AND SCENES".
If you did not arrive at this page from the paper, it may be a good idea to read the paper first: pdf
We provide additional results and analysis here.
This paper sets state-of-the-art results on the Audioset and ESC-50 datasets. On ESC-50 it achieves better-than-human accuracy; on Audioset it sets the state of the art using the balanced training set.
Authors: Anurag Kumar, Maksim Khadkevich, Christian Fügen
Email: alnu AT andrew DOT cmu DOT edu, fugen AT fb DOT com
Here is the code to extract features.
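The linked code contains the exact feature-extraction pipeline. As a rough self-contained sketch, a log-mel spectrogram (a common input representation for audio CNNs; the specific parameter values below are illustrative assumptions, not the paper's settings) can be computed with NumPy:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters spaced evenly on the mel scale.
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):       # rising edge
            if center > left:
                fb[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):      # falling edge
            if right > center:
                fb[i - 1, k] = (right - k) / (right - center)
    return fb

def logmel(signal, sr=44100, n_fft=1024, hop=512, n_mels=128):
    """Frame the signal, take the power STFT, apply a mel
    filterbank, and compress with a log."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop: i * hop + n_fft] * window
                       for i in range(n_frames)])
    spec = np.abs(np.fft.rfft(frames, axis=1)) ** 2        # power spectrum
    mel = spec @ mel_filterbank(sr, n_fft, n_mels).T       # (frames, n_mels)
    return np.log(mel + 1e-10)                             # log compression

# One second of audio at 44.1 kHz -> (85, 128) feature matrix
features = logmel(np.random.randn(44100))
```

Segments of such frame-level feature matrices form the fixed-size inputs to the CNN.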
Audioset [1] is a large-scale weakly labeled [2] dataset for sound events. It contains a total of 527 sound events, for which labeled videos from YouTube are provided.
Click Here For More Details and Results on Audioset
Here, we show results on sound event classification using the proposed approaches to learn representations with \(\mathcal{N}_S\). The ESC-50 dataset consists of 50 different sound event classes.
Click Here For More Details and Results on ESC-50
Click Here For More Details and Results on DCASE-2016
Click Here For More Details on Semantic Understanding using our methods
[1] Jort F. Gemmeke, Daniel P. W. Ellis, Dylan Freedman, Aren Jansen, Wade Lawrence, R. Channing Moore, Manoj Plakal, and Marvin Ritter, "Audio Set: An ontology and human-labeled dataset for audio events," in IEEE ICASSP, 2017.
[2] Anurag Kumar and Bhiksha Raj, "Audio Event Detection using Weakly Labeled Data," in ACM Multimedia (MM), 2016.
[3] Anurag Kumar and Bhiksha Raj, "Weakly Supervised Scalable Audio Content Analysis," in IEEE ICME, 2016.
[4] Karol J. Piczak, "ESC: Dataset for Environmental Sound Classification," in Proceedings of the 23rd ACM Multimedia, 2015.
[5] Annamaria Mesaros, Toni Heittola, and Tuomas Virtanen, "TUT database for acoustic scene classification and sound event detection," in EUSIPCO, 2016.