Kornel Laskowski
Former Student (c/o R Stern)
Language Technologies Institute
School of Computer Science
Carnegie Mellon University
kornel AT cs DOT cmu DOT edu
Carnegie Mellon University
407 S Craig St, SCR 218
Pittsburgh PA, 15213
USA
Phone: +1 412 268 2518
Fax: +1 412 268 5578
KTH Speech, Music and Hearing
Lindstedtsvägen 24
SE-100 44 Stockholm
Sweden
Phone: +46 8 790 97 51
Fax: +46 8 790 78 54
Extended Degree-of-Overlap (EDO) Model: A Normative Implementation in C
The multi-port EDO (MPEDO) model provides bigram transition probabilities of multi-participant vocal activity overlap.
It "ties" transitions in the actual multi-participant vocal activity space, between states which are specific to the
number and index assignment of participants, in an alternate space which is independent of both the number of and the
index assignment of participants. It is intended to provide likelihoods over multi-participant vocal activity
chronograms, to aid in vocal activity detection in multi-party settings, and to enable prediction of vocal activity
deployment in those same settings. The model was developed with Tanja Schultz of the Language Technologies Institute
at Carnegie Mellon University (now at the Karlsruhe Institute of Technology) and Mari Ostendorf of the Department of
Electrical Engineering at the University of Washington (when she was visiting the Karlsruhe Institute of Technology).
The more recent single-port EDO (SPEDO) model was developed with Mattias Heldner (now at the Department of
Linguistics at Stockholm University) and Jens Edlund at the Department of Speech, Music and Hearing at the
Royal Institute of Technology.
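As an illustration of the tying described above, the following C sketch maps a transition between two multi-participant states onto a key that depends on neither the number nor the index assignment of participants. The particular key used here (the counts of participants vocalizing before the transition, across it, and after it), and the names edo_key, edo_tie, and popcount_u, are assumptions made for illustration; they are not taken from the edo distribution.

```c
#include <assert.h>

/* Count set bits: the number of participants vocalizing in a state.
 * Bit i of a state is set when participant i is vocalizing. */
static int popcount_u(unsigned int x)
{
    int n = 0;
    while (x) {
        n += (int)(x & 1u);
        x >>= 1;
    }
    return n;
}

/* Hypothetical tied-transition key (an assumption of this sketch). */
typedef struct {
    int n_from;  /* participants vocalizing at frame t-1 */
    int n_both;  /* participants vocalizing at both t-1 and t */
    int n_to;    /* participants vocalizing at frame t */
} edo_key;

/* Map a transition between two bitmask states to its tied key. */
static edo_key edo_tie(unsigned int from, unsigned int to)
{
    edo_key k;
    k.n_from = popcount_u(from);
    k.n_both = popcount_u(from & to);
    k.n_to   = popcount_u(to);
    /* The overlap can never exceed either per-frame count. */
    assert(k.n_both <= k.n_from && k.n_both <= k.n_to);
    return k;
}
```

Under this sketch, the transitions {0} -> {0,1} (edo_tie(0x1u, 0x3u)) and {4} -> {2,4} (edo_tie(0x10u, 0x14u)) both map to the key (1,1,2), so they would share a single tied probability regardless of how many participants are present or how they are indexed.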
- edo-1.2.9.tar.gz (15 Aug 2011)
  - implements a K- and T-independent perplexity over observed speech activity using the single-port model (SPEDO)
  - replicates the figures in (Laskowski, Edlund & Heldner, ICASSP 2011) using Makefile.ICASSP2011
  - replaces Figure 1 in (Laskowski, Edlund & Heldner, ICASSP 2011) with a corrected version
  - obsoletes edo-1.0.5.tar.gz (03 Nov 2010)
- edo-1.0.5.tar.gz (03 Nov 2010)
  - includes sorting and random number generation code derived from Numerical Recipes in C (2nd ed., 30 Oct 1992)
  - implements a K- and T-independent perplexity over observed speech activity using the multi-port model (MPEDO)
  - replicates results from Errata 1 to (Laskowski, 2010) using lex.Q (available below) and the scripts ACL2010_Table1.sh and ACL2010_Table2.sh
  - additionally implements several guessing baselines for comparison
  - obsoletes edo-1.0.0.tar.gz (26 Oct 2010)
- dcm-1.0.0.tar.gz (03 Nov 2010)
  - implements perplexity over observed speech activity using direct compositional models (cf. Laskowski, 2010)
  - replicates results from Errata 1 to (Laskowski, 2010) using lex.Q and the scripts ACL2010_Table1.m, ACL2010_Table2_CD.m, and ACL2010_Table2_UI.m (packaged in the distribution)
  - the MATLAB implementation is suboptimal in speed, size, and clarity, and is provided for the purposes of comparison with edo-1.0.5.tar.gz
- lex.Q.tar.gz (26 Oct 2009)
  - exemplar data (from the ICSI Meeting Corpus) for exercising the models on this page
  - derived from the (essentially) continuous-time, forced-alignment-mediated, and participant-attributed speech activity references in the ICSI Meeting Recorder Dialog Act annotations (version icsi_mrda+hs_corpus_050512.tar.gz)
  - sampled at a frame step of 100 ms, a frame size of 100 ms, and a within-frame threshold of 0.5
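The sampling parameters listed for lex.Q (frame step, frame size, within-frame threshold) can be illustrated with a minimal C sketch that converts one participant's continuous-time talk spurts into binary frames. The spurt type, the sample_frames function, and the choice to mark a frame as speech when coverage is greater than or equal to the threshold are assumptions of this sketch; the page does not specify how lex.Q was actually produced at that level of detail.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical talk-spurt record: speech over [start, end), in seconds. */
typedef struct {
    double start;
    double end;
} spurt;

/* Fill out[0..n_frames-1] with 0/1 speech activity for one participant.
 * Frame t covers [t*step, t*step + size). A frame is marked 1 when speech
 * covers at least `threshold` of its duration (assumed convention).
 * Spurts are assumed not to overlap one another. */
static void sample_frames(const spurt *s, size_t n_spurts,
                          double step, double size, double threshold,
                          unsigned char *out, size_t n_frames)
{
    size_t t, i;
    assert(step > 0.0 && size > 0.0);
    for (t = 0; t < n_frames; t++) {
        double lo = (double)t * step;
        double hi = lo + size;
        double covered = 0.0;
        for (i = 0; i < n_spurts; i++) {
            /* Intersect the spurt with the frame. */
            double a = s[i].start > lo ? s[i].start : lo;
            double b = s[i].end < hi ? s[i].end : hi;
            if (b > a)
                covered += b - a;
        }
        out[t] = (covered / size >= threshold) ? 1 : 0;
    }
}
```

With step = 0.1, size = 0.1, and threshold = 0.5, matching the values stated above, a spurt spanning 0.03 s to 0.16 s covers 70% of the first frame and 60% of the second, so both are marked as speech, while a 40 ms spurt inside a later frame falls below the threshold and is dropped.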
References:
The EDO model was first proposed in
but was not explicitly thus named until
Kornel Laskowski and Tanja Schultz (2007),
Modeling Vocal Interaction for Segmentation in Meeting Recognition.
In Machine Learning for Multimodal Interaction (A. Popescu-Belis, S. Renals, and H. Bourlard, eds.),
Springer Berlin/Heidelberg (Lecture Notes in Computer Science 4892), pp. 259-270.
Presented at the 4th Joint Workshop on Multimodal Interaction and Related Machine Learning Algorithms (MLMI2007), Brno, Czech Republic, 28-30 June.
Its application to multi-pass ASR in meetings was described in
Most recently, it was applied to predicting the future within conversations, under the assumption of
conditional dependence in
and conditional independence in
Last modified: Tue 16 Aug 2011 0308hrs GMT