Kornel Laskowski

Former Student (c/o R Stern)
Language Technologies Institute
School of Computer Science
Carnegie Mellon University

kornel AT cs DOT cmu DOT edu
Carnegie Mellon University
407 S Craig St, SCR 218
Pittsburgh PA, 15213
USA
Phone: +1 412 268 2518
Fax: +1 412 268 5578

KTH Speech, Music and Hearing
Lindstedtsvägen 24
SE-100 44 Stockholm
Sweden
Phone: +46 8 790 97 51
Fax: +46 8 790 78 54


Fundamental Frequency Variation (FFV): A Normative Implementation in C

The FFV representation is an instantaneous-frame representation of variation in fundamental frequency. It is intended to model intonation trajectories indirectly, at a sub-unit level, in the same way that standard MFCC features indirectly model formant trajectories at the sub-unit level. The representation was developed with Jens Edlund and Mattias Heldner at the Department of Speech, Music and Hearing at KTH.

ffv-1.1.0.tar.gz (19 Aug 2009)
- does not include any code derived from Numerical Recipes in C
- depends on FFT code from FFTW, the Fastest Fourier Transform in the West (version 3.2.2)
ffv-1.0.1.tar.gz (10 Aug 2009)
- includes FFT code derived from Numerical Recipes in C (2nd ed., 30 Oct 1992)
- includes improvements by Timo Baumann

Code which wraps ffv-1.x.x for use in existing signal processing environments includes:

ffvext.tar.bz2 (11 Aug 2009)
- implements snack::audio ffv for The Snack Sound Toolkit (Tcl/Tk)
- contributed by Timo Baumann
featureFFVItf.c (10 Aug 2009)
- partially implements $fes intonation SVector FMatrix for interACT's Janus Recognition ToolKit (Tcl/Tk)
- provided as an illustrative example

References:

The FFV representation was introduced in

Kornel Laskowski, Jens Edlund, and Mattias Heldner (2008), An Instantaneous Vector Representation of Delta Pitch for Speaker-Change Prediction in Conversational Dialogue Systems. In Proceedings of the 33rd IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP2008), Las Vegas NV, USA, 30 March - 4 April, pp. 5041-5044. [poster]

and an overview is available in

Kornel Laskowski, Mattias Heldner, and Jens Edlund (2008), The Fundamental Frequency Variation Spectrum. In Proceedings of the 21st Swedish Phonetics Conference (Fonetik 2008), Gothenburg, Sweden, 11-13 June, pp. 29-32. [slides]

Several computational refinements are described in

Kornel Laskowski, Matthias Wölfel, Mattias Heldner, and Jens Edlund (2008), Computing the Fundamental Frequency Variation Spectrum in Conversational Spoken Dialogue Systems. In Proceedings of the 155th Meeting of the Acoustical Society of America, 5th EAA Forum Acusticum, and 9th SFA Congrès Français d'Acoustique (Acoustics2008), Paris, France, 29 June - 4 July, pp. 3305-3310. [slides]
Kornel Laskowski, Mattias Heldner, and Jens Edlund (2009), A General-Purpose 32 ms Prosodic Vector for Hidden Markov Modeling. In Proceedings of the 10th Annual Conference of the International Speech Communication Association (INTERSPEECH2009), Brighton, UK, 6-10 September, pp. 724-727. [slides]

Demonstrations of inferred model structure over the representation are available in

Kornel Laskowski, Jens Edlund, and Mattias Heldner (2008), Learning Prosodic Sequences Using the Fundamental Frequency Variation Spectrum. In Proceedings of the 4th ISCA International Conference on Speech Prosody (SP2008), Campinas, Brazil, 6-9 May. [poster]
Mattias Heldner, Jens Edlund, Kornel Laskowski, and Antoine Pelcé (2008), Prosodic Features in the Vicinity of Silences and Overlaps. To appear in Proceedings of the 10th Nordic Conference on Prosody, Helsinki, Finland, 4-6 August.
Kornel Laskowski, Mattias Heldner, and Jens Edlund (2009), Exploring the Prosody of Floor Mechanisms in English Using the Fundamental Frequency Variation Spectrum. In Proceedings of the 17th European Signal Processing Conference (EUSIPCO2009), Glasgow, UK, 24-28 August, pp. 2539-2543. [poster]

Finally, application of the FFV representation to speaker recognition is described in

Kornel Laskowski and Qin Jin (2009), Modeling Instantaneous Intonation for Speaker Identification Using the Fundamental Frequency Variation Spectrum. In Proceedings of the 34th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP2009), Taipei, Taiwan, 19-24 April, pp. 4541-4544. [poster]

Last modified: Sun 20 Feb 2011 2315hrs GMT