This page is divided into several categories:
General Overviews
These are all overviews of my work in this area. The IAKTA/LIST paper is the most recent of these overviews, but it is now quite dated. Since then, I've worked on Query-by-Humming, polyphonic score alignment, music search by polyphonic alignment, music structure analysis, and beat tracking informed by music structure.
Dannenberg, “Music Understanding,” 1987/1988 Computer Science Research Review, Carnegie Mellon School of Computer Science, pp. 19-28.
[Postscript Version] [Adobe Acrobat (PDF) Version]
Dannenberg, “Recent Work In Real-Time Music Understanding By Computer,” Music, Language, Speech, and Brain, Wenner-Gren International Symposium Series, Sundberg, Nord, and Carlson, ed., Macmillan, 1991, pp. 194-202.
Dannenberg, “Computerbegleitung und Musikverstehen” (“Computer Accompaniment and Music Understanding”), in Neue Musiktechnologie, Bernd Enders, ed., Mainz: Schott, 1993, pp. 241-252.
Dannenberg, “Recent Work in Music Understanding,” in Proceedings of the 11th Annual Symposium on Small Computers in the Arts, Philadelphia, PA November 15-17, 1991. Philadelphia: SCAN, November 1991, pp. 9-14.
ABSTRACT: Interaction with computers in musical performances is very much limited by a lack of music understanding by computers. If computers do not understand musical structures such as rhythmic units, chords, keys, and phrases, then interaction with computers will necessarily be difficult and cumbersome. Research into Music Understanding by computer aims to raise the level of human computer interaction in musical tasks including live music performance.
[Postscript Version] [Adobe Acrobat (PDF) Version]
Dannenberg, “Music Understanding and the Future of Computer Music,” Contemporary Music Review, (to appear).
Dannenberg, “Music Understanding by Computer,” in IAKTA/LIST International Workshop on Knowledge Technology in the Arts Proceedings, International Association of Knowledge Technology in the Arts, Inc. in cooperation with Laboratories of Image Information Science and Technology, Osaka, Japan, pp. 41-56 (September 16, 1993).
ABSTRACT: Music Understanding refers to the recognition or identification of structure and pattern in musical information. Music understanding projects initiated by the author are discussed. In the first, Computer Accompaniment, the goal is to follow a performer in a score. Knowledge of the position in the score as a function of time can be used to synchronize an accompaniment to the live performer and automatically adjust to tempo variations. In the second project, it is shown that statistical methods can be used to recognize the location of an improviser in a cyclic chord progression such as the 12-bar blues. The third project, Beat Tracking, attempts to identify musical beats using note-onset times from a live performance. Parallel search techniques are used to consider several hypotheses simultaneously, and both timing and higher-level musical knowledge are integrated to evaluate the hypotheses. The fourth project, the Piano Tutor, identifies student performance errors and offers advice. The fifth project studies human tempo tracking with the goal of improving the naturalness of automated accompaniment systems.
[Postscript Version] [Adobe Acrobat (PDF) Version]
ABSTRACT: Artificial Intelligence and Machine Learning are enabling many advances in the area of music. Three computer music problems are described in which Machine Learning promises to solve problems and advance the state of the art. These are: computer accompaniment, music understanding in interactive compositions, and music synthesis. Machine Learning plays an important role in dealing with poorly defined problems where data is subject to noise and other variation, and where complexity rules out direct, handcrafted solutions. These characteristics are typical in sophisticated computer music systems. Machine learning promises to enable more natural communication between machines and musicians.
Dannenberg, Thom, and Watson, “A Machine Learning Approach to Musical Style Recognition,” in 1997 International Computer Music Conference, International Computer Music Association (September 1997), pp. 344-347.
ABSTRACT: Much of the work on perception and understanding of music by computers has focused on low-level perceptual features such as pitch and tempo. Our work demonstrates that machine learning can be used to build effective style classifiers for interactive performance systems. We also present an analysis explaining why these techniques work so well when hand-coded approaches have consistently failed. We also describe a reliable real-time performance style classifier.
Han, Rho, Dannenberg, and Hwang, “SMERS: Music Emotion Recognition Using Support Vector Regression,” in Proceedings of the 10th International Conference on Music Information Retrieval (ISMIR 2009), (October 2009), pp. 651-656.
ABSTRACT: Music emotion plays an important role in music retrieval, mood detection and other music-related applications. Many issues for music emotion recognition have been addressed by different disciplines such as physiology, psychology, cognitive science and musicology. We present a support vector regression (SVR) based music emotion recognition system. The recognition process consists of three steps: (i) seven distinct features are extracted from music; (ii) those features are mapped into eleven emotion categories on Thayer's two-dimensional emotion model; (iii) two regression functions are trained using SVR and then arousal and valence values are predicted. We have tested our SVR-based emotion classifier in both Cartesian and polar coordinate systems empirically. The results indicate the SVR classifier in the polar representation produces satisfactory results which reach 94.55% accuracy, superior to the SVR (in Cartesian) and other machine learning classification algorithms such as SVM and GMM.
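To illustrate the regression step described in the abstract, here is a minimal sketch (not the SMERS implementation) that trains two support vector regressors with scikit-learn on placeholder feature vectors standing in for the seven extracted features, then converts the predicted arousal and valence to polar form:

    import numpy as np
    from sklearn.svm import SVR

    # Placeholder training data: random vectors stand in for the seven
    # features SMERS extracts, with annotated arousal/valence labels.
    rng = np.random.default_rng(0)
    X_train = rng.normal(size=(100, 7))
    arousal = rng.uniform(-1, 1, size=100)
    valence = rng.uniform(-1, 1, size=100)

    # One regressor per emotion dimension.
    svr_arousal = SVR(kernel="rbf", C=1.0).fit(X_train, arousal)
    svr_valence = SVR(kernel="rbf", C=1.0).fit(X_train, valence)

    # Predict for a new piece and express the result in polar coordinates
    # on Thayer's two-dimensional (valence, arousal) plane.
    x_new = rng.normal(size=(1, 7))
    a = svr_arousal.predict(x_new)[0]
    v = svr_valence.predict(x_new)[0]
    print(f"arousal={a:.2f} valence={v:.2f} r={np.hypot(v, a):.2f} theta={np.arctan2(a, v):.2f}")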
ABSTRACT: Because music is not objectively descriptive or representational, the subjective qualities of music seem to be most important. Style is one of the most salient qualities of music, and in fact most descriptions of music refer to some aspect of musical style. Style in music can refer to historical periods, composers, performers, sonic texture, emotion, and genre. In recent years, many aspects of music style have been studied from the standpoint of automation: How can musical style be recognized and synthesized? An introduction to musical style describes ways in which style is characterized by composers and music theorists. Examples are then given where musical style is the focal point for computer models of music analysis and music generation.
ABSTRACT: In this paper we explore a technique for content-based music retrieval using a continuous pitch contour derived from a recording of the audio query instead of a quantization of the query into discrete notes. Our system determines the pitch for each unit of time in the query and then uses a time-warping algorithm to match this string of pitches against songs in a database of MIDI files. This technique, while much slower at matching, is usually far more accurate than techniques based on discrete notes. It would be an ideal technique to use to provide the final ranking of candidate results produced by a faster but less robust matching algorithm.
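A minimal sketch of the frame-based matching idea follows. The data is hypothetical, and transposition invariance, note durations, and the database search are all omitted; it only shows dynamic time warping between a frame-level pitch contour and a pitch sequence derived from a MIDI melody:

    import numpy as np

    def dtw_cost(query, target):
        # Dynamic time warping cost between two pitch sequences (in semitones).
        n, m = len(query), len(target)
        D = np.full((n + 1, m + 1), np.inf)
        D[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                d = abs(query[i - 1] - target[j - 1])      # local pitch distance
                D[i, j] = d + min(D[i - 1, j],             # skip a query frame
                                  D[i, j - 1],             # skip a target frame
                                  D[i - 1, j - 1])         # match frames
        return D[n, m] / (n + m)                           # length-normalized cost

    # A sung query sampled at a fixed frame rate, and a MIDI melody expanded so
    # that each note contributes one value per frame of its duration.
    query_contour = [60.2, 60.1, 62.3, 62.2, 64.0, 63.8, 64.1]
    target_pitches = [60, 60, 62, 62, 64, 64, 64, 65]
    print(dtw_cost(query_contour, target_pitches))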
Dannenberg, “Music Information Retrieval as Music Understanding,” in ISMIR 2001 2nd Annual International Symposium on Music Information Retrieval, Bloomington: Indiana University, (2001), pp. 139-142.
ABSTRACT: Much of the difficulty in Music Information Retrieval can be traced to problems of good music representations, understanding music structure, and adequate models of music perception. In short, the central problem of Music Information Retrieval is Music Understanding, a topic that also forms the basis for much of the work in the fields of Computer Music and Music Perception. It is important for all of these fields to communicate and share results. With this goal in mind, the author's work on Music Understanding in interactive systems, including computer accompaniment and style recognition, is discussed.
ABSTRACT: Query-by-humming systems search a database of music for good matches to a sung, hummed, or whistled melody. Errors in transcription and variations in pitch and tempo can cause substantial mismatch between queries and targets. Thus, algorithms for measuring melodic similarity in query-by-humming systems should be robust. We compare several variations of search algorithms in an effort to improve search precision. In particular, we describe a new frame-based algorithm that significantly outperforms note-by-note algorithms in tests using sung queries and a database of MIDI-encoded music.
ABSTRACT: Melodic similarity is an important concept for music databases, musicological studies, and interactive music systems. Dynamic programming is commonly used to compare melodies, often with a distance function based on pitch differences measured in semitones. This approach computes an "edit distance" as a measure of melodic dissimilarity. The problem can also be viewed in probabilistic terms: What is the probability that a melody is a "mutation" of another melody, given a table of mutation probabilities? We explain this approach and demonstrate how it can be used to search a database of melodies. Our experiments show that the probabilistic model performs better than a typical "edit distance" comparison.
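A toy sketch of the probabilistic view follows; the classic edit-distance recurrence has the same shape, with a minimum over costs instead of a maximum over log-probabilities. The probabilities below are invented for illustration, not the learned mutation table from the paper:

    import math

    def log_mutation_prob(a, b, p_keep=0.9, p_shift=0.04, p_indel=0.01):
        # Log probability that melody b is a "mutation" of melody a under a toy
        # model: each pitch is kept, shifted by a semitone, or inserted/deleted.
        def sub_logp(x, y):
            if x == y:
                return math.log(p_keep)
            if abs(x - y) == 1:
                return math.log(p_shift)
            return math.log(p_indel)

        NEG = float("-inf")
        D = [[NEG] * (len(b) + 1) for _ in range(len(a) + 1)]
        D[0][0] = 0.0
        for i in range(len(a) + 1):
            for j in range(len(b) + 1):
                if i > 0 and j > 0:   # match or substitution
                    D[i][j] = max(D[i][j], D[i - 1][j - 1] + sub_logp(a[i - 1], b[j - 1]))
                if i > 0:             # deletion
                    D[i][j] = max(D[i][j], D[i - 1][j] + math.log(p_indel))
                if j > 0:             # insertion
                    D[i][j] = max(D[i][j], D[i][j - 1] + math.log(p_indel))
        return D[len(a)][len(b)]

    melody = [60, 62, 64, 65, 67]
    variant = [60, 62, 63, 65, 67, 69]
    print(log_mutation_prob(melody, variant))   # higher (less negative) = more similar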
Dannenberg, Birmingham, Tzanetakis, Meek, Hu, and Pardo, “The MUSART Testbed for Query-By-Humming Evaluation,” in Proceedings of the Fourth International Conference on Music Information Retrieval, Baltimore, Maryland, USA, October 2003. Baltimore: Johns Hopkins University, 2003. pp. 41-50.
A slightly expanded and revised version of this paper (not online) is published in Computer Music Journal.
ABSTRACT: Evaluating music information retrieval systems is acknowledged to be a difficult problem. We have created a database and a software testbed for the systematic evaluation of various query-by-humming (QBH) search systems. As might be expected, different queries and different databases lead to wide variations in observed search precision. “Natural” queries from two sources led to lower performance than that typically reported in the QBH literature. These results point out the importance of careful measurement and objective comparisons to study retrieval algorithms. This study compares search algorithms based on note-interval matching with dynamic programming, fixed-frame melodic contour matching with dynamic time warping, and a hidden Markov model. An examination of scaling trends is encouraging: precision falls off very slowly as the database size increases. This trend is simple to compute and could be useful to predict performance on larger databases.
Dannenberg, Birmingham, Tzanetakis, Meek, Hu, and Pardo, “The MUSART Testbed for Query-By-Humming Evaluation,” Computer Music Journal, 28(2) (Summer 2004), pp. 34-48.
Birmingham, Dannenberg, and Pardo, “Query by Humming With the VocalSearch System,” Communications of the ACM, 49(8) (August 2006), pp. 49-52.
ABSTRACT: Don't know the composer, performer, or title? Let the system match the theme you know to the song you want. When one wishes to find a piece of music through Apple Computer's iTunes or at the local public library, the usual approach is to enter some textual information (metadata) about the piece (such as composer, performer, or title) into a search engine. However, when one knows the music, but not its metadata, standard search engines are not an option. One might instead hum or whistle a portion of the piece, providing a query for a search engine based on content (the melody) rather than on metadata. Systems able to find a song based on a sung, hummed, or whistled melody are called query by humming, or QBH, even though humming is not always the input.
Dannenberg, Birmingham, Pardo, Hu, Meek, and Tzanetakis, “A Comparative Evaluation of Search Techniques for Query-by-Humming Using the MUSART Testbed,” Journal of the American Society for Information Science and Technology, 58(5) (March 2007), pp. 687-701.
ABSTRACT: Query-by-Humming systems offer content-based searching for melodies and require no special musical training or knowledge. Many such systems have been built, but there has not been much useful evaluation and comparison in the literature due to the lack of shared databases and queries. The MUSART project testbed allows various search algorithms to be compared using a shared framework that automatically runs experiments and summarizes results. Using this testbed, we compared algorithms based on string alignment, melodic contour matching, a hidden Markov model, n-grams, and CubyHum. Retrieval performance is very sensitive to distance functions and the representation of pitch and rhythm, which raises questions about some previously published conclusions. Some algorithms are particularly sensitive to the quality of queries. Our queries, which are taken from human subjects in a fairly realistic setting, are quite difficult, especially for n-gram models. Finally, simulations on query-by-humming performance as a function of database size indicate that retrieval performance falls only slowly as the database size increases.
Izmirli and Dannenberg, “Understanding Features and Distance Functions for Music Sequence Alignment,” in Proceedings of the 11th International Society for Music Information Retrieval Conference, Utrecht, Germany, August 2010, pp. 411-416.
ABSTRACT: We investigate the problem of matching symbolic representations directly to audio based representations for applications that use data from both domains. One such application is score alignment, which aligns a sequence of frames based on features such as chroma vectors and distance functions such as Euclidean distance. Good representations are critical, yet current systems use ad hoc constructions such as the chromagram that have been shown to work quite well. We investigate ways to learn chromagram-like representations that optimize the classification of “matching” vs. “non-matching” frame pairs of audio and MIDI. New representations learned automatically from examples not only perform better than the chromagram representation but they also reveal interesting projection structures that differ distinctly from the traditional chromagram.
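As a baseline for the learned representations discussed in the abstract, the conventional comparison of a symbolic frame to an audio frame can be sketched as follows; the chroma vector and threshold are made-up numbers, and a real system would compute chroma from audio and tune the decision on labeled matching/non-matching frame pairs:

    import numpy as np

    def midi_template(pitches):
        # Unit-length pitch-class template for the MIDI notes sounding in a frame.
        t = np.zeros(12)
        for p in pitches:
            t[p % 12] = 1.0
        return t / (np.linalg.norm(t) + 1e-9)

    # Hypothetical normalized chroma vector from one audio frame, and the MIDI
    # notes sounding in a candidate score frame (a C major triad).
    audio_chroma = np.array([0.70, 0.05, 0.10, 0.02, 0.50, 0.10,
                             0.02, 0.45, 0.05, 0.10, 0.02, 0.05])
    audio_chroma /= np.linalg.norm(audio_chroma)
    score_frame = midi_template([60, 64, 67])

    # Euclidean distance between the 12-dimensional vectors; an (arbitrary)
    # threshold turns the distance into a matching / non-matching decision.
    d = float(np.linalg.norm(audio_chroma - score_frame))
    print(round(d, 3), "matching" if d < 0.8 else "non-matching")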
Huang, Ma, Xia, Dannenberg, and Faloutsos, “MidiFind: Fast and Effective Similarity Searching in Large MIDI Databases,” Proceedings of the 10th International Symposium on Computer Music Multidisciplinary Research, October 2013, Marseille, France, pp. 208-224.
ABSTRACT: While there are perhaps millions of MIDI files available over the Internet, it is difficult to find performances of a particular piece because well labeled metadata and indexes are unavailable. We address the particular problem of finding performances of compositions for piano, which is different from often-studied problems of Query-by-Humming and Music Fingerprinting. Our MidiFind system is designed to search a million MIDI files with high precision and recall. By using a hybrid search strategy, it runs more than 1000 times faster than naive competitors, and by using a combination of bag-of-words and enhanced Levenshtein distance methods for similarity, our system achieves a precision of 99.5% and recall of 89.8%.
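The hybrid strategy can be sketched as a two-stage, coarse-to-fine search. This is not the MidiFind code (which uses an enhanced Levenshtein distance and a much larger database); it only shows the general shape, with tiny made-up pitch sequences:

    from collections import Counter

    def levenshtein(a, b):
        # Standard Levenshtein distance between two note sequences.
        prev = list(range(len(b) + 1))
        for i, x in enumerate(a, 1):
            cur = [i]
            for j, y in enumerate(b, 1):
                cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
            prev = cur
        return prev[-1]

    def bag_distance(a, b):
        # Cheap bag-of-words style distance on pitch histograms (order ignored).
        ca, cb = Counter(a), Counter(b)
        return sum((ca - cb).values()) + sum((cb - ca).values())

    def search(query, database, k=2):
        # Rank everything by the cheap bag distance, then run the expensive
        # sequence comparison only on the k best candidates.
        candidates = sorted(database, key=lambda seq: bag_distance(query, seq))[:k]
        return min(candidates, key=lambda seq: levenshtein(query, seq))

    database = [[60, 62, 64, 65, 67], [60, 60, 67, 67, 69, 69, 67], [62, 62, 61, 59]]
    print(search([60, 62, 64, 66, 67], database))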
Xia, Liang, Dannenberg, and Harvilla, “Segmentation, Clustering, and Display in a Personal Audio Database for Musicians,” Proceedings of the 12th International Society for Music Information Retrieval Conference, October 2011, Miami, Florida, pp. 139-144.
ABSTRACT: Managing music audio databases for practicing musicians presents new and interesting challenges. We describe a systematic investigation to provide useful capabilities to musicians both in rehearsal and when practicing alone. Our goal is to allow musicians to automatically record, organize, and retrieve rehearsal (and other) audio to facilitate review and practice (for example, playing along with difficult passages). We introduce a novel music classification system based on Eigenmusic and Adaboost to separate rehearsal recordings into segments, an unsupervised clustering and alignment process to organize segments, and a digital music display interface that provides both graphical input and output in terms of conventional music notation.
Jiang and Dannenberg, “Melody Identification in Standard MIDI Files,” in Proceedings of the 16th Sound & Music Computing Conference (SMC2019), Malaga, 2019, pp. 65-71.
ABSTRACT: Melody identification is an important early step in music analysis. This paper presents a tool to identify the melody in each measure of a Standard MIDI File. We also share an open dataset of manually labeled music for researchers. We use a Bayesian maximum-likelihood approach and dynamic programming as the basis of our work. We have trained parameters on data sampled from the Million Song Dataset and tested on a dataset including 1703 measures of music from different genres. Our algorithm achieves an overall accuracy of 89% on the test dataset. We compare our results to previous work.
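The dynamic-programming part of such an approach can be illustrated with a small Viterbi-style search over tracks. The per-measure log-likelihoods and the switch penalty below are invented numbers standing in for a trained model:

    import numpy as np

    # Hypothetical per-measure log-likelihoods that each of three tracks is the
    # melody (e.g., derived from features such as mean pitch and note density).
    loglik = np.array([
        [-1.0, -2.5, -3.0],   # measure 1
        [-2.0, -1.2, -3.1],   # measure 2
        [-2.2, -1.0, -2.8],   # measure 3
        [-0.9, -2.0, -3.5],   # measure 4
    ])
    switch_penalty = 0.7      # discourage changing the melody track between measures

    n_measures, n_tracks = loglik.shape
    best = loglik[0].copy()
    back = np.zeros((n_measures, n_tracks), dtype=int)
    for m in range(1, n_measures):
        new_best = np.empty(n_tracks)
        for t in range(n_tracks):
            trans = best - switch_penalty * (np.arange(n_tracks) != t)
            back[m, t] = int(np.argmax(trans))
            new_best[t] = trans[back[m, t]] + loglik[m, t]
        best = new_best

    # Trace back the highest-scoring assignment of a melody track per measure.
    path = [int(np.argmax(best))]
    for m in range(n_measures - 1, 0, -1):
        path.append(int(back[m, path[-1]]))
    print("melody track per measure:", path[::-1])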
Structural Analysis
Using similarity and repetition to guide them, listeners can discover structure in music. This research aims to build music listening models that, starting with audio such as CD recordings, find patterns and generate explanations of the music. Explanations include analyses of structure, e.g., an "AABA" form, as well as other relationships.
Dannenberg, “Listening to 'Naima': An Automated Structural Analysis of Music from Recorded Audio,” in Proceedings of the 2002 International Computer Music Conference. San Francisco: International Computer Music Association, (2002), pp. 28-34.
ABSTRACT: A model of music listening has been automated. A program takes digital audio as input, for example from a compact disc, and outputs an explanation of the music in terms of repeated sections and the implied structure. For example, when the program constructs an analysis of John Coltrane's "Naima," it generates a description that relates to the AABA form and notices that the initial AA is omitted the second time. The algorithms are presented and results with two other input songs are also described. This work suggests that music listening is based on the detection of relationships and that relatively simple analyses can successfully recover interesting musical structure.
ABSTRACT: Music is often described in terms of the structure of repeated phrases. For example, many songs have the form AABA, where each letter represents an instance of a phrase. This research aims to construct descriptions or explanations of music in this form, using only audio recordings as input. A system of programs is described that transcribes the melody of a recording, identifies similar segments, clusters these segments to form patterns, and then constructs an explanation of the music in terms of these patterns. Additional work using spectral information rather than melodic transcription is also described. Examples of successful machine “listening” and music analysis are presented.
ABSTRACT: Human listeners are able to recognize structure in music through the perception of repetition and other relationships within a piece of music. This work aims to automate the task of music analysis. Music is “explained” in terms of embedded relationships, especially repetition of segments or phrases. The steps in this process are the transcription of audio into a representation with a similarity or distance metric, the search for similar segments, forming clusters of similar segments, and explaining music in terms of these clusters. Several transcription methods are considered: monophonic pitch estimation, chroma (spectral) representation, and polyphonic transcription followed by harmonic analysis. Also, several algorithms that search for similar segments are described. These techniques can be used to perform an analysis of musical structure, as illustrated by examples.
Dannenberg and Hu, “Pattern Discovery Techniques for Music Audio,” Journal of New Music Research, (June 2003), pp. 153-164.
Our ISMIR 2002 paper (listed above) was selected from the conference papers for publication in JNMR. The JNMR version is slightly expanded and revised.
ABSTRACT: Human listeners are able to recognize structure in music through the perception of repetition and other relationships within a piece of music. This work aims to automate the task of music analysis. Music is “explained” in terms of embedded relationships, especially repetition of segments or phrases. The steps in this process are the transcription of audio into a representation with a similarity or distance metric, the search for similar segments, forming clusters of similar segments, and explaining music in terms of these clusters. Several pre-existing signal analysis methods have been used: monophonic pitch estimation, chroma (spectral) representation, and polyphonic transcription followed by harmonic analysis. Also, several algorithms that search for similar segments are described. Experience with these various approaches suggests that there are many ways to recover structure from music audio. Examples are offered using classical, jazz, and rock music.
This book chapter attempts to summarize various techniques and approaches.
ABSTRACT: Music is full of structure, including sections, sequences of distinct musical textures, and the repetition of phrases or entire sections. The analysis of music audio relies upon feature vectors that convey information about music texture or pitch content. Texture generally refers to the average spectral shape and statistical fluctuation, often reflecting the set of sounding instruments, e.g. strings, vocal, or drums. Pitch content reflects melody and harmony, which is often independent of texture. Structure is found in several ways. Segment boundaries can be detected by observing marked changes in locally averaged texture. Similar sections of music can be detected by clustering segments with similar average textures. The repetition of a sequence of music often marks a logical segment. Repeated phrases and hierarchical structures can be discovered by finding similar sequences of feature vectors within a piece of music. Structure analysis can be used to construct music summaries and to assist music browsing.
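The two basic operations described above, detecting changes in locally averaged texture and looking for repetition in a self-similarity matrix, can be sketched roughly as follows, using synthetic feature vectors in place of real audio features:

    import numpy as np
    from scipy.signal import find_peaks

    def novelty(features, w=8):
        # Score each frame by how much the locally averaged "texture" changes:
        # compare the mean feature vector just before and just after the frame.
        scores = np.zeros(len(features))
        for i in range(w, len(features) - w):
            scores[i] = np.linalg.norm(features[i:i + w].mean(axis=0) -
                                       features[i - w:i].mean(axis=0))
        return scores

    # Synthetic feature sequence with two contrasting textures in an A A B A plan.
    rng = np.random.default_rng(1)
    A = rng.normal(0.0, 0.1, size=(40, 12))
    B = rng.normal(1.0, 0.1, size=(40, 12))
    features = np.vstack([A, A, B, A])

    # Self-similarity matrix: repeated sections appear as bright off-diagonal
    # stripes (not analyzed further in this sketch).
    sim = features @ features.T

    # Segment boundaries from peaks in the texture-change (novelty) score.
    peaks, _ = find_peaks(novelty(features), height=0.5, distance=10)
    print("candidate segment boundaries (frames):", peaks.tolist())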
Hu, Dannenberg, and Tzanetakis. “Polyphonic Audio Matching and Alignment for Music Retrieval,” in 2003 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New York: IEEE (2003), pp. 185-188.
ABSTRACT: We describe a method that aligns polyphonic audio recordings of music to symbolic score information in standard MIDI files without the difficult process of polyphonic transcription. By using this method, we can search through a MIDI database to find the MIDI file corresponding to a polyphonic audio recording.
We must have run out of space for a longer abstract. This paper covers two interesting experiments. One compares different features for alignment and concludes that the chromagram is better than multiple pitch estimation, spectra, and mel cepstra. The paper also includes an experiment where the quality of match is used to search for MIDI files that match audio. It works, but not very reliably.
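The alignment idea lends itself to a short sketch with the librosa library. The file names are placeholders (a performance recording and an audio rendering of the MIDI score), and this is only the generic chroma-plus-dynamic-time-warping recipe, not the paper's implementation:

    import librosa

    # Placeholder files: a recorded performance and the MIDI score rendered to
    # audio, so both sides can be compared with acoustic features instead of
    # attempting polyphonic transcription.
    perf, sr = librosa.load("performance.wav")
    score, _ = librosa.load("score_rendered_from_midi.wav", sr=sr)

    # Chroma features summarize the pitch-class content of each frame.
    perf_chroma = librosa.feature.chroma_stft(y=perf, sr=sr)
    score_chroma = librosa.feature.chroma_stft(y=score, sr=sr)

    # Dynamic time warping returns an accumulated cost matrix D and a warping
    # path wp mapping performance frames to score frames.
    D, wp = librosa.sequence.dtw(X=perf_chroma, Y=score_chroma, metric="euclidean")
    print("alignment cost:", float(D[-1, -1]), "path length:", len(wp))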
Dannenberg and Hu. “Polyphonic Audio Matching for Score Following and Intelligent Audio Editors,” in Proceedings of the 2003 International Computer Music Conference, San Francisco: International Computer Music Association, (2003), pp. 27-34.
This paper was actually submitted before the WASPAA paper, so it does not have some results on comparing different distance metrics. Instead, this paper stresses some different applications, one being the possibility of intelligent audio editors that align audio to symbolic notation or MIDI files to help with search, indexing, aligning multiple takes of live recordings, etc.
ABSTRACT: Getting computers to understand and process audio recordings in terms of their musical content is a difficult challenge. We describe a method in which general, polyphonic audio recordings of music can be aligned to symbolic score information in standard MIDI files. Because of the difficulties of polyphonic transcription, we convert MIDI to audio and perform matching directly on acoustic features. Polyphonic audio matching can be used for polyphonic score following, building intelligent editors that understand the content of recorded audio, and the analysis of expressive performance.
Dannenberg and Raphael, “Music Score Alignment and Computer Accompaniment,” Communications of the ACM, 49(8) (August 2006), pp. 38-43.
Liu, Dannenberg, and Cai, “The Intelligent Music Editor: Towards an Automated Platform for Music Analysis and Editing,” in Proceedings of the Seventh International Conference on Intelligent Computing, Cairo, Egypt, December 2010, pp. 123-131.
ABSTRACT: Digital music editing is a standard process in music production for correcting mistakes and enhancing quality, but it is tedious and time-consuming. The Intelligent Music Editor, or IMED, automates routine music editing tasks using advanced techniques for music transcription (especially score alignment) and signal processing. The IMED starts with multiple recorded tracks and a detailed score that specifies all of the notes to be played. A transcription algorithm locates notes in the recording and identifies their pitch. A scheduling model tracks the instantaneous tempo of the recorded performance and determines adjusted timings for output tracks. A time-domain pitch modification/time stretching algorithm performs pitch correction and time adjustment. An empirical evaluation on a multi-track recording illustrates that the proposed algorithms achieve an onset detection accuracy of 87%, and a detailed subjective evaluation shows that the IMED improves pitch and timing accuracy while retaining the expressive nuance of the original recording.
See also: Remixing Stereo Music with Score-Informed Source Separation, where alignment is used to help with source separation, with the goal of editing individual instruments within a stereo audio mix.
Bootstrap Learning for Accurate Onset Detection, which uses alignment to find note onsets, which are then used as training data for automatic onset detection.
Dannenberg and Mohan, “Characterizing Tempo Change in Musical Performances,” in Proceedings of the International Computer Music Conference, Huddersfield, UK, August 2011. San Francisco: The International Computer Music Association. pp. 650-656.
ABSTRACT: Tempo change is an essential feature of live music, yet it is difficult to measure or describe because tempo change can exist at many different scales, from inter-beat-time jitter to long-term drift over several minutes. We introduce a piece-wise linear tempo model as a representation for tempo analysis. We focus on music where tempo is nominally steady, e.g., jazz and rock. Tapped beat data was collected for music recordings, and tempo was approximated as piece-wise linear functions. We compare the steadiness of tempo in recordings by accomplished, professional artists and in those by amateur artists, and show that professionals are steadier. This work offers new insights into the nature of tempo change based on actual measurements. In principle, improved models of tempo change can be used to improve beat tracking reliability and accuracy. In addition to technical applications, observations of music practice are interesting from a musicological perspective, and our techniques might be applied to a wide range of studies in performance practice. Finally, we present an optimal function approximation algorithm that has broader applications to representation and analysis in many computer music applications.
[Acrobat (PDF) Version]
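The representation can be illustrated with a crude sketch: compute instantaneous tempo from tapped beat times and fit a least-squares line in each of a few fixed windows. The beat data below is synthetic, and the paper fits optimal breakpoints rather than fixed windows:

    import numpy as np

    # Synthetic tapped beat times (seconds): steady ~120 BPM, then a slow drift.
    beats = np.concatenate([np.arange(0.0, 20.0, 0.5),
                            19.5 + np.cumsum(np.linspace(0.50, 0.56, 40))])

    # Instantaneous tempo from inter-beat intervals.
    times = beats[1:]
    tempo = 60.0 / np.diff(beats)   # beats per minute

    # Piece-wise linear approximation: a least-squares line per fixed window.
    for seg in np.array_split(np.arange(len(tempo)), 3):
        slope, intercept = np.polyfit(times[seg], tempo[seg], 1)
        mid = times[seg].mean()
        print(f"{times[seg][0]:5.1f}-{times[seg][-1]:5.1f}s: "
              f"tempo ~= {intercept + slope * mid:6.1f} BPM, drift {slope:+.2f} BPM/s")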
Fu, Xia, Dannenberg, and Wasserman, “A Statistical View on the Expressive Timing of Piano Rolled Chords,” in Proceedings of the 16th International Society for Music Information Retrieval Conference, Malaga, Spain, October 2015. pp. 578-583.
ABSTRACT: Rolled or arpeggiated chords are notated chords performed by playing the notes sequentially, usually from lowest to highest in pitch. Arpeggiation is a characteristic of musical expression, or expressive timing, in piano performance. However, very few studies have investigated rolled chord performance. In this paper, we investigate two expressive timing properties of piano rolled chords: equivalent onset and onset span. Equivalent onset refers to the hidden onset that can functionally replace the onsets of the notes in a chord; onset span refers to the time interval from the first note onset to the last note onset. We ask two research questions. First, what is the equivalent onset of a rolled chord? Second, are the onset spans of different chords interpreted in the same way? The first question is answered by local tempo estimation while the second question is answered by Analysis of Variance. Also, we contribute a piano duet dataset for rolled chords analysis and other studies on expressive music performance. The dataset contains three pieces of music, each performed multiple times by different pairs of musicians.
[Acrobat (PDF) Version]
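The two quantities studied in the paper can be shown with a tiny example. The onset times are made up, and the simple mean used below for the equivalent onset is only a naive stand-in for the paper's tempo-based estimate:

    # Note onsets (seconds) of one rolled chord, played lowest to highest.
    chord_onsets = [1.000, 1.035, 1.080, 1.120]

    # Onset span: time from the first note of the roll to the last.
    onset_span = chord_onsets[-1] - chord_onsets[0]

    # Naive candidate for the "equivalent onset"; the paper instead estimates it
    # from local tempo so the chord best fits the surrounding rhythmic context.
    equivalent_onset = sum(chord_onsets) / len(chord_onsets)
    print(f"onset span = {onset_span * 1000:.0f} ms, "
          f"equivalent onset ~= {equivalent_onset:.3f} s")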