Spring 2025 Projects | Roger B. Dannenberg

Summary

I wrote this to share with students and prospective students what I’m working on, and interested in working on, during the next year. This is an update and revision of Projects - Spring 2023.

About the Author

Roger B. Dannenberg is Emeritus Professor of Computer Science at Carnegie Mellon University. He is known for a broad range of research in Computer Music, including the creation of interactive computer accompaniment systems, languages for computer music, music understanding systems, and music composing software. He is a co-creator of Audacity, perhaps the most widely used music editing software.

Internships

If you are looking for an internship, I cannot offer a salary or cover travel expenses, but I have some funds for minor research expenses. Since Covid, I have been working remotely from home.

Ph.D.s

If you are looking for a Ph.D., CMU is a great place, but I'm no longer taking new students. Chris Donahue joined our faculty in fall 2023 and is active in music-related computer science research, teaching and advising.

I receive a lot of requests for internships and supervision. Prospective interns and Ph.D.s should read the sidebar at left. Here’s what I’m doing and thinking about these days.

O2

O2 is a network protocol especially for music control. It is intended to be an OSC “do over” given that even tiny low-cost controllers can communicate using IP, and also given what we’ve learned from experience with OSC (Open Sound Control).

Mainly, O2 introduces discovery so users do not have to type in IP addresses and port numbers. O2 also supports clock synchronization, timed message delivery, and some publish-subscribe capabilities. O2 is also global in the sense that discovery is not limited to the local area network. O2 works over Web Sockets to connect to in-browser applications, and through shared memory for low-latency applications such as interactive audio applications.

What comes next are some extensions and further work to make O2 even more complete and interesting. These are some potential projects:

Bindings and loadable O2 library (written in C++) for Python, Rust, Go, maybe others, and some examples.
Native implementations of O2lite, a minimal protocol that allows communication and discovery of a single O2 host process. We have implemented O2lite in C, Python and JavaScript, but not Rust or Go, and there might be others.
O2lite in Javascript could be used to create some inspection and debugging tools.
O2 could use a good tutorial, e.g. on github.
O2 should be supported in MaxMSP and Pd as externals. A Pd implementation exists but needs more testing and examples.

Symbolic Music Analysis

The AMADS project (Algorithms for Music Analysis and Data Science) is creating reference implementations of many symbolic music analysis algorithms from the music theory literature for tasks such as key finding, contour analysis and classification, pitch distributions, and chord labeling. There is always room for contibutors. The main task is: find an algorithm in the literature that has not been ported, re-implement the published algorithm as accurately as possible, write documentation, write some tests, write a example to show how to use it, submit for inclusion into our repo.

Arco

Arco is a framework for building interactive real-time music systems based on O2 for communication with an audio server or library. It is usable now, but could use further development along these lines:

I am using Serpent, a real-time scripting language for control, but it’s obvious that Python would be popular and create more application opportunities. We have a partly-implemented library to control Arco from Python, but more work, documentation, testing and examples are needed.
Arco can run DSP algorithms created in FAUST, and there is an opportunity to provide many interesting audio effects and synthesizers by drawing on the large FAUST library.
Arco has a new Instrument class, but only one interesting synthesizer. More instruments and examples could provide some “off-the-shelf” sounds for users.
It would be great to port Arco to a microcontroller, just to show off the compactness and efficiency of Arco. This should be straightforward for Raspberry Pi since Arco runs on Linux. Building an interesting demo could take some effort, but would be fun.
One possibility for the communications-oriented structure of Arco is to interoperate with a process doing inferencing with deep networks. We do not expect something like PyTorch to operate with the low-latency or hard deadlines expected for audio processing in general, but by operating asynchronously via messages, we can decouple the timing of inferencing from the timing within the Arco audio processing thread and implement “best-effort” computing from machine learning systems. This could be used to perform analysis, make predictions, improvise music (with some prediction involved, but that's required of human improvisers too), and perhaps other interesting real-time music tasks. It seems that real-time interaction is a new frontier for machine learning, and it is attractive for academic researchers. (For non-real-time tasks, we are at a big disadvantage compared to corporations with great computing budgets and resources. Supercomputers in the cloud have high latency due to networks and scheduling, and are probably not so useful for real-time interactive applications where datasets might be small and flexibility and control are more important than high classifier accuracy or ability to generalize over terabytes of training data.)

Accomplice and Computer Accompaniment

I have implemented Accomplice, a computer accompaniment system for keyboard performances (using MIDI). Accomplice matches a live keyboard performance to a score, estimates the score location and tempo, and plays a MIDI file (also other formats for O2 and OSC messages) in synchrony.

There is room for improvement and further research:

Accomplice does not handle trills, although we have some prior work that could probably solve the problem.
Computer accompaniment systems are very difficult to test. In my opinion, statistics on performance are not relevant to a real concert setting, where the requirement is to never fail, and to recover gracefully when there are real challenges. I think a good way forward is to collect and catalog failure cases rather than rely on datasets of competently played performances. To improve Accomplice, we mainly work on problems case-by-case, but currently we do not have a way to do regression testing on a collection of problem situations.
It would be great to compare Accomplice to other recent systems, but again we need some data and evaluation methodology. Even existing datasets would be interesting to try.
Accomplice has a user interface and a lot of flexibility, but we really need musicians and composers to use it and give critical feedback.
Accomplice is a specialization of a more general system that can also use foot-tapping input to synchronize to steady-beat music like jazz and rock. One path for further development would be to incorporate this into a real band, which would require arrangements, rehearsal time and patience, but could lead to some interesting performance possibilities.

Global Drum Circle

I'm working with a small group now to develop an online drum circle experience. Latency is a big issue, and our approach is to organize drumming into cycles of 4 or more measures. Locally, you hear your own drums immediately, but you hear everyone else's drums with a 1-cycle delay, e.g. 4 measures later than they were played. Similarly, everyone else hears your drums 4 measures later than when you actually played them. Experience has shown that there has to be a reference such as bass drum hits in order to allow the tempo to “lock in.”

I've mapped out a number of interactive scenarios such as call-and-response, follow-the-leader, alternating group and solo play, etc., and we've done some prototypes and testing, so I believe it is possible to make drumming online an enjoyable experience.

We have been iterating on interface ideas, then testing with a small group online, and could use more help from an interested developer that knows Javascript.

Ultimately, we hope to scale up to multiple drum circles around the world that run 24/7 and where people can join and leave, and depending on how many participants are online, drum circles can split and merge.

Web Audio Soundcool

Soundcool is a software-based modular synthesis system that is very easy to use. I’m working with an international team to port Soundcool to Web Audio, Javascript, and React so that users can play with Soundcool just by visiting a free website. Things are currently stalled due to the fact that versions of React and Node.js become obsolete faster than we can integrate the changes. A good web developer with some time could pull this together.

Computer Music Archeology

There are some early works in music composition that, rather than relying on sophisticated machine learning, simply implemented very insightful rules or algorithms of music theory and music composition. I've tried to understand what’s going on in these programs, because many of them outperform the so-called “state-of-the-art” methods that have become popular recently. Some efforts to recreate some of this early work could be very interesting and allow better understanding as well as additional experimentation. I am particularly interested in understanding the scope of output (does the software always write a variation of essentially the same song, or does the output have a range of ideas and forms?) as well as what is musically important (what’s more important: pitch, rhythm or form? And when is careful selection better than random choice?) This could lead to advances by revealing forgotten secrets of music.

Music Patterns and Music Models

Music structure is critical, but not well understood. I‘ve worked with students to implement music prediction models. Our claim is that while there are general tendencies in music (e.g., small pitch intervals are more common than large intervals in melody), there are also important local tendencies. For example, the first few bars of Beethoven's Fifth Symphony tells you much more about the next few bars than general knowledge of all classical music will tell you.

Our hypothesis is that by identifying patterns and repetition in music we can create better models for music generation and listening. Our approach is based on prediction: We rate models on their ability to predict the next element in a sequence (of pitches, durations, intervals, or whatever), and we measure this quantitatively in terms of entropy.

The actual work here consists of gathering and pre-processing music in machine-readable to form datasets, writing and debugging models, and running experiments to evaluate different models and parameters on different datasets. I think the next step will be coding and evaluating ad-hoc models that are based on common music ideas and structures.

Coda: Machine Learning and Music Generation

Many students write to say they know all about machine learning and would love to come to be interns. I can understand the excitement and enthusiasm. Unfortunately, my experience is that by the time students “tool up” and get enough experience to tackle some real problems, most of a summer or semester (or 2) has gone by, and there’s no time to make any advances. I would not say this is a bad area for research, but it seems that most of the obvious things are already done (and a great deal more). When the low-hanging fruit is gone, you really need a ladder or some secret advantage, whether it is a supercomputer, experience and insight, or just a good novel idea. I do not feel I can offer that now to undergrads in search of a quick but rewarding research experience. Many of the other topics listed above have some potential for completing something interesting and even publishable in a couple of months, but if you are only excited by machine learning applications, you should follow your heart and passion. That is where you will find the greatest happiness and accomplishments.