Summary

I was unable to find a single working open source code example that compiles and runs C++ in Web Audio. Maybe that will change by the time you read this. The main problems I set out to solve are: (1) Installing Emscripten (Emscripten's instructions are incomplete at best), (2) Compiling C++ code (Emscripten has changed, so most examples are broken), (3) Writing code to load and communicate with a Web Audio Worklet (this has always been confusing and poorly documented). This is all conducted on a MacBook Air M2 running Ventura (macOS) 13.5.2.

Example Code

All code and corresponding generated web pages are here.

About the Author

Roger B. Dannenberg is Emeritus Professor of Computer Science at Carnegie Mellon University. He is known for a broad range of research in Computer Music, including the creation of interactive computer accompaniment systems, languages for computer music, music understanding systems, and music composing software. He is a co-creator of Audacity, perhaps the most widely used music editing software.

This post has several sections that progressively build up web audio capabilities using C++ and WASM. If you want to skip directly to some usable results, go to Example 3 or even Example 5. If you have problems, you might want to at least read about steps leading from the most simple, reproducible, and reliable starting point (Example 1), through to a real example with readable code (Example 5). The full text includes a number of pitfalls and solutions I had to work through to get to the final result, so if you also run into difficulties, maybe you will find some answers in the intermediate steps.

Ultimately, we want the following capabilities:

Install and run Emscripten.
Write C++ code for audio signal processing.
Compile the code with Emscripten, requiring particular options.
Create code to load and use the Audio Worklet.
Communicate with the Audio Worklet.
Send synchronous commands and updates at clean points in the audio computation.
Instantiate multiple Audio Worklets of different types.

Example 1

Creating Web Audio Worklets with WASM in C++ will take a number of steps. In this example, we start with a basic test showing that C++ can be compiled and run as an Audio Worklet.

Install and Run Emscripten

Emscripten installation is fragile. What I got to work was the following. Note that instead of the git clone line, you can just download and unzip emsdk-main.zip from the Code button on the emsdk GitHub page.

# run this in a NEW TERMINAL!
/Users/rbd % git clone https://github.com/emscripten-core/emsdk.git
/Users/rbd % cd emsdk
/Users/rbd/emsdk % ./emsdk install latest
/Users/rbd/emsdk % ./emsdk activate latest
/Users/rbd/emsdk % source ./emsdk_env.sh

Why do I say fragile? A few gotchas are:

The so-called "download" is really an installer, so the installation is really a multi-step process requiring ./emsdk install latest and ./emsdk activate latest.
The configuration step, source ./emsdk_env.sh, sets some environment variables that will influence future behavior. E.g., if you delete ~/emsdk, start over with the zip file and create ~/emsdk-main, then cd ~/emsdk-main; source ./emsdk_env.sh will fail! (Starting a new shell or carefully removing the environment variables EMSDK, EMSDK_PYTHON and EMSDK_NODE will probably fix this.)

Now you should be able to run emcc, but only in the current shell! For example, try cd; emcc -v.

Write C++ Code

I’m going to start with an existing example to avoid introducing new bugs and problems. I'll use emsdk/upstream/emscripten/test/webaudio. I'm not sure yet how that works, so I'll copy the whole webaudio directory, renaming to my ~/soundcool/online/example.

Compiling the Code

The original code is part of a test suite for Emscripten. There are no build instructions I could find, but I found this issue on GitHub complaining about how -sMINIMAL_RUNTIME=1 does not work. That’s actually really promising because removing the switch apparently does work.

I copied the compile recipe from this Issue to a Makefile I'll use to record how to build the full example. (See the sidebar for a link to source code.)

Running the Code

One caution about testing: Browsers try to cache files, so if you change something and reload a web page, you might not actually load the changes.

Even if you do a “hard” reload, browsers do not seem to reload modules. To eliminate this potential problem, I always restart the server with a new port for every test, e.g. the example below shows port 8138.

The simplest thing you can try is:

 % # start in the example1 directory
 % make
 % cd web
 % python3 -m http.server 8138

Run Chrome and visit the URL localhost:8138/test.html. Unfortunately, the code will not work, and if you open the browser console (e.g. CMD-OPT-I), you'll see:

Uncaught ReferenceError: SharedArrayBuffer is not defined

This is because of a recent security “feature” that requires a secure context and cross-origin isolation. Yikes! Easily stated, but what can you do about it?

To begin with, you can turn off the security by quitting Chrome and running it from the command line in another terminal using:

 % /Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --enable-features=SharedArrayBuffer

Note that the particular problem is that the Web Audio code uses SharedArrayBuffer, but this symbol is unbound if security requirements are not met. Enabling SharedArrayBuffer apparently leaves the security in place but makes an exception for SharedArrayBuffer.

Now, you can again visit localhost:8138/test.html (assuming your local server is still running) and you should hear noise when you click on the “Toggle playback” button.

Note: When I run this, the web page has two big black boxes that look like something failed. I’m not sure of the intention of this Emscripten-generated page, but at least for this simple test, you can ignore this problem.

Dealing With Cross-Origin Isolation

The next thing we should fix is the security measures so we can run this example in a normally configured browser.

In example1, I have provided a small Python program that adds some headers that will enable Cross-Origin Isolation. The headers are:

Cross-Origin-Embedder-Policy: require-corp
Cross-Origin-Opener-Policy: same-origin

You can run the server from the example1/web directory using:

 % ../webserver 8158

Quit the Chrome you started from the command line (to make sure you quit that Chrome, you can check for a new prompt in the Terminal window where you started it). Then start Chrome in the normal manner and visit localhost:8158/test.html. The “Toggle playback” button should work.

Example 2: Cleaning Up

Now we have an ugly page that makes noise, and we need to isolate the means to write and execute C++ code in an audio worklet. The purpose for starting in this manner was to start with something that is simple to reproduce and shows some signs of working, even if it is not directly useful. Let’s move on from there.

The first cleanup is to remove most of the HTML from test.html, replacing it with a simple title. It would also be very helpful to test for Cross-Origin Isolation, so I have added the following to test.html:

    <script>
console.log("Executing script in test.html");
if (!crossOriginIsolated) {
console.error("crossOriginIsolated is " + crossOriginIsolated,
              "which will result in no definition for SharedArrayBuffer.\n",
              "Try serving pages with the following headers:\n", 
              "    Cross-Origin-Embedder-Policy: require-corp\n",
              "    Cross-Origin-Opener-Policy: same-origin"); 
}
    </script>

You can test this by running python3 -m http.server 8200 and visiting localhost:8200/test.html. You will see on the browser console the crossOriginIsolated error followed by the Uncaught ReferenceError: .... At least now there is a clear explanation when you fail to run this with the necessary headers.

Since the original web/test.html is created by running Emscripten, we rename the modified test.html to test2.html in the same directory as the Makefile. We modify Makefile to copy our test2.html to web/test2.html.

We also rename audioworklet.c to audioworklet.cpp, and change the name in Makefile. This eliminates an error message since, previously, we were invoking the em++ C++ compiler on a C file.

The final form of Example 2 is in the example2 directory (see the sidebar for example source code). You can run it from the directory example2/web using ../webserver 8159 and visiting localhost:8159/test2.html in the browser. Be sure to use test2.html and not test.html.

Example 3

Next, we need to be able to communicate with the audio worklet.

Calling C++ Functions from JavaScript

The main change is to add the flag -lembind to em++ and create some declarations in the source code to create and export functions to JavaScript. Here is what I added to audioworklet.cpp:

float example3(int a, float b) {
    return a + b;
}

void set_gain(float g) {
    gain = g;
}

EMSCRIPTEN_BINDINGS(my_module) {
    emscripten::function("example3", &example3);
    emscripten::function("set_gain", &set_gain);
}

The bindings code requires a header, the set_gain function sets a global variable gain that must be declared, and I need to apply gain in the audio generation, so the code also has these additions and changes:

// near the top ...
#include <emscripten/bind.h>

// after the includes ...
float gain = 1.0;

// in ProcessAudio() function
      ...
      outputs[i].data[j] = (rand() / (float)RAND_MAX * 2.0f - 1.0f) * 0.3f * gain;
      ...

Makefile Changes

In this example, I renamed things from test and test2 to example3. Since I do not want the default example3.html generated by emscripten, I changed the compiler output flag to -o web/example3.js, which generates only example3.js and example3.ww.js.

Test Function Calls

I modified the example3.html (former test2.html) with buttons to call the new functions as follows:

...
    <script>
function call_example3() {
    var sum = document.getElementById("sum");
    sum.innerHTML = "" + Module.example3(5, 10.3);
}
function call_soft() {
    Module.set_gain(0.1); 
}
function call_loud() {
    Module.set_gain(1.0); 
}
    </script>
  </head>
  <body>
    <h1>Example 2</h1>
    <button type="button" onclick="call_example3();">Add</button>
    <span id="sum"></span><br><br>
    <button type="button" onclick="call_soft();">Soft</button>
    &nbsp;&nbsp;
    <button type="button" onclick="call_loud();">Loud</button><br><br>
  </body>

This test is not very pretty, but it is simple and has the essentials to create an audio module and to get data in and out (note that the example3() function returns a float to JavaScript). You can run the server from the example3/web directory using:

 % ../webserver 8160

Then visit localhost:8160/example3.html. The “Toggle playback” button will play noise. Then, the “Soft” and “Loud” buttons will call into the module to change the gain variable. The “Add” button just returns and displays a sum to prove that we can compute and return values from C++.

A couple of remaining problems are (1) this code creates the Audio Context automatically, but in general we want to create the Audio Context in JavaScript and then create one or more instances of audio nodes based on WASM; (2) when we call WASM module functions, they run asynchronously with respect to audio processing. That’s fine (usually) for gain control, but what if we need to run some code at a clean point between two audio block computations? We do not want to use locks in a real-time audio thread, so we need to use message passing. Let’s implement all of that in Example 4.

Example 4

Our goal in this example is to make 2 instances of a tone generator, control them synchronously through messages, receive messages from the tone generators, and run everything in an audio context that is specified by the “application” running as the main JavaScript thread.

Using a Specified Audio Context

At this time, I do not know how to transfer an audio context reference from JavaScript to C++, so we are going to jump through hoops to make an audio context in C++ and transfer it out where JavaScript code can access it. This is done through two pretty awful hacks: one transfers from C++ to JavaScript embedded in the C++ code. The second uses the embedded JavaScript to write to the global variable window.emgl_audio_context. The code fragments are shown below:

// make the audio context in C++ function worklet_create:
    EMSCRIPTEN_WEBAUDIO_T context = emscripten_create_audio_context(
                                     0 /* use default constructor options */);

// when the Audio Worklet Processor is ready, the C++
// AudioWorkletProcessorCreated callback runs and performs:
    call_worklet_callback(audioContext, success);

// call_worklet_callback is embedded JavaScript as follows:
EM_JS(void, call_worklet_callback, (EMSCRIPTEN_WEBAUDIO_T audioContext,
                                    EM_BOOL success), {
    audioContext = emscriptenGetAudioObject(audioContext);
    window.emgl_audio_context = audioContext;
    window.emgl_audio_worklet_callback(success ? 0 : 1);
});

Making Audio Node Instances

In example 3, the C++ code creates one audio node and connects it to the audio output. We want the ability to make multiple audio node instances. In addition, we need to be able to control each instance. We might get away with AudioParams, but I want to be able to send and receive messages. Audio nodes have ports for this purpose, but I do not see any documentation on accessing and reading ports, so I will use a different convention:

Node state will exist in an array of structs, one for each node instance.
The index of the state will serve as a node identifier.
The node identifier will be returned when the node is created.
To send messages to a node, we will invoke a function where one parameter is the node identifier. This will be used either to write into the proper node state, or for synchronous updates, the message will be placed into a per-node-instance fifo that is checked in the audio process function of the node.
Since every node uses the same audio process function, we can use the UserData parameter to pass in a pointer to the node instance's state structure. Each node instance will run the same audio process function but each with a different state structure.

This is definitely a hack, but it works without access to ports.
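The convention above can be sketched in plain C++ without any Emscripten calls. This is only an illustration and not the code in the example directories: NodeState, MAX_NODES, node_alloc, node_lookup, node_set_gain and node_process_sample are hypothetical names, the table capacity is an assumption, and the per-node fifo used for synchronous updates is left out.

```cpp
#include <array>
#include <cassert>

// Hypothetical per-node state; a real node would hold more fields.
struct NodeState {
    bool in_use = false;
    float gain = 1.0f;
};

constexpr int MAX_NODES = 16;  // assumed fixed capacity
static std::array<NodeState, MAX_NODES> node_states;

// Claim a free slot and return its index as the node identifier,
// or -1 if the table is full.
int node_alloc()
{
    for (int id = 0; id < MAX_NODES; id++) {
        if (!node_states[id].in_use) {
            node_states[id] = NodeState();
            node_states[id].in_use = true;
            return id;
        }
    }
    return -1;
}

// Look up state by identifier; returns nullptr for an invalid id.
NodeState *node_lookup(int id)
{
    if (id < 0 || id >= MAX_NODES || !node_states[id].in_use) {
        return nullptr;
    }
    return &node_states[id];
}

// Asynchronous update: write directly into the node's state.
bool node_set_gain(int id, float g)
{
    NodeState *ns = node_lookup(id);
    if (!ns) return false;
    ns->gain = g;
    return true;
}

// Stand-in for the shared audio process function: the userData pointer
// selects which node's state this call operates on.
float node_process_sample(float input, void *userData)
{
    NodeState *ns = static_cast<NodeState *>(userData);
    assert(ns != nullptr);
    return input * ns->gain;
}
```

Every node runs the same node_process_sample, and only the pointer passed as userData differs, which is the property the real audio process function relies on.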

Creating a Framework

To get this far, I have started abstracting some of the common code needed to communicate with WASM, putting it in audioutil.js. Some details for using the code appear at the top of that file.

Running Example 4

To run example 4, click on “Play a Tone” and you should hear a sine tone. The “Soft” and “Loud” buttons will change the amplitude. Click “Play a Tone” again and you should hear a second sine tone. Each click adds a new tone to the mix by creating a new audio node. Note that “Soft” and “Loud” affect only the first tone.

There's also a new checkbox, handled by the toggle_audio function in example4.html, to suspend or resume the audio context.

The next example will try to clean up and generalize this example.

Example 5: The Real Deal (Almost)

In this example, I present a more complete example and flesh out the growing “framework”. I used quotes here because the goal is only to provide some useful and reusable functions supporting this particular style of creating web audio nodes with WASM. This example is even longer than Example 4, so if you want to learn the basics of creating and running audio nodes in C++ and WASM, you might want to study more basic and more concrete code in earlier examples.

What Example 5 Demonstrates

There are some new capabilities in Example 5:

Multiple types of audio nodes.
Synchronous update messages from JavaScript to audio nodes.

Multiple Types of Audio Nodes

The first problem is creating more than one node type. One way might be to use a separately compiled WASM module for each type. But the WASM module is accessed through Module, so we would need a way to give each separate compilation a different global name. Even if that were solved, we have no way to hand an audio context to the WASM module, so each module would create a separate audio context. As far as I can tell, interconnected nodes have to be created in the same context.

The alternative is to pretend like there is just one type of node, which we will call an awnode, but when we create the audio node, we get to provide a void *UserData, so we'll pass in a pointer to an instance of class Awnode. Then, we can subclass Awnode to make as many different “node types” as we want. It’s a hack, but works within the limitations we are facing in our understanding of emscripten.
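To make the subclassing idea concrete, here is a minimal sketch in plain C++. It is not the real Awnode: the names Awnode_sketch, Constant_sketch, Ramp_sketch and shared_callback are hypothetical, the process signature is simplified to a single output buffer, and no Emscripten types are used.

```cpp
#include <cassert>

// Hypothetical, simplified base class. The real Awnode also carries a
// message queue and a class identifier.
class Awnode_sketch {
public:
    virtual ~Awnode_sketch() = default;
    // Fill out[0..n-1]; return false when the node is finished.
    virtual bool process(float *out, int n) = 0;
};

// One "node type": outputs a constant value.
class Constant_sketch : public Awnode_sketch {
public:
    explicit Constant_sketch(float v) : value(v) {}
    bool process(float *out, int n) override
    {
        for (int i = 0; i < n; i++) out[i] = value;
        return true;
    }
private:
    float value;
};

// Another "node type": outputs 0, 1, 2, ... across successive calls.
class Ramp_sketch : public Awnode_sketch {
public:
    bool process(float *out, int n) override
    {
        for (int i = 0; i < n; i++) out[i] = static_cast<float>(count++);
        return true;
    }
private:
    int count = 0;
};

// The single callback shared by every node type. It only knows that
// userData was stored as an Awnode_sketch pointer and dispatches through
// the virtual process method.
bool shared_callback(float *out, int n, void *userData)
{
    Awnode_sketch *node = static_cast<Awnode_sketch *>(userData);
    assert(node != nullptr);
    return node->process(out, n);
}
```

One detail worth keeping in mind: the pointer handed over as userData should already be converted to the base class type before it becomes a void *, so that the cast back in the callback recovers the same pointer.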

Instantiating Audio Nodes

To instantiate an audio node, we call audio_node_create(name) from JavaScript, which invokes audio_node_create(name) in C++. This function looks up name in a table of constructor functions and runs the constructor to create an instance of the named subclass of Awnode. Then, emscripten_create_wasm_audio_worklet_node is called to make an actual audio node with the signal processing callback awprocess and UserData set to the instance. When the callback is called, it invokes the process method on the instance.

To access the instance for parameter updates or other operations, the instance address is stored in the table audio_nodes and the table index is returned to the JavaScript caller as a node identifier or “node_id”. A JavaScript global variable, window.emgl_audio_node, is used to return the audio node as a second return value. The caller can use the audio node directly to make audio connections. The node_id is used in calls to update the audio node state, described in the following paragraphs.
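The lookup-and-construct step can be sketched without Emscripten as follows. This is an assumption-laden illustration, not the code in awutil.cpp: Node_sketch, register_class_sketch, node_create_sketch and the two constructor functions are hypothetical names, the tables are a std::map and std::vector chosen for brevity, and the call that creates the actual Web Audio node is only marked by a comment.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for Awnode.
struct Node_sketch {
    virtual ~Node_sketch() = default;
    virtual const char *kind() const = 0;
};
struct Sine_sketch : Node_sketch {
    const char *kind() const override { return "sinetone"; }
};
struct Noise_sketch : Node_sketch {
    const char *kind() const override { return "noise"; }
};

typedef Node_sketch *(*Constructor_fn)();

static std::map<std::string, Constructor_fn> constructors;  // name -> constructor
static std::vector<Node_sketch *> nodes_sketch;              // index == node_id

void register_class_sketch(const std::string &name, Constructor_fn fn)
{
    constructors[name] = fn;
}

// Returns the new node_id, or -1 if name was never registered.
int node_create_sketch(const std::string &name)
{
    auto it = constructors.find(name);
    if (it == constructors.end()) return -1;
    nodes_sketch.push_back(it->second());
    // The real code would create the Web Audio node here, passing the
    // new instance as UserData.
    return static_cast<int>(nodes_sketch.size()) - 1;
}

Node_sketch *sine_ctor_sketch() { return new Sine_sketch(); }
Node_sketch *noise_ctor_sketch() { return new Noise_sketch(); }
```

In this sketch, nodes are never removed from the table or freed, so identifiers stay valid for the life of the page. A fuller version would need a policy for reusing slots.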

Synchronous Update

Since audio processing is on its own thread (associated with the audio context), we must be careful updating parameters of unit generators. The scheme I adopted is best illustrated by describing a particular update: Setting the frequency of a sine tone generator. Recall that a node_id was returned to the caller when the sine tone generator was created. To update the frequency to 440 Hz, the main thread in JavaScript calls sinetone_set_freq(node_id, 440.0). This function is implemented in C++ in the WASM module.

The sinetone_set_freq function finds an instance of Awnode at audio_nodes[node_id]. Every Awnode has a message queue implemented in C++. Messages are simply structs where the first field is typically an “opcode” (enum value) that identifies the message type. The rest of the message consists of parameters. For the Sinetone class the message struct is simply the following:

enum Opcode : int32_t {SET_GAIN = 0, SET_FREQ = 1};

struct Sinetone_msg {
    Opcode opcode;
    float val;
};

and sinetone_set_freq looks like this:

void sinetone_set_freq(int id, float hz)
{
    Sinetone_msg msg;
    msg.opcode = SET_FREQ;
    msg.val = hz;
    
    send_message_to_node(id, 'sntn', (char *) &msg);
}

send_message_to_node enqueues the message. Leaving out some consistency checks, it looks like:

void send_message_to_node(int id, int class_id, char *msg)
{
    Awnode *awn = audio_nodes[id];
    assert(awn->class_id == class_id);
    Pm_Enqueue(awn->queue, msg);
}

To make sure that sinetone messages only go to instances of class Sinetone, we store a unique class identifier in every instance and pass the expected class_id (a four-letter multi-character constant, which packs the ascii letters 'sntn' into a 32-bit integer in this case). The assert checks for an instance of the expected type.
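The value of a multi-character constant such as 'sntn' is implementation-defined in C++, so a compiler may warn about it. If that is a concern, the same identifier can be built explicitly. The sketch below is one possible alternative, not what the example code does: class_id_sketch, SINETONE_ID and NOISE_ID are hypothetical names, and placing the first character in the most significant byte is my assumption about the layout (as far as I know, it matches what GCC and Clang produce for 'sntn').

```cpp
#include <cassert>
#include <cstdint>

// Pack four ASCII characters into a 32-bit integer, with the first
// character in the most significant byte.
constexpr int32_t class_id_sketch(char a, char b, char c, char d)
{
    return static_cast<int32_t>(
        (static_cast<uint32_t>(static_cast<unsigned char>(a)) << 24) |
        (static_cast<uint32_t>(static_cast<unsigned char>(b)) << 16) |
        (static_cast<uint32_t>(static_cast<unsigned char>(c)) << 8) |
         static_cast<uint32_t>(static_cast<unsigned char>(d)));
}

constexpr int32_t SINETONE_ID = class_id_sketch('s', 'n', 't', 'n');
constexpr int32_t NOISE_ID = class_id_sketch('n', 'o', 'i', 's');

// Checked at compile time: 's' = 0x73, 'n' = 0x6E, 't' = 0x74.
static_assert(SINETONE_ID == 0x736E746E, "unexpected packing");
```

Because the function is constexpr, the identifiers are compile-time constants and could be compared in the same assert that send_message_to_node uses.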

In the audio processing callback awprocess, the first thing we do is deliver any pending messages from the instance’s queue. The code looks like:

EM_BOOL awprocess(int numInputs, const AudioSampleFrame *inputs,
                  int numOutputs, AudioSampleFrame *outputs, int numParams,
                  const AudioParamFrame *params, void *userData)
{
    char msg_buffer[64];
    Awnode *awn = (Awnode *) userData;
    // process all incoming messages:
    while (Pm_Dequeue(awn->queue, msg_buffer)) {
        awn->message_handler(msg_buffer);
    }

    // run the sample processing method:
    if (!awn->process(numInputs, inputs, numOutputs, outputs,
                      numParams, params)) {
        ... clean up when node is finished ...
    }
    return EM_TRUE;
}

This gets the data into Sinetone’s message handler, which looks like:

void Sinetone::message_handler(char *msg)
{
    Sinetone_msg *stm = (Sinetone_msg *) msg;

    if (stm->opcode == SET_GAIN) {
        gain = stm->val;
    } else if (stm->opcode == SET_FREQ) {
        phase_inc = stm->val * M_PI * 2 / 48000;
    }
}

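So finally, the phase_inc member variable of our Sinetone is set from the requested frequency (a SET_GAIN message sets the gain member variable in the same way). Note that message handling always takes place synchronously on the audio thread in between calls to the (audio) process method.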
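The message handler computes the phase increment as frequency times 2π divided by a hard-coded rate of 48000. If the audio context runs at another sample rate, the pitch will be off by the ratio of the two rates. The sketch below takes the sample rate as a parameter instead. The names TWO_PI_SKETCH, phase_inc_sketch and sine_fill_sketch are hypothetical, and the fill loop is my assumption about what a sine generator's inner loop might look like, not a copy of Sinetone::process.

```cpp
#include <cassert>
#include <cmath>

static const double TWO_PI_SKETCH = 2.0 * std::acos(-1.0);

// Phase advance in radians per sample for a sine oscillator.
double phase_inc_sketch(double hz, double sample_rate)
{
    assert(sample_rate > 0.0);
    return hz * TWO_PI_SKETCH / sample_rate;
}

// Write n samples of gain * sin(phase) to out, advancing phase by inc per
// sample and wrapping it so it does not grow without bound.
void sine_fill_sketch(float *out, int n, double &phase, double inc, float gain)
{
    for (int i = 0; i < n; i++) {
        out[i] = static_cast<float>(std::sin(phase)) * gain;
        phase += inc;
        if (phase >= TWO_PI_SKETCH) phase -= TWO_PI_SKETCH;
    }
}
```

On the JavaScript side, the rate is available as the sampleRate property of the audio context, so one option is to pass it to C++ once when the node is created.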

Lock-Free Queues

Be careful if you make your own FIFO for messages because the sender and receiver are on different threads, and you cannot use locks for a critical section where you add data and update pointers.

The implementation here is almost a direct copy from PortMidi which has been using the same code for more than a decade. The algorithm cleverly relies only on atomic reads/writes to 32-bit words with no special instructions like compare-and-swap (CAS).

One caveat with PortMidi's fifos is that there are special checks to make sure overflows do not go unnoticed, but my example code assumes this never happens.
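For readers who want to see the general shape of such a queue, here is a minimal single-producer, single-consumer ring buffer. To be clear, this is not PortMidi's algorithm: it is a conventional design that uses std::atomic with acquire/release ordering instead, and Spsc_queue_sketch with its template parameters is a hypothetical name. It is safe only when exactly one thread enqueues and exactly one thread dequeues.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstring>

// Fixed-size messages in a ring buffer. One slot is always left empty to
// distinguish "full" from "empty", so it holds CAPACITY - 1 messages.
template <std::size_t MSG_BYTES, std::size_t CAPACITY>
class Spsc_queue_sketch {
public:
    // Producer side. Returns false (nothing written) when the queue is full.
    bool enqueue(const void *msg)
    {
        std::size_t t = tail.load(std::memory_order_relaxed);
        std::size_t next = (t + 1) % CAPACITY;
        if (next == head.load(std::memory_order_acquire)) return false;
        std::memcpy(slots[t], msg, MSG_BYTES);
        // Publish the slot only after its contents are written.
        tail.store(next, std::memory_order_release);
        return true;
    }

    // Consumer side. Returns false when the queue is empty.
    bool dequeue(void *msg)
    {
        std::size_t h = head.load(std::memory_order_relaxed);
        if (h == tail.load(std::memory_order_acquire)) return false;
        std::memcpy(msg, slots[h], MSG_BYTES);
        // Release the slot only after its contents are copied out.
        head.store((h + 1) % CAPACITY, std::memory_order_release);
        return true;
    }

private:
    char slots[CAPACITY][MSG_BYTES];
    std::atomic<std::size_t> head{0};  // next slot to read
    std::atomic<std::size_t> tail{0};  // next slot to write
};
```

Neither method blocks or allocates, which is what makes this style of queue usable from the audio thread.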

Emscripten Bindings

As mentioned earlier, I used Embind to create function call interfaces to access C++ functions from JavaScript.

In this example, I moved all the EMSCRIPTEN_BINDINGS into relevant source files, e.g. the binding for sinetone_set_freq is now created in sinetone.cpp, and there is no EMSCRIPTEN_BINDINGS in the main example5.cpp file. In retrospect, this is obvious, but it is not documented that each use of EMSCRIPTEN_BINDINGS(name) requires a unique symbol for name. Otherwise, one or the other set of bindings will be silently dropped.

Main Program

Putting it together, our main program, example5.cpp, is now quite simple. We just have to register the classes we want to use:

int main()
{
    srand(time(NULL));

    assert(!emscripten_current_thread_is_audio_worklet());

    register_audio_worklet_class("sinetone", &sinetone_constructor);
    register_audio_worklet_class("noise", &noise_constructor);

    main_has_run = true;
}

The assignment to main_has_run is used for consistency checking when worklet_create is called. The constructor functions are very simple and are included in sinetone.cpp and noise.cpp. Here is the one from sinetone.cpp:

Awnode *sinetone_constructor()
{
    return new Sinetone();
}

Running Example 5

Run the example just like the previous ones. In the example5/web directory, run

 % ../webserver 8052

(I continue to restart the server on a new port each time I recompile the WASM module to make sure we do not get a cached version.) Then, visit localhost:8052/example5.html.

The screen looks like this:

The three “Play” buttons create audio nodes and connect them to the audio output. The two synthesis types, sine tone and white noise, are demonstrated. The “Soft” and “Loud” buttons change the gain, and the “Low” and “High” buttons set different pitches. All of the updates are synchronous, although in these simple cases, it would be fine to just write new parameters directly into the object member variables without introducing messages or message queues.

Conclusions

I wrote this because I want to use WASM and Web Audio, but I could find no working source code. The methods here are largely a set of work-arounds for problems that should not exist in the first place, and I hope better solutions will be available or revealed in the near future. For now, here are some problems for the Emscripten team to address. These are either real problems in software and design or just cases where documentation is incomplete or too hard to find. I would be happy to update these examples with better solutions:

emsdk_env.sh has a chicken-and-egg problem: it uses shell variables that it also sets, so if you want to switch to another emsdk directory or simply rename one that you are using, emsdk_env.sh will not run in the new location.
Nomenclature is confusing: the emscripten “Download” actually “downloads” an installer that you have to use to actually download emscripten. But emsdk_env.sh is part of the installer, so it could easily check to see if you have installed emscripten before trying to configure it.
The -sAUDIO_WORKLET=1 switch in emscripten creates code that loads a module accessed as Module. It is not clear (is it even possible?) how to use a different name to allow multiple modules, but it is very limiting to say you can only have one audio node implemented with WASM.
Even if you solve the multiple-modules problem, the generated code also creates an audio context. It is not clear (is it even possible?) how to use an existing audio context rather than create a new one. This is a very emscripten-centric approach, and it would be better to consider how one would introduce a WASM audio node into an existing audio context.
It is not clear (is it even possible?) how to access audio node ports from emscripten. It should be possible to send something from JavaScript to an audio node and receive the message in the C++ function that implements signal processing for the node in order to implement synchronous updates and commands. Messages should at least allow some structure: A minimal mechanism would allow receiving an array of ints and floats, but it is not clear how arrays are represented or converted from the JavaScript side to the C++ side. Ports should also work both ways, allowing structures such as arrays to be sent from C++ to JavaScript through the audio node port.
Emscripten creates an AudioWorklet that runs audio nodes. Emscripten seems to associate one audio worklet with one audio node type. If you try to make a second AudioWorklet to support a second audio node type, this will be blocked by an assertion that says you can create just one worklet. The Web Audio API is very vague about how all this works -- an AudioWorklet is said “to supply custom audio processing scripts that execute in a separate thread,” but the thread belongs to the audio context, so how does an AudioWorklet fit into the architecture? And what does “supply” mean? Is AudioWorklet actually just a factory for AudioNodes? Why can’t emscripten make more than one?
When using EMSCRIPTEN_BINDINGS(id), each instance across all compilation units must use a unique symbol for id. The function or requirements for id are not mentioned in the documentation, and failing to use different id's (e.g. just using my_module as suggested by the documentation example, but in multiple places), will silently fail to create (some) function interfaces.

Remaining Tasks

The examples here are limited to simple synthesis nodes with no AudioParams, no audio input, and one channel output. This configuration is wired into audio_node_create() in awutil.cpp. It should be simple to pass in parameters to allow for other configurations. That might allow a caller to instantiate a "sinetone" with 4 audio inputs (!), but it seems that a good design would somehow allow additional parameters to audio_node_create(), pass these into the constructor function to configure the particular instance of Awnode, and finally query the instance to get the number of inputs, number of outputs, channel counts, etc., to pass on to emscripten_create_wasm_audio_worklet_node().

There is also no message delivery from audio nodes back to JavaScript. I think this can be done relatively simply by creating a single message queue using Pm_QueueCreate. Any node can enqueue a message including the node’s id (which will have to be stored in a new member variable and initialized when the instance is created). The main JavaScript thread can poll the queue through a simple procedural interface, or maybe there is a way to activate a JavaScript callback function when the queue is non-empty.

I need to think about how FIFO overflow should be handled. For testing, it seems like overflow should result in a hard crash that cannot go unnoticed, but in production, maybe just dropping a message is better.
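One way to keep both options open is to route every enqueue result through a small policy check. The sketch below is only a starting point under my own assumptions: Overflow_policy_sketch, Overflow_stats_sketch and check_enqueue_sketch are hypothetical names, and the caller is assumed to have a boolean telling whether the enqueue succeeded.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

enum class Overflow_policy_sketch { CRASH, DROP };

// Counts dropped messages so that silent loss can at least be reported.
struct Overflow_stats_sketch {
    unsigned long dropped = 0;
};

// Call with the result of an enqueue attempt. Returns true if the message
// was enqueued. On overflow, either aborts (testing) or counts the drop and
// returns false (production).
bool check_enqueue_sketch(bool enqueued, Overflow_policy_sketch policy,
                          Overflow_stats_sketch &stats)
{
    if (enqueued) return true;
    if (policy == Overflow_policy_sketch::CRASH) {
        std::fprintf(stderr, "message FIFO overflow\n");
        std::abort();
    }
    stats.dropped++;
    return false;
}
```

This check belongs on the sending side (the main thread in these examples), since printing and aborting are not things to do on the audio thread.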

Acknowledgements

I would still be stuck without helpful hints from Hongchan Choi at Google, sbc100 via GitHub (Sam Clegg) and ad8e via GitHub (Kevin Yin).