pvoc
This extension shows how to use phase vocoder functions.
    load "pvoc/phasevocoder.sal"
    exec test-all()
This page describes how to use the phase vocoder in Nyquist. The
phase vocoder is used for time stretching a signal without changing
the pitch, and it can be combined with resampling to change both
time and pitch independently.
The phase vocoder function is phasevocoder, which is
called as follows:
    phasevocoder(input, map, fftsize, hopsize, mode)
where input is the input sound, map is a map from output time to input time, fftsize is the number of samples to use in the analysis, hopsize is the step size for use in synthesis, and mode controls the synthesis algorithm. Only the first two parameters are required. These parameters will be explained further below.
A phase vocoder works by analyzing the input signal using the
FFT, which decomposes the signal into frequency components. The
basic assumption is that the input signal consists of sinusoids
(no impulses) and that the FFT can resolve these sinusoids into
separate components.
The phase vocoder is similar to granular synthesis in that the
input is broken into short grains of sound which are reassembled
to create an output sound. In the phase vocoder, the output signal
is always constructed from equally spaced grains. To achieve time
stretching, the spacing of the grains extracted from the input is
variable. If grains are taken with higher amounts of overlap, the
resulting output is longer so the input is apparently stretched.
If grains have less overlap in the input sound, the input will
sound sped up.
The distinctive aspect of the phase vocoder is that when grains
are assembled into the output, the phase of each frequency
component is adjusted to avoid phase cancellation. Normally, if
the spacing of grains is changed from input to output, different
frequencies will be shifted by different amounts of phase. Some
will add constructively and some will add destructively. Thus,
some frequencies will be emphasized over others, often creating a
buzz at grain rate. The phase vocoder attempts to eliminate this
artifact by adjusting all phases to be coherent.
Read the audio from demos/audio/happy.wav and stretch
the first 4 seconds by a factor of 4 so that the output duration
is 16 seconds:
    function test9()
      begin
        with inp = s-read(audio-file("happy.wav"))
        play phasevocoder(inp, pwlv(0, 16, 4))
      end
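Because map gives the input time for each output time, its slope sets the local playback rate, so the stretch does not have to be constant. The following is a minimal sketch, not part of the extension: the name map-sketch is hypothetical, and audio-file is assumed to be available as in the example above. The map plays the first 2 seconds unchanged, then stretches input seconds 2 through 4 by a factor of 4.

```sal
;; hypothetical example: variable stretch using a two-segment map
function map-sketch()
  begin
    with inp = s-read(audio-file("happy.wav"))
    ;; pwlv(0, 2, 2, 10, 4) is 0 at time 0, 2 at time 2, and 4 at time 10:
    ;;   output 0-2 s  reads input 0-2 s (slope 1, no stretch)
    ;;   output 2-10 s reads input 2-4 s (slope 1/4, stretched by 4)
    play phasevocoder(inp, pwlv(0, 2, 2, 10, 4))
  end
```

The output here should last about 10 seconds, since the last breakpoint of the map is at output time 10.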
The pv-time-pitch function can be used to
simultaneously stretch and pitch shift an input signal. It shifts
pitch by resampling after phase vocoding. E.g. by stretching by a
factor of two and then resampling by a factor of two, the input
can be raised one octave in pitch without changing the duration.
The parameters are:
pv-time-pitch(input, stretchfn, pitchfn, dur,
fftsize, hopsize, mode)
where input is the input sound to be altered, stretchfn
controls stretching, pitchfn controls pitch shifting, dur
is the approximate duration, and the remaining parameters are
optional parameters passed to the phasevocoder function.
stretchfn and pitchfn work differently from
the map parameter passed to phasevocoder. stretchfn
gives the amount by which the input should be stretched at each
point in time. Thus, if you want the input at time 3.1 to be
stretched by a factor of 1.5, the value of stretchfn at
time 3.1 should be 1.5. Similarly, pitchfn is the pitch
shift amount (a frequency multiplier) to use at each point of time
in the input. If the pitch should be unaltered, the function value
should be 1. To raise the pitch, use values greater than 1 (e.g. 2
means go up by one octave).
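As a minimal sketch of the octave example mentioned above, the call below keeps the stretch factor at 1 and sets the pitch multiplier to 2 throughout. The function name octave-up-sketch is hypothetical, audio-file is assumed to be available as in the earlier phasevocoder example, and the value 10 for dur is only an assumed approximate duration for this file.

```sal
;; hypothetical example: raise pitch one octave without changing duration
function octave-up-sketch()
  begin
    with inp = s-read(audio-file("happy.wav")),
         dur = 10  ; assumed approximate duration in seconds
    ;; pwlv(1, dur, 1) is a constant 1 (no stretch) from time 0 to dur
    ;; pwlv(2, dur, 2) is a constant 2 (one octave up) from time 0 to dur
    play pv-time-pitch(inp, pwlv(1, dur, 1), pwlv(2, dur, 2), dur)
  end
```

Both control functions are built with pwlv so that they cover the whole stretch of input being processed rather than a default one-second duration.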
In this example, we will alter a familiar tune, Happy Birthday, to another familiar tune, Twinkle Twinkle, using pv-time-pitch. To begin, I downloaded a rendition of Happy Birthday from YouTube. (Credit: Artist is unknown, but the URL is https://www.youtube.com/watch?v=vRJQ3QxLjG0). Next, I used Audacity to extract a segment of the music and save it to demos/audio/happy.wav. I also labeled each syllable with a label track in Audacity and exported the labels to demos/audio/labels.txt.
Let's work on stretchfn first. We want to make the
rhythm mostly quarter notes. For simplicity, let's make the tempo
120, so each quarter note is 0.5s. The stretch factor for a note
of length x to become length 0.5 will be 0.5/x. Thus, we need to
construct stretchfn to have a constant stretch factor
for each note, where the stretch factor is 0.5/(t2 - t1) where t2
is the time of the next note and t1 is the time of the note to be
stretched, i.e. (t2 - t1) is the duration.
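The formula can be checked with a small helper. This is an illustrative sketch only: stretch-factor and its parameter names are hypothetical and are not part of the extension. A note lasting 0.25 s needs a factor of 2 to become a quarter note at tempo 120, while a note lasting 1 s needs a factor of 0.5.

```sal
;; hypothetical helper: factor that maps a note lasting from note-start
;; to note-end (in seconds) onto a target duration (in seconds)
function stretch-factor(note-start, note-end, target-dur)
  return target-dur / (note-end - note-start)

;; a 0.25 s note stretched to a 0.5 s quarter note
print stretch-factor(3.0, 3.25, 0.5)
;; a 1.0 s note compressed to a 0.5 s quarter note
print stretch-factor(1.0, 2.0, 0.5)
```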
We'll get the note times from the labels file:
    0.000000  0.000000  Hap
    0.166198  0.166198  py
    0.369078  0.369078  birth
    1.085618  1.085618  day
    1.733822  1.733822  to
    2.209605  2.209605  you
    3.578676  3.578676  hap
    3.857731  3.857731  py
    4.055933  4.055933  birth
    4.776379  4.776379  day
    5.425378  5.425378  to
    5.905255  5.905255  you
    7.271426  7.271426  hap
    7.550667  7.550667  py
    7.746811  7.746811  birth
    8.494934  8.494934  day
I assigned each time to a short local variable to make the calculations simpler (although I probably could have created an array and automated things a bit more):
    with h1 = 0.000000, p1 = 0.166198, b1 = 0.369078, d1 = 1.085618,
         t1 = 1.733822, y1 = 2.209605, h2 = 3.578676, p2 = 3.857731,
         b2 = 4.055933, d2 = 4.776379, t2 = 5.425378, y2 = 5.905255,
         h3 = 7.271426, p3 = 7.550667, b3 = 7.746811, d3 = 8.494934
Next, I calculate stretch factors for each note, being careful to make notes 7 and 14 half notes (using 1.0 instead of 0.5 as the target duration):
    u1 = 0.5 / (p1 - h1), u2 = 0.5 / (b1 - p1), u3 = 0.5 / (d1 - b1),
    u4 = 0.5 / (t1 - d1), u5 = 0.5 / (y1 - t1), u6 = 0.5 / (h2 - y1),
    u7 = 1.0 / (p2 - h2), u8 = 0.5 / (b2 - p2), u9 = 0.5 / (d2 - b2),
    u10 = 0.5 / (t2 - d2), u11 = 0.5 / (y2 - t2), u12 = 0.5 / (h3 - y2),
    u13 = 0.5 / (p3 - h3), u14 = 1.0 / (b3 - p3)
Next, I create the stretchfn using pwlv. This is a little awkward because to obtain steps in the function, each time point must have a time and value for the previous note, then the same time but a new value for the next note, creating a step in the function:
    set *stretch* = pwlv(
            u1,
            p1, u1, p1, u2,
            b1, u2, b1, u3,
            d1, u3, d1, u4,
            t1, u4, t1, u5,
            y1, u5, y1, u6,
            h2, u6, h2, u7,
            p2, u7, p2, u8,
            b2, u8, b2, u9,
            d2, u9, d2, u10,
            t2, u10, t2, u11,
            y2, u11, y2, u12,
            h3, u12, h3, u13,
            p3, u13, p3, u14,
            b3, u14)
The function is assigned to a global variable. This is not generally recommended, but it will make it easy for us to plot and inspect the function when debugging.
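The repeated-breakpoint pattern could also be generated from arrays instead of being typed out. The sketch below is one possible way to do that, under stated assumptions: make-step-fn and *stretch-sketch* are hypothetical names, times is assumed to hold one more entry than vals (the onset of each note plus the end of the last one), and the breakpoints are passed to pwlv-list, the list-taking counterpart of pwlv.

```sal
;; hypothetical helper: build a step-shaped control function.
;; times: array of n + 1 onset times, vals: array of n values, one per note
function make-step-fn(times, vals)
  begin
    with n = length(vals),
         points = list(vals[0])  ; initial value at time 0
    loop
      for i from 1 below n
      ;; reach onset i with the previous value, then jump to the new value
      set points = append(points,
                          list(times[i], vals[i - 1], times[i], vals[i]))
    end
    ;; hold the last value until the end of the last note
    set points = append(points, list(times[n], vals[n - 1]))
    return pwlv-list(points)
  end

;; usage with the first three notes from the labels file
set *stretch-sketch* = make-step-fn(
        vector(0.0, 0.166198, 0.369078, 1.085618),
        vector(0.5 / (0.166198 - 0.0),
               0.5 / (0.369078 - 0.166198),
               0.5 / (1.085618 - 0.369078)))
```

For the three notes above, the breakpoint list has the same shape as the first rows of the hand-written pwlv call: an initial value, then a pair of breakpoints at each onset, then a final breakpoint.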
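Next, let's work on pitchfn. The table below pairs each syllable of Happy Birthday with the corresponding syllable of Twinkle Twinkle; the numbers are pitches in steps:
    Happy Birthday      Twinkle Twinkle
    pitch  syllable     pitch  syllable
    67     Hap          60     Twin
    67     py           60     kle
    69     Birth        67     Twin
    67     Day          67     kle
    72     To           69     Lit
    71     You          69     tle
    67     Hap          67     Star
    67     py           65     How
    69     Birth        65     I
    67     day          64     Won
    74     To           64     der
    72     You          62     Who
    67     Hap          62     You
    67     py           60     Are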
We'll need another function similar to *stretch* only using the pitch differences to create a transposition amount. Let's make v1 through v14 and substitute for u1 through u14 to make another pwl function.
Here are the vn variables, based on the numbers above (pairs are reversed so that the target pitch comes first, e.g. to go from the 67 “Hap” to the 60 “Twin,” we call to-ratio(60, 67)), the function to-ratio, and finally, the pitch control function:
    v1 = to-ratio(60, 67), v2 = to-ratio(60, 67), v3 = to-ratio(67, 69),
    v4 = to-ratio(67, 67), v5 = to-ratio(69, 72), v6 = to-ratio(69, 71),
    v7 = to-ratio(67, 67), v8 = to-ratio(65, 67), v9 = to-ratio(65, 69),
    v10 = to-ratio(64, 67), v11 = to-ratio(64, 74), v12 = to-ratio(62, 72),
    v13 = to-ratio(62, 67), v14 = to-ratio(60, 67)

    function to-ratio(a, b)
      return step-to-hz(a) / step-to-hz(b)

    set *pitch* = pwlv(
            v1,
            p1, v1, p1, v2,
            b1, v2, b1, v3,
            d1, v3, d1, v4,
            t1, v4, t1, v5,
            y1, v5, y1, v6,
            h2, v6, h2, v7,
            p2, v7, p2, v8,
            b2, v8, b2, v9,
            d2, v9, d2, v10,
            t2, v10, t2, v11,
            y2, v11, y2, v12,
            h3, v12, h3, v13,
            p3, v13, p3, v14,
            b3, v14)
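As a quick sanity check on to-ratio, the ratio for a shift of n semitones should equal 2 raised to the power n/12. The sketch below repeats the definition of to-ratio from the text so that it stands alone, and compares it with that formula for the first note, which moves down 7 semitones from 67 to 60. Both lines should print a value of roughly 0.667.

```sal
;; same definition as in the text, repeated so the sketch is self-contained
function to-ratio(a, b)
  return step-to-hz(a) / step-to-hz(b)

;; "Hap" at 67 moves to "Twin" at 60, i.e. 7 semitones down
print to-ratio(60, 67)
;; the same ratio from the equal-temperament formula 2 ^ (semitones / 12)
print expt(2.0, (60 - 67) / 12.0)
```

A ratio below 1 lowers the pitch, a ratio of exactly 1 (e.g. to-ratio(67, 67)) leaves it unchanged, and a ratio above 1 raises it.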
Here is the entire program. (You might need to change the path to
happy.wav depending on where you run the program. The
program is in demos/src/phasevocoder.sal and it should
run from there.)
    ;; test 10 -- happy birthday to twinkle twinkle
    function test10()
      begin
        with h1 = 0.000000, p1 = 0.166198, b1 = 0.369078, d1 = 1.085618,
             t1 = 1.733822, y1 = 2.209605, h2 = 3.578676, p2 = 3.857731,
             b2 = 4.055933, d2 = 4.776379, t2 = 5.425378, y2 = 5.905255,
             h3 = 7.271426, p3 = 7.550667, b3 = 7.746811, d3 = 8.494934,
             u1 = 0.5 / (p1 - h1), u2 = 0.5 / (b1 - p1), u3 = 0.5 / (d1 - b1),
             u4 = 0.5 / (t1 - d1), u5 = 0.5 / (y1 - t1), u6 = 0.5 / (h2 - y1),
             u7 = 1.0 / (p2 - h2), u8 = 0.5 / (b2 - p2), u9 = 0.5 / (d2 - b2),
             u10 = 0.5 / (t2 - d2), u11 = 0.5 / (y2 - t2), u12 = 0.5 / (h3 - y2),
             u13 = 0.5 / (p3 - h3), u14 = 1.0 / (b3 - p3),
             v1 = to-ratio(60, 67), v2 = to-ratio(60, 67), v3 = to-ratio(67, 69),
             v4 = to-ratio(67, 67), v5 = to-ratio(69, 72), v6 = to-ratio(69, 71),
             v7 = to-ratio(67, 67), v8 = to-ratio(65, 67), v9 = to-ratio(65, 69),
             v10 = to-ratio(64, 67), v11 = to-ratio(64, 74), v12 = to-ratio(62, 72),
             v13 = to-ratio(62, 67), v14 = to-ratio(60, 67)
        set *in* = s-read("/Users/rbd/nyquist/demos/audio/happy.wav")
        set *stretch* = pwlv(
                u1,
                p1, u1, p1, u2,
                b1, u2, b1, u3,
                d1, u3, d1, u4,
                t1, u4, t1, u5,
                y1, u5, y1, u6,
                h2, u6, h2, u7,
                p2, u7, p2, u8,
                b2, u8, b2, u9,
                d2, u9, d2, u10,
                t2, u10, t2, u11,
                y2, u11, y2, u12,
                h3, u12, h3, u13,
                p3, u13, p3, u14,
                b3, u14)
        set *pitch* = pwlv(
                v1,
                p1, v1, p1, v2,
                b1, v2, b1, v3,
                d1, v3, d1, v4,
                t1, v4, t1, v5,
                y1, v5, y1, v6,
                h2, v6, h2, v7,
                p2, v7, p2, v8,
                b2, v8, b2, v9,
                d2, v9, d2, v10,
                t2, v10, t2, v11,
                y2, v11, y2, v12,
                h3, v12, h3, v13,
                p3, v13, p3, v14,
                b3, v14)
        play pv-time-pitch(*in*, *stretch*, *pitch*, 10)
      end

    function to-ratio(a, b)
      return step-to-hz(a) / step-to-hz(b)