Below are links to pieces of music and recordings of several notes. You are required to transcribe the music.
For transcription, you will have to determine the note or set of notes being played at each time.
[This file] contains a recording of a harmonica rendition of the song "Blowin' in the Wind". Also given are a collection of notes and an example of a musical scale. Transcribe both the musical scale and the main song in terms of notes.
[wav] version of music_extract.mp3
[This] is a recording of "Polyushka Polye", played on the harmonica. It has been downloaded with permission from the artist.
Below is a set of notes from a harmonica:
Note | Wav file |
---|---|
E | e.wav |
F | f.wav |
G | g.wav |
A | a.wav |
B | b.wav |
C | c.wav |
D | d.wav |
E2 | e2.wav |
F2 | f2.wav |
G2 | g2.wav |
A2 | a2.wav |
Download the following MATLAB file: [stft.m]
You can read a wav file into MATLAB as follows (recent MATLAB releases have removed wavread; audioread takes its place there):
[s,fs] = wavread('filename');
s = resample(s,16000,fs);
The recording of each note can be converted to a spectrum as follows:
spectrum = mean(abs(stft(s',2048,256,0,hann(2048))),2);
'spectrum' will be a 1025 x 1 vector.
The recordings of the complete music can be read just as you read the notes. To convert it to a spectrogram, do the following:
sft = stft(s',2048,256,0,hann(2048));
sphase = sft./abs(sft);
smag = abs(sft);
'smag' will be a 1025 x K matrix where K is the number of spectral vectors in the matrix. We will also need 'sphase' to reconstruct the signal later.
Compute the spectrum for each of the notes. Compute the spectrogram matrix 'smag' for the music signal. This matrix is composed of K spectral vectors. Successive vectors are 256 samples apart, so at the 16000 Hz sampling rate each vector represents 16 milliseconds of the signal.
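As a quick check of the numbers quoted above, the following sketch recomputes the vector size and the hop duration from the stft arguments. The variable names and the frame_start helper are illustrative only, and the start-time formula assumes that stft.m does not pad the beginning of the signal, which this page does not state.

```matlab
fs     = 16000;   % sampling rate after the resample step
fftlen = 2048;    % window / FFT length passed to stft
hop    = 256;     % hop between successive spectral vectors

nbins    = fftlen/2 + 1;      % non-redundant frequency bins per vector
frame_ms = 1000 * hop / fs;   % time advanced by each spectral vector, in ms

% Hypothetical helper: start time (seconds) of the k-th spectral vector,
% assuming the first vector starts at the first sample (no padding).
frame_start = @(k) (k - 1) * hop / fs;
t26 = frame_start(26);
```

Under that assumption, a note detected in column k of the output matrix begins roughly frame_start(k) seconds into the recording.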
You may find projections, pseudo-inverses, and dot products useful. If you know of any other techniques, you can use those too. Tricks like thresholding (setting all values of some variable that fall below a threshold to 0) might also help.
The output should be in the form of a matrix:
1 1 0 0 0 0 0 1 ...
0 0 0 1 1 0 1 1 ...
0 1 1 1 0 1 1 1 ...
...........................
Each row of the matrix represents one note. Hence there will be as many rows as you have notes in the table above.
Each column represents one of the columns in the spectrogram for the music. So if there are K vectors in the spectrogram, there will be K columns in your output.
Each entry will denote if a note was found in that vector or not. For instance, if matrix entry (4,25) = 0, then the fourth note in the table (A) was not found in the 25th spectral vector of the signal.
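To make the output format concrete, here is a minimal sketch of the pseudo-inverse and thresholding hints on a toy problem. Everything in it is made up for illustration: the 3-bin "spectra", the names notes, music, W and T, and the threshold of 0.5. It is one possible approach, not the expected solution, and real recordings will need more care.

```matlab
% Toy data (hypothetical): 3 frequency bins, 2 notes, 3 frames.
notes = [1 0;
         0 1;
         1 1];                 % bins x notes, one spectrum per column
truth = [1 0 1;
         0 1 1];               % notes x frames, which notes really sound
music = notes * truth + 0.05;  % bins x frames, with a small constant "noise" floor

W = pinv(notes) * music;       % least-squares weight of each note in each frame
T = double(W > 0.5);           % threshold the weights to a 0/1 transcription

active_in_frame_3 = find(T(:, 3))';   % row indices of notes found in frame 3
```

In this toy case T recovers truth exactly; the threshold is what removes the small nonzero weights caused by the noise floor.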
You can use the notes and the transcription matrix thus obtained to synthesize audio. Note that matrix multiplying the notes and the transcription will simply give you the magnitude spectrum. In order to create meaningful audio, you will need to use the phases as well. Once you have the phases included, you can use the stft to synthesize a signal from the matrix. Submit the synthesized audio along with the matrix.
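The step of reattaching the phases can be sketched on toy matrices as follows. The values and the names notes, T, sft, smag_hat and sft_hat are hypothetical stand-ins for your own matrices, and the eps guard is an addition that avoids dividing by zero in silent bins. The final, commented-out line is an assumption about how stft.m is called for synthesis; check the file itself before relying on it.

```matlab
% Toy stand-ins (hypothetical values) for the real matrices.
notes = [1 0; 0 1; 1 1];                 % bins x notes
T     = [1 0 1; 0 1 1];                  % notes x frames, binary transcription
sft   = [1+1i,  2,    -1i;
         3,     1i,    1;
        -2,     1-1i,  4];               % bins x frames, complex STFT of the music

sphase   = sft ./ max(abs(sft), eps);    % unit-magnitude phase terms
smag_hat = notes * T;                    % magnitude spectrogram only
sft_hat  = smag_hat .* sphase;           % magnitudes with the original phases

% Assumption, not confirmed by this page: stft.m inverts a complex
% spectrogram when called with the same arguments as the forward transform.
% y = stft(sft_hat, 2048, 256, 0, hann(2048));
```

The product keeps the magnitudes predicted by the transcription while borrowing the phase of the original recording, which is what makes the resynthesized audio intelligible.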
Let's warm up with a simple problem.
A rotation in 3-D space is characterized by two angles. We will characterize them as a rotation in the X-Y plane and a rotation in the Y-Z plane. Derive the equations that transform a vector $[x, y, z]^\top$ to a new vector $[x', y', z']^\top$ by rotating it counterclockwise by an angle $\theta$ in the X-Y plane and by an angle $\delta$ in the Y-Z plane. Represent this as a matrix transformation of the column vector $[x, y, z]^\top$ to the column vector $[x', y', z']^\top$. The matrix that transforms the former into the latter is a rotation matrix.
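Whatever matrix you derive, it should pass the two standard tests for a rotation matrix: its columns are orthonormal and its determinant is +1. The sketch below runs those checks on the familiar 2-D counterclockwise rotation rather than on the 3-D answer. The helper is_rotation and the tolerance 1e-12 are illustrative choices.

```matlab
theta = pi/6;                          % arbitrary test angle
R2 = [cos(theta) -sin(theta);
      sin(theta)  cos(theta)];         % 2-D counterclockwise rotation by theta

% Hypothetical helper: true if R is orthonormal with determinant +1.
is_rotation = @(R) norm(R' * R - eye(size(R, 1))) < 1e-12 && ...
                   abs(det(R) - 1) < 1e-12;

ok      = is_rotation(R2);             % a genuine rotation passes
flipped = is_rotation([1 0; 0 -1]);    % a reflection fails (determinant -1)
v       = R2 * [1; 0];                 % rotated unit vector keeps its length
```

The same is_rotation call works unchanged on a 3x3 matrix, so it can be applied to your derived matrix for a few values of the two angles.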
For this problem you will transform the harmonica notes of problem 1 to piano notes, by a matrix transform. The piano notes can be downloaded from [here]. Note that, in this case, you don't know which piano notes correspond to which notes from the harmonica. There are 3 parts to this problem:
The following matrix transforms 4-dimensional vectors into 3-dimensional ones:
$$A = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 3 & 4 & 5 & 7 \\ 5 & 7 & 9 & 11 \end{bmatrix}$$
A $4 \times 1$ vector $v$ of length 4 is transformed by $A$ as $u = Av$. What is the longest that $u$ can be? What is the shortest length of $u$?
The “Restricted Isometry Property” (RIP) constant of a matrix characterizes the change in length of vectors transformed by sub-matrices of the matrix. For our matrix $A$, let $A_s$ be a matrix formed of any $s$ columns of $A$. If $A$ is $M \times N$, $A_s$ will be $M \times s$. We can form $A_s$ in $\binom{N}{s}$ ways from the $N$ columns of $A$ (we assume that the order of vectors in $A_s$ is immaterial). Let $w$ be an $s \times 1$ vector of length 1. Let $l_{\max}$ be the length of the longest vector that one can obtain by transforming $w$ by any $A_s$. Let $l_{\min}$ be the length of the shortest vector obtained by transforming $w$ by any $A_s$. The RIP-$s$ constant $\delta_s$ of the matrix $A$ is defined as:
$$\delta_s = \frac{l_{\max} - l_{\min}}{l_{\max} + l_{\min}}$$
What is $\delta_2$ (i.e. $\delta_s$ for $s = 2$) for the matrix $A$ given above? Hint: You must consider all $\binom{4}{2}$ possible values for $A_s$.
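The bookkeeping in the hint can be sketched as follows: enumerate the column subsets, form each sub-matrix, and apply the definition once the two extreme lengths are known. The names subsets, A_first and rip are illustrative, the matrix entries are as read from the problem statement, and the values passed to rip are made-up numbers rather than the answer to the problem.

```matlab
A = [1 2 3  4;
     3 4 5  7;
     5 7 9 11];                          % matrix from the problem statement
s = 2;

subsets = nchoosek(1:size(A, 2), s);     % one row per choice of s column indices
nsub    = size(subsets, 1);              % number of sub-matrices to consider

A_first = A(:, subsets(1, :));           % sub-matrix for the first subset

% RIP constant as defined above, given the longest and shortest lengths.
rip = @(lmax, lmin) (lmax - lmin) / (lmax + lmin);
example = rip(3, 1);                     % hypothetical lengths 3 and 1
```

Looping k over 1:nsub and replacing subsets(1, :) with subsets(k, :) visits every $A_s$; finding the extreme lengths for each one is left as the exercise.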
To be put up shortly
The assignment (including part b, which is to be put up) is due at the beginning of class on September 23rd. Each day of delay thereafter will automatically deduct 5% of the maximum points from your score.
Solutions may be emailed to Zhiding Yu, and must be cc-ed to Bhiksha. The message must have the subject line "MLSP assignment 1". It should include a report (1 page or longer) of what you did, and the resulting matrix as well as the synthesized audio.