MLSP Fall 2015: Homework 1
Linear Algebra Refresher

Problem 1: Music Transcription (Zhiding)

[This] is a recording of "Polyushka Polye", played on the harmonica. It has been downloaded with permission from the artist.

Below is a set of notes from a harmonica. You are required to transcribe the music: for transcription you must determine the note or set of notes being played at each instant.

Table 1: Notes and their wav files

Note    Wav file
E       e.wav
F       f.wav
G       g.wav
A       a.wav
B       b.wav
C       c.wav
D       d.wav
E2      e2.wav
F2      f2.wav
G2      g2.wav
A2      a2.wav

To transcribe the music you must obtain a matrix of 1s and 0s, where rows correspond to notes and columns correspond to time. A note-time $(n,t)$ entry in the matrix will be 1 if the corresponding note $n$ is present at time $t$.

To do so, you must determine the weights with which the notes must be combined to obtain the closest approximation to the music at each time. The matrix must be assigned an entry of 1 for each note whose weight is greater than 0.
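In matrix form this amounts to a least-squares projection. A minimal sketch (variable names are hypothetical: 'W' holds the note spectra as columns and 'smag' is the music spectrogram, both computed as described in the Matlab instructions below):

weights = pinv(W) * smag;          % per-frame weights that best approximate the music
transcript = double(weights > 0);  % 1 wherever a note has a positive weight

(Note that the deliverables below ask you to threshold at 0.1 rather than 0 when you submit.)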

Matlab Instructions

Download the following matlab files: [stft.m]

You can read a wav file into matlab as follows:

[s,fs] = wavread('filename');   % samples and sampling rate
s = resample(s,16000,fs);       % resample to 16 kHz

The recording of each note can be converted to a spectrum as follows:

spectrum = mean(abs(stft(s',2048,256,0,hann(2048))),2);   % average magnitude spectrum

'spectrum' will be a $1025 \times 1$ vector.

The recording of the complete music can be read just as you read the notes. To convert it to a spectrogram, do the following:

sft = stft(s',2048,256,0,hann(2048));   % complex spectrogram
sphase = sft./(abs(sft)+eps);           % phase of each time-frequency bin
smag = abs(sft);                        % magnitude spectrogram

The 'eps' used here is a tiny Matlab constant that prevents 'sphase' from becoming 'NaN' at some locations. 'smag' will be a $1025\times K$ matrix, where $K$ is the number of spectral vectors in the matrix. We will also need 'sphase' to reconstruct the signal later.

Compute the spectrum for each of the notes. Compute the spectrogram matrix 'smag' for the music signal. This matrix is composed of $K$ spectral vectors. Each vector represents 16 milliseconds of the signal.
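For instance, here is a sketch of building the full note matrix, assuming the wav files from Table 1 sit in the working directory (variable names are hypothetical):

names = {'e','f','g','a','b','c','d','e2','f2','g2','a2'};
W = zeros(1025, numel(names));                  % one note spectrum per column
for i = 1:numel(names)
    [s,fs] = wavread([names{i} '.wav']);
    s = resample(s,16000,fs);                   % resample to 16 kHz
    W(:,i) = mean(abs(stft(s',2048,256,0,hann(2048))),2);
end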

The output should be in the form of a matrix:
1 1 0 0 0 0 0 1 ...
0 0 0 1 1 0 1 1 ...
0 1 1 1 0 1 1 1 ...
...........................

Each row of the matrix represents one note. Hence there will be as many rows as there are notes in Table 1.

Each column represents one of the columns in the spectrogram of the music. So if there are $K$ vectors in the spectrogram, there will be $K$ columns in your output.

Each entry denotes whether a note was found in that vector or not. For instance, if matrix entry $(4,25) = 0$, then the fourth note (A in Table 1) was not found in the 25th spectral vector of the signal.

Synthesizing Audio

This is not for credit. However you may find it instructive in understanding the results from part 3 of this homework.

You can use the notes and the transcription matrix thus obtained to synthesize audio. Note that multiplying the notes matrix by the transcription matrix will only give you the magnitude spectrum. In order to create meaningful audio, you will need to use the phases as well. Once you have included the phases, you can use the stft to synthesize a signal from the matrix. Submit the synthesized audio along with the matrix.
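A possible sketch, assuming the provided stft.m also performs synthesis when handed a complex spectrogram (verify this against the file itself), and with 'W' and 'transcript' as in the earlier sketches:

Mhat = W * transcript;                                   % reconstructed magnitude spectrum
s_syn = stft(Mhat .* sphase, 2048, 256, 0, hann(2048));  % reattach phases, then invert
wavwrite(s_syn(:), 16000, 'polyushka_syn.wav');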

Deliverables, Formats and Bonus Question

You should strictly follow the instructions and the given Matlab template when submitting your code and results. The organization of the submission files as well as the structure of the code are NOT meant to be changed. Use the given variable names and function arguments rather than changing them. Don't move the data files elsewhere. When storage of results is required, store the results of each problem in the 'results' folder inside that problem's folder. Name your root folder, as well as your report, with your Andrew ID.

In particular, here is what we expect for Problem 1:

1. Fill the function "load_data.m" with your code to load the notes and the music. Output their (mean) spectrum magnitudes and phases as instructed in the code.

2. Fill the function "transcribe_music.m" with your code to compute the following 4 matrices: 1. the transcription matrix; 2. the thresholded transcription matrix; 3. the projection matrix; 4. the music spectrum magnitude reconstructed with the thresholded transcription matrix. You may play with different threshold (T) values, but when submitting your code and results, use 0.1 as the threshold.

3. (Optional) Fill the function "synthesize_music.m" with your code to return the synthesized music. Although it is optional, we strongly recommend you finish it, as it is very simple and takes only two lines of Matlab code. Once you finish it, you will get a better idea of how the stft and basis projection work. If you have implemented your code correctly, your synthesized music should sound reasonable.

4. Fill "Run_Problem1.m" so that it runs as the 'main' function calling the other functions. You should: 1. Output your thresholded transcription matrix as 'transMatT.mat' and save it to the 'results' folder in 'Problem1'. 2. (Optional) Output your synthesized music as 'polyushka_syn.wav' to the same folder.

5. (Bonus) Prove the following in your report: suppose $\mathbf{W}$ is the $1025\times 11$ note spectrum magnitude matrix, $\mathbf{x}_k$ is the $k$th column of the non-thresholded transcription matrix, and $\mathbf{m}_k$ is the music spectrum magnitude at time $k$; then your implementation of computing $\mathbf{x}_k$ in Question 2 is a minimizer of $\|\mathbf{W}\mathbf{x}_k - \mathbf{m}_k\|^2$. Then consider the projection matrix learned in class: what is the connection? Now do you see why the projection matrix you learned is formed in this way?

You do not need to paste your code into the report, but make sure you include all code and data files. Your submitted code should be runnable with one click.


Problem 2: Linear Algebra (Zhiding)

Let's warm up with a simple problem.

2a. Rotational Matrices:

A rotation in 3-D space is characterized by two angles. We will characterize them as a rotation along the $X$-$Y$ plane and a rotation along the $Y$-$Z$ plane. Derive the equations that transform a vector $[x, y, z]^\top$ to a new vector $[x', y', z']^\top$ by rotating it counterclockwise by an angle $\theta$ along the $X$-$Y$ plane and by an angle $\delta$ along the $Y$-$Z$ plane. Represent this as a matrix transformation of the column vector $[x, y, z]^\top$ to the column vector $[x', y', z']^\top$. The matrix that transforms the former into the latter is a rotation matrix.
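As a reminder (this is the 2-D building block, not the answer), a counterclockwise rotation by $\theta$ in the plane is \[ \begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} \] Your 3-D matrix should reduce to this form when $\delta = 0$ and the $z$ coordinate is ignored.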

2b. Lengths of vectors.

  1. The following matrix transforms 4-dimensional vectors into 3-dimensional ones:   \[ A = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 3 & 4 & 5 & 7 \\ 5 & 7 & 9 & 11 \end{bmatrix} \]

    A $4\times 1$ vector $v$ of length ($\ell_2$ norm) 4 is transformed by $A$ as $u = Av$. What is the longest length ($\ell_2$ norm) that $u$ can have? What is the shortest length of $u$?

  2. Repeat the above for the following matrix. Here $B$ transforms 3-D vectors to 4-D ones. What is the maximum length of the transformed version of a 3-D vector of length 1? What is the minimum length? \[ B = \begin{bmatrix} 1 & 3 & 5 \\ 2 & 4 & 7 \\ 3 & 5 & 9 \\ 4 & 7 & 11 \end{bmatrix} \]
  3. The “Restricted Isometry Property” (RIP) constant of a matrix characterizes the change in length of vectors transformed by sub-matrices of the matrix. For our matrix $A$, let $A_s$ be a matrix formed of any $s$ columns of $A$. If $A$ is $M \times N$, $A_s$ will be $M\times s$. We can form $A_s$ in $^NC_s$ ways from the $N$ columns of $A$ (we assume that the order of vectors in $A_s$ is immaterial). Let $w$ be an $s \times 1$ vector of length 1. Let $l_{max}$ be the length of the longest vector that one can obtain by transforming $w$ by any $A_s$. Let $l_{min}$ be the length of the shortest vector obtained by transforming $w$ by any $A_s$. The RIP-$s$ constant $\delta_s$ of the matrix $A$ is defined as:
    \[ \delta_s = \frac{l_{max} - l_{min}}{l_{max} + l_{min}} \] What is $\delta_2$ (i.e. $\delta_s$ for $s = 2$) for the matrix $A$ given above? Hint: You must consider all $^4C_2$ possible values for $A_s$.

Deliverables

1. You must work out the three questions algebraically in your report and show detailed steps.

2. It is optional to include simulation code in your submitted files. However, you must present your simulation results in your report. A minimal simulation sketch follows.
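One possible sketch of such a simulation (it does not replace the algebraic proof; the same pattern applies to $B$ in question 2):

A = [1 2 3 4; 3 4 5 7; 5 7 9 11];
s = svd(A);                       % extreme stretches of A on unit vectors
longest = 4 * max(s);             % input vectors in question 1 have length 4
% A maps 4-D to 3-D, so it has a non-trivial null space: think about what
% that implies for the shortest possible ||A*v||.

% RIP-2 (question 3): enumerate all nchoosek(4,2) = 6 column pairs of A
lmax = 0; lmin = inf;
pairs = nchoosek(1:4,2);
for i = 1:size(pairs,1)
    sv = svd(A(:,pairs(i,:)));    % extreme stretches of this 3 x 2 submatrix
    lmax = max(lmax, max(sv));
    lmin = min(lmin, min(sv));
end
delta2 = (lmax - lmin) / (lmax + lmin);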


Problem 3. Revisiting Music Transcription (Bing)

3a. Matrix Calculus

We warm up for problem 3 with basic matrix calculus.

We begin by stating a few rules we will use. Everywhere the notation $x_i$ refers to the $i^{\rm th}$ component of a vector ${\mathbf x}$ and $x_{i,j}$ refers to the $(i,j)^{\rm th}$ component of a matrix ${\mathbf X}$.

  1. The derivative of a scalar $z$ with respect to an $N \times 1$ vector ${\mathbf x}$ is an $N \times 1$ vector. The $i^{\rm th}$ component of this vector is $\frac{dz}{dx_i}$.
  2. The derivative of a scalar $z$ with respect to an $N \times M$ matrix ${\mathbf X}$ is an $N \times M$ matrix, whose $(i,j)^{\rm th}$ component is $\frac{dz}{dx_{i,j}}$.
  3. The derivative of an $N \times 1$ vector ${\mathbf y}$ with respect to an $M \times 1$ vector ${\mathbf x}$ is an $N \times M$ matrix, whose $(i,j)^{\rm th}$ component is $\frac{dy_i}{dx_j}$.

Note the order of the indices in the following:

  4. The derivative of an $N \times 1$ vector ${\mathbf y}$ with respect to an $M \times L$ matrix ${\mathbf X}$ is an $N \times L \times M$ tensor (note the reversal of the order of $L$ and $M$), whose $(i,j,k)^{\rm th}$ component is $\frac{dy_i}{dx_{k,j}}$ (again, note the reversal of the order of indices in the denominator: to get the derivative in the $(i,j,k)$ location, we differentiate w.r.t. the variable in the $(k,j)$ location of ${\mathbf X}$).
  5. The derivative of an $N \times K$ matrix ${\mathbf Y}$ with respect to an $M \times L$ matrix ${\mathbf X}$ is an $N \times K \times L \times M$ tensor, whose $(i,j,k,l)^{\rm th}$ component is $\frac{dy_{i,j}}{dx_{l,k}}$.
  6. The derivative of an $N_1 \times N_2 \times \cdots \times N_L$ tensor ${\mathbf Y}$ with respect to an $M_1\times M_2 \times \cdots \times M_K$ tensor is an $N_1 \times N_2 \times \cdots \times N_L \times M_K \times M_{K-1} \times \cdots \times M_1$ tensor.

The transpose of any $N_1 \times N_2$ matrix is an $N_2 \times N_1$ matrix. We will represent the transposition operation by the superscript $\top$. Let ${\mathbf Y} = {\mathbf X}^\top$. Then $y_{i,j} = x_{j,i}$ (i.e. the $(i,j)$-th component of ${\mathbf Y}$ is the $(j,i)$-th component of ${\mathbf X}$).

For the purposes of the computation here, we will expand the notion of transposes to tensors. Let ${\mathbf Y}$ be any $N_1 \times N_2 \times \cdots \times N_K$ tensor. By our definition ${\mathbf X} = {\mathbf Y}^\top$ is an $N_K \times N_{K-1} \times \cdots \times N_1$ tensor, whose components are given by $x_{i_1,i_2,\cdots,i_K} = y_{i_K,i_{K-1},\cdots,i_1}$.

Using the above definitions, we can also write the following chain rule.

  7. Let ${\mathbf X}$ be any matrix (noting that a vector is also a one-column matrix, so the same rule also applies to vectors, or tensors in general). Let $g(\bullet)$ and $h(\bullet)$ be two functions. The functions may have scalar, vector or tensor outputs. Let \[ y = g\left(h\left( {\mathbf X}\right)\right) \] Then \[ \frac{dy}{d{\mathbf X}} = \left(\frac{dh({\mathbf X})}{d{\mathbf X}}\right) ^{\top} \frac{d g}{dh} \] where $dg$ is shorthand for $d(g(h({\mathbf X})))$ and $dh$ stands for $d(h({\mathbf X}))$. Note the order of multiplication.
  8. In general, if $f_1(\bullet)$, $f_2(\bullet)$, $f_3(\bullet) \cdots$ are functions, then if \[ y = f_1\left(f_2\left(f_3\left(\cdots f_K\left({\mathbf X}\right)\right)\right)\right) \] then \[ \frac{dy}{d{\mathbf X}} = \left(\frac{df_K({\mathbf X})}{d{\mathbf X}}\right)^\top \left( \frac{d f_{K-1}}{df_K}\right)^\top \left(\frac{d f_{K-2}}{df_{K-1}}\right)^\top \cdots \left(\frac{d f_2}{d f_3}\right)^\top \frac{d f_1}{df_2} \] Once again, note the order of computation.
Finally, we give the following rules for multiplying tensors.
  9. Let ${\mathbf X}$ be an $N \times M \times L$ tensor. It can only left multiply $L \times 1$ vectors. Let ${\mathbf y}$ be an $L \times 1$ vector. Then ${\mathbf Z} = {\mathbf Xy}$ is an $N\times M$ matrix whose components are given by \[ z_{i,j} = \sum_k x_{i,j,k}y_k \] (A small numerical check of this rule appears after this list.)
  10. Let ${\mathbf X}$ be an $N \times M \times L$ tensor. It can only left multiply $L \times M$ matrices. Let ${\mathbf Y}$ be an $L \times M$ matrix. Then ${\mathbf Z} = {\mathbf XY}$ is an $N\times 1$ vector whose components are given by \[ z_i = \sum_j\sum_k x_{i,j,k}y_{k,j} \]
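A quick numerical check of rule 9 in Matlab (all names hypothetical):

X = reshape(1:24, [2 3 4]);              % a 2 x 3 x 4 tensor
y = [1; 0; 2; 1];                        % a 4 x 1 vector
Z = zeros(2,3);
for i = 1:2
    for j = 1:3
        Z(i,j) = squeeze(X(i,j,:))' * y; % z_{i,j} = sum_k x_{i,j,k} y_k
    end
end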
We are now set to state our problems.

3.a.I.   Derivative of scalar function w.r.t. vector argument

Let ${\mathbf x}$ and ${\mathbf y}$ be $N\times 1$ vectors. Let \[ e = \| z - {\mathbf y}^\top{\mathbf x}\|^2 \] Show that \[ \frac{de}{d{\mathbf x}} = -2\left(z - {\mathbf y}^\top{\mathbf x}\right) {\mathbf y} \] The derivation is simple, based on rule number 1.

3.a.II.   Derivative of scalar function w.r.t. matrix argument

Let ${\mathbf X}$ be an $N\times M$ matrix, ${\mathbf z}$ be an $N \times 1$ vector, and ${\mathbf y}$ be an $M\times 1$ vector. Let \[ e = \| {\mathbf z} - {\mathbf X}{\mathbf y}\|^2 \] Show that \[ \frac{de}{d{\mathbf X}} = -2\left({\mathbf z} - {\mathbf X}{\mathbf y}\right) {\mathbf y}^\top \]

HINT:

For 3.a.II, the chain rule gives us: \[ \frac{de}{d{\mathbf X}} = \left(\frac{d{\mathbf X}{\mathbf y}}{d{\mathbf X}}\right)^\top \frac {d \| {\mathbf z} - {\mathbf X}{\mathbf y}\|^2} {d({\mathbf X}{\mathbf y})} \] Note that ${\mathbf X}{\mathbf y}$ is an $N \times 1$ vector whose $i^{\rm th}$ component is given by $\sum_j x_{i,j}y_j$. Therefore, writing ${\mathbf u} = {\mathbf X}{\mathbf y}$, \[ \frac {d u_i}{d x_{j,k}} = \begin{cases} y_k & \text{if } j=i\\ 0 & \text{otherwise} \end{cases} \] Thus, although $\frac{d{\mathbf u}}{d{\mathbf X}}$ is an $N \times M \times N$ tensor, only the entries along the diagonal plane $i=j$ are non-zero; all other terms are zero. That is, if we let ${\mathbf V} = \frac{d{\mathbf u}}{d{\mathbf X}}$, using rule 4 we get \[ v_{i,k,j} = \begin{cases} y_k & \text{if } i=j\\ 0 & \text{otherwise} \end{cases} \] The resulting tensor has the structure shown below. Only the diagonal plane shown in the figure has non-zero values; all the remaining values are 0. Each row in the diagonal plane is the vector ${\mathbf y}^\top$.

[Figure: the tensor ${\mathbf V}$; only the diagonal plane $i=j$ is non-zero, and each row of that plane is ${\mathbf y}^\top$.]

Because of the zero-valued off-diagonal elements, tensor transposition has no effect on this tensor. Multiplying it by an $N\times 1$ vector ${\mathbf t}$ has the following geometric interpretation, illustrated in the figure below.

[Figure: multiplying the tensor by the vector ${\mathbf t}$ along its trailing (depth) dimension.]

The depth dimension of the tensor, which is its trailing dimension, multiplies the vector. The highlighted points and the arrows indicate corresponding elements that are multiplied together. The products are summed.

Clearly, since only the diagonal elements have non-zero values, only one of the component-wise products has a non-zero value, and the sum takes this value.

This has the effect that for any $N \times 1$ vector ${\mathbf t}$, the matrix ${\mathbf S} = {\mathbf V}^\top{\mathbf t}$ is an $N \times M$ matrix whose components (according to rule 9) are given by \[ s_{k,j} = \sum_{i} v_{i,j,k}t_i = t_ky_j \] The indices can be verified by referring to the illustration above.

3b. Revising the Music

Simple projection of music magnitude spectrograms (which are non-negative) onto a set of notes will result in negative weights for some notes. To explain, let $\mathbf{M}$ be the (magnitude) spectrogram of the music. It is a matrix of size $D \times T$, where $D$ is the size of the Fourier transform and $T$ is the number of spectral vectors in the signal. Let $\mathbf{N}$ be a matrix of notes. Each column of $\mathbf{N}$ is the magnitude spectral vector for one note. $\mathbf{N}$ has size $D \times K$, where $K$ is the number of notes.

Conventional projection of $\mathbf{M}$ onto the notes $\mathbf{N}$ computes the approximation \[ \widehat{\mathbf{M}} = \mathbf{N} \mathbf{T} \]

such that $||\mathbf{M} - \widehat{\mathbf{M}}||_F^2 = \sum_{i,j} (M_{i,j} - \widehat{M}_{i,j})^2$ is minimized. Here $||\mathbf{M} - \widehat{\mathbf{M}}||_F$ is known as the Frobenius norm of $\mathbf{M} - \widehat{\mathbf{M}}$. $M_{i,j}$ is the $(i,j)^{\rm th}$ entry of $\mathbf{M}$ and $\widehat{M}_{i,j}$ is similarly the $(i,j)^{\rm th}$ entry of $\widehat{\mathbf{M}}$. Please note the definition of the Frobenius norm; we will use it later.
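In Matlab the two sides of this definition coincide; a one-line check (with 'M' and 'Mhat' standing in for $\mathbf{M}$ and $\widehat{\mathbf{M}}$):

err = norm(M - Mhat, 'fro')^2;       % equals sum(sum((M - Mhat).^2))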

$\widehat{\mathbf{M}}$ is the projection of $\mathbf{M}$ onto $\mathbf{N}$. $\mathbf{T}$, of course, is given by $\mathbf{T} = pinv(\mathbf{N}) \mathbf{M}$. $\mathbf{T}$ can be viewed as the transcription of $\mathbf{M}$ in terms of the notes in $\mathbf{N}$. So, the $j^{\rm th}$ column of $\mathbf{M}$, which we represent as $M_j$ and which is the spectrum in the $j^{\rm th}$ frame of the music, is approximated by the notes in $\mathbf{N}$ as \[ M_j = \sum_i N_i T_{i,j} \]

where $N_i$, the $i^{\rm th}$ column of $\mathbf{N}$, represents the $i^{\rm th}$ note, and $T_{i,j}$ is the weight assigned to the $i^{\rm th}$ note in composing the $j^{\rm th}$ frame of the music.

The problem is that in this computation we will frequently find $T_{i,j}$ values to be negative. In other words, this model requires you to subtract some notes: since $T_{i,j} N_i$ will have negative entries if $T_{i,j}$ is negative, this is equivalent to subtracting the weighted note $|T_{i,j}|N_i$ in the $j^{\rm th}$ frame. Intuitively, this is an unreasonable operation: when we actually play music, we never unplay a note (which is what playing a negative note would amount to).

Also, $\widehat{\mathbf{M}}$ may have negative entries. In other words, our projection of $\mathbf{M}$ onto the notes in $\mathbf{N}$ can result in negative spectral magnitudes in some frequencies at certain times. Again, this is meaningless physically -- spectral magnitudes cannot, by definition, be negative.

In this homework problem we will try to fix this anomaly.

We will do this by computing the approximation $\widehat{\mathbf{M}} = \mathbf{N} \mathbf{T}$ with the constraint that the entries of $\mathbf{T}$ must always be greater than or equal to $0$, i.e. they must be non-negative. To do so we will use a simple gradient descent algorithm which minimizes the error $||\mathbf{M} - \mathbf{N}\mathbf{T}||_F^2$ subject to the constraint that all entries in $\mathbf{T}$ are non-negative.

3.b.I. Computing a derivative

We define the following error function: \[ E = ||\mathbf{M} - \mathbf{N}\mathbf{T}||_F^2. \]

Derive the formula for $\frac{dE}{d\mathbf{T}}$.

3.b.II. A Non-Negative Projection

We define the following gradient descent rule to estimate $\mathbf{T}$. It is an iterative estimate. Let $\mathbf{T}^0$ be the initial estimate of $\mathbf{T}$ and $\mathbf{T}^n$ the estimate after $n$ iterations.

We use the following update rule: \[ \widehat{\mathbf{T}}^{n+1} = \mathbf{T}^n - \eta \left.\frac{dE}{d\mathbf{T}}\right|_{\mathbf{T}^n}, \qquad \mathbf{T}^{n+1} = \max(\widehat{\mathbf{T}}^{n+1},0) \]

where $\frac{dE}{d\mathbf{T}}|_{\mathbf{T}^n}$ is the derivative of $E$ with respect to $\mathbf{T}$ computed at $\mathbf{T} = \mathbf{T}^n$, and $\max(\widehat{\mathbf{T}}^{n+1},0)$ is a component-wise flooring operation that sets all negative entries in $\widehat{\mathbf{T}}^{n+1}$ to 0.
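The rule translates directly into Matlab. A minimal sketch, assuming 'M' ($D \times T$) and 'N' ($D \times K$) are already loaded, and using the least-squares gradient $-2\mathbf{N}^\top(\mathbf{M} - \mathbf{N}\mathbf{T})$ (check this against your own derivation in 3.b.I):

T = ones(size(N,2), size(M,2));           % initialize T to all 1s
eta = 0.01;
E = zeros(100,1);
for n = 1:100
    grad = -2 * N' * (M - N*T);           % dE/dT at the current T
    T = max(T - eta*grad, 0);             % gradient step, then floor at 0
    E(n) = norm(M - N*T, 'fro')^2;        % track the error
end
plot(E); xlabel('iteration n'); ylabel('E');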

3.b.II.i. Implement the above algorithm. Initialize $\mathbf{T}$ to a matrix of all $1$s. Set $\eta$ to 0.01. Run 100 iterations. Plot $E$ as a function of the iteration number $n$. Return this plot and the final matrix $\mathbf{T}$. Revise the value of $\eta$ if $E$ does not converge, and also add the $E$ plot with your revised $\eta$ value to the report.

3.b.II.ii. No points for this one, but try this. Transcribe the music: set all values of $\mathbf{T}$ above the threshold 0.1 to 1.0 and the rest to 0, and return the resulting binary transcription. Let this matrix be $\mathbf{T}_b$ (the subscript $b$ representing binary). Now recreate the music using this transcription as $\mathbf{M}_b = \mathbf{N} {\mathbf{T}}_b$. You can regenerate audio from ${\mathbf M}_b$. What does it sound like? You may return the resynthesized music to impress us (although we won't score you on it).



Due Date and General Instructions

The assignment is due at the beginning of class on Oct 6th, 2015, and is worth 15 points. Each day of delay thereafter will automatically deduct 3 points from your score. Everyone should strictly follow the instructions and templates, or you may lose a considerable number of points.

Solutions must be emailed to both TAs, and cc-ed to Bhiksha. The message must have the subject line "MLSP assignment 1". Remember to include your generated results. Don't delete them.

The template for assignment 1 is available here: hw1_template.zip