\( \def\xx{\mathbf x} \def\xX{\mathbf X} \def\yy{\mathbf y} \def\bold#1{\bf #1} \)
Let's warm up with a simple problem.
[This] is a recording of "Polyushka Polye", played on the harmonica. It has been downloaded from YouTube with permission from the artist.
Here is a set of notes from a harmonica. You are required to transcribe the music: for the transcription, you must determine how each of the notes is played over time to compose the music.
You can use the MATLAB instructions given here to convert each note into a spectral vector, and the entire piece of music into a spectrogram matrix.
Simple projection of music magnitude spectrograms (which are non-negative) onto a set of notes will result in negative weights for some notes. To explain, let $\mathbf{M}$ be the (magnitude) spectrogram of the music. It is a matrix of size $D \times T$, where $D$ is the size of the Fourier transform and $T$ is the number of spectral vectors in the signal. Let $\mathbf{N}$ be a matrix of notes. Each column of $\mathbf{N}$ is the magnitude spectral vector for one note. $\mathbf{N}$ has size $D \times K$, where $K$ is the number of notes.
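For concreteness, here is a minimal MATLAB sketch of how $\mathbf{M}$ and $\mathbf{N}$ might be built; the filenames, the window/hop choices, and the use of the mean note spectrum are our illustrative assumptions, and the linked instructions are authoritative.

    % Hypothetical sketch (requires the Signal Processing Toolbox).
    % Filenames, FFT size, and hop are illustrative assumptions.
    [x, fs] = audioread('polyushka.wav');         % the music recording
    nfft = 2048; hop = nfft/4;
    win  = hann(nfft, 'periodic');
    C = spectrogram(x, win, nfft - hop, nfft);    % complex one-sided STFT
    M = abs(C);                                   % D x T magnitude spectrogram

    noteFiles = {'note1.wav', 'note2.wav'};       % hypothetical note recordings
    K = numel(noteFiles);
    N = zeros(size(M, 1), K);
    for k = 1:K
        xk = audioread(noteFiles{k});
        Ck = spectrogram(xk, win, nfft - hop, nfft);
        N(:, k) = mean(abs(Ck), 2);               % one spectral vector per note
    end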
Conventional projection of $\mathbf{M}$ onto the notes $\mathbf{N}$ computes the approximation \[ \widehat{\mathbf{M}} = \mathbf{N} \mathbf{W} \]
such that $||\mathbf{M} - \widehat{\mathbf{M}}||_F^2 = \sum_{i,j} (M_{i,j} - \widehat{M}_{i,j})^2$ is minimized. Here $||\mathbf{M} - \widehat{\mathbf{M}}||_F$ is known as the Frobenius norm of $\mathbf{M} - \widehat{\mathbf{M}}$. $M_{i,j}$ is the $(i,j)^{\rm th}$ entry of $\mathbf{M}$ and $\widehat{M}_{i,j}$ is similarly the $(i,j)^{\rm th}$ entry of $\widehat{\mathbf{M}}$. Please note the definition of the Frobenius norm; we will use it later.
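The Frobenius norm is built into MATLAB, so the definition above can be sanity-checked directly:

    A = randn(3, 4);        % any matrix
    norm(A, 'fro')^2        % built-in Frobenius norm, squared
    sum(A(:).^2)            % the elementwise definition above; identical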
$\widehat{\mathbf{M}}$ is the projection of $\mathbf{M}$ onto $\mathbf{N}$. $\mathbf{W}$, of course, is given by $\mathbf{W} = {\rm pinv}(\mathbf{N})\,\mathbf{M}$. $\mathbf{W}$ can be viewed as the transcription of $\mathbf{M}$ in terms of the notes in $\mathbf{N}$. So the $j^{\rm th}$ column of $\mathbf{M}$, which we denote $M_j$ and which is the spectrum in the $j^{\rm th}$ frame of the music, is approximated by the notes in $\mathbf{N}$ as \[ M_j \approx \sum_i N_i W_{i,j} \]
where $N_i$, the $i^{\rm th}$ column of $\mathbf{N}$, represents the $i^{\rm th}$ note, and $W_{i,j}$ is the weight assigned to the $i^{\rm th}$ note in composing the $j^{\rm th}$ frame of the music.
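Given $\mathbf{M}$ and $\mathbf{N}$ as above, this conventional projection takes two lines of MATLAB:

    W    = pinv(N) * M;     % least-squares weights, K x T
    Mhat = N * W;           % projection of M onto the notes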
The problem is that in this computation we will frequently find $W_{i,j}$ values to be negative. In other words, this model requires you to subtract some notes: since $W_{i,j} N_i$ will have negative entries if $W_{i,j}$ is negative, this is equivalent to subtracting the weighted note $|W_{i,j}|N_i$ from the $j^{\rm th}$ frame. Intuitively, this is an unreasonable operation; when we actually play music, we never unplay a note (which is what playing a negative note would amount to).
Also, $\widehat{\mathbf{M}}$ may have negative entries. In other words, our projection of $\mathbf{M}$ onto the notes in $\mathbf{N}$ can result in negative spectral magnitudes in some frequencies at certain times. Again, this is meaningless physically -- spectral magnitudes cannot, by definition, be negative.
In this homework problem we will try to fix this anomaly.
We will do this by computing the approximation $\widehat{\mathbf{M}} = \mathbf{N} \mathbf{W}$ with the constraint that the entries of $\mathbf{W}$ must always be greater than or equal to $0$, i.e. they must be non-negative. To do so we will use a simple gradient descent algorithm which minimizes the error $||\mathbf{M} - \mathbf{N}\mathbf{W}||_F^2$ subject to the constraint that all entries in $\mathbf{W}$ are non-negative.
We define the following error function: \[ E = \frac{1}{DT}||\mathbf{M} - \mathbf{N}\mathbf{W}||_F^2. \] where $D$ is the number of dimensions (rows) in $\mathbf{M}$, and $T$ is the number of vectors (frames) in $\mathbf{M}$.
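In MATLAB this error is a one-liner, which will also be handy for tracking convergence later:

    [D, T] = size(M);
    E = norm(M - N*W, 'fro')^2 / (D*T);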
Derive the formula for $\frac{dE}{d\mathbf{W}}$.
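As a check on your derivation (the steps are left to you): writing $E$ in trace form as $E = \frac{1}{DT}{\rm tr}\big((\mathbf{M}-\mathbf{N}\mathbf{W})^\top(\mathbf{M}-\mathbf{N}\mathbf{W})\big)$, the standard matrix-calculus result for quadratics of this form gives \[ \frac{dE}{d\mathbf{W}} = \frac{2}{DT}\,\mathbf{N}^\top(\mathbf{N}\mathbf{W} - \mathbf{M}). \]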
We define the following gradient descent rule to estimate $\mathbf{W}$ iteratively. Let $\mathbf{W}^0$ be the initial estimate of $\mathbf{W}$ and $\mathbf{W}^n$ the estimate after $n$ iterations.
We use the following projected gradient update rule: \[ \begin{aligned} \widehat{\mathbf{W}}^{n+1} &= \mathbf{W}^n - \eta \left.\frac{dE}{d\mathbf{W}}\right|_{\mathbf{W}^n} \\ \mathbf{W}^{n+1} &= \max(\widehat{\mathbf{W}}^{n+1}, 0) \end{aligned} \]
where $\frac{dE}{d\mathbf{W}}|_{\mathbf{W}^n}$ is the derivative of $E$ with respect to $\mathbf{W}$ computed at $\mathbf{W} = \mathbf{W}^n$, and $\max(\widehat{\mathbf{W}}^{n+1},0)$ is a component-wise flooring operation that sets all negative entries in $\widehat{\mathbf{W}}^{n+1}$ to 0.
In effect, our feasible set for values of $\mathbf{W}$ is $\mathbf{W} \succcurlyeq 0$, where the symbol $\succcurlyeq$ indicates that every element of $\mathbf{W}$ must be greater than or equal to 0. The algorithm performs a conventional gradient descent update, and projects any solution that falls outside the feasible set back onto the feasible set through the max operation.
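Here is a minimal MATLAB sketch of this projected gradient descent, assuming the gradient takes the least-squares form noted above; the function name is ours, and you should treat this as a starting point rather than a reference implementation.

    % Projected gradient descent for min (1/(D*T))*||M - N*W||_F^2, W >= 0.
    % Assumes dE/dW = (2/(D*T)) * N' * (N*W - M). Save as nnls_pgd.m.
    function [W, Ehist] = nnls_pgd(M, N, eta, niter)
        [D, T] = size(M);
        W      = ones(size(N, 2), T);              % W^0: all ones, as specified below
        Ehist  = zeros(niter, 1);
        for n = 1:niter
            grad = (2/(D*T)) * (N' * (N*W - M));   % dE/dW evaluated at W^n
            W    = max(W - eta*grad, 0);           % gradient step, then project
            Ehist(n) = norm(M - N*W, 'fro')^2 / (D*T);
        end
    end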
Implement the above algorithm. Initialize $\mathbf{W}$ to a matrix of all $1$s. Run the algorithm for $\eta$ values $(0.0001, 0.001, 0.01, 0.1)$. Run 250 iterations in each case. Plot $E$ as a function of iteration number $n$. Return this plot and the final matrix $\mathbf{W}$. Also show a plot of best error $E$ as a function of $\eta$.
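One way to run the sweep and produce the requested plots, reusing the hypothetical nnls_pgd sketch above (labels and layout are up to you):

    etas  = [0.0001 0.001 0.01 0.1];
    bestE = zeros(size(etas));
    figure; hold on;
    for i = 1:numel(etas)
        [W, Ehist] = nnls_pgd(M, N, etas(i), 250);
        plot(1:250, Ehist, 'DisplayName', sprintf('\\eta = %g', etas(i)));
        bestE(i) = min(Ehist);                  % best error for this eta
    end
    xlabel('iteration n'); ylabel('E'); legend('show');
    figure; semilogx(etas, bestE, 'o-');
    xlabel('\eta'); ylabel('best E');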
For the best $\eta$ (the one that resulted in the lowest error), recreate the music using this transcription as $\widehat{\mathbf{M}} = \mathbf{N}\mathbf{W}$, and resynthesize the music from $\widehat{\mathbf M}$. What does it sound like? You may return the resynthesized music to impress us (although we won't score you on it).
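If the linked instructions do not already provide a resynthesis routine, one common trick (our assumption, not part of the assignment) is to pair the approximated magnitudes with the phase of the original STFT and invert; in recent MATLAB releases, istft can perform the inversion given the same analysis parameters used above.

    Mhat = N * W;                        % transcription-based magnitudes
    X = Mhat .* exp(1j * angle(C));      % borrow the phase of the original STFT C
    y = istft(X, 'Window', win, 'OverlapLength', nfft - hop, ...
              'FFTLength', nfft, 'FrequencyRange', 'onesided');
    soundsc(real(y), fs);                % listen to the result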
Detailed instructions on how to submit the results are given here.
Solutions must be emailed to both TAs, and cc-ed to Bhiksha. The message must have the subject line "MLSP assignment 1". Remember to include your generated results. Don't delete them.
Solutions are due before Oct 4th, 2016 (i.e. by 23:59:59 on Oct 3rd).