\( \def\v{\mathbf{v}} \def\w{\mathbf{w}} \def\x{\mathbf{x}} \def\D{\mathbf{D}} \def\V{\mathbf{V}} \def\S{\mathbf{S}} \def\F{\mathcal F} \def\bold#1{\bf #1} \)


MLSP Fall 2016: Homework 3
Expectation Maximization

Part I: EM and Shift-Invariant Models

In this problem we will consider shift-invariant mixtures of multi-variate multinomial distributions.

Consider data that have multiple discrete attributes. "Discrete" attributes are attributes that can take only one of a countable set of values. We will consider discrete attributes of a particular kind -- integers that have not only a natural rank ordering, but also a definite notion of distance.

Let $(X,Y)$ be the pair of discrete attributes defining any data instance. Since both $X$ and $Y$ are discrete, the probability distribution of $(X,Y)$ is a bi-variate multinomial.

We describe $(X,Y)$ as the outcome of generation by the following process:

The process has at its disposal several urns. Each urn has two sub-urns inside it. The first sub-urn represents a bi-variate multinomial: it contains balls, each marked with an $(X_1,Y_1)$ value. The second sub-urn represents a uni-variate multinomial: it contains balls, each marked with an $X_2$ value.

In the following explanation we will use the notation $P_x(X)$ to denote the probability that the random variable $x$ takes the value $X$.

We represent the content of the first sub-urn within each urn as the random variable pair $(x_1, y_1)$. The second sub-urn generates the random variable $x_2$.

Drawing procedure: at each draw, the drawing process performs the following operations:

1. Select an urn $z$ according to $P_z(Z)$.
2. From the first sub-urn of the selected urn, draw a ball and read off its $(X_1, Y_1)$ value; this draw is distributed as $P_{x_1,y_1}(X_1,Y_1|Z)$.
3. From the second sub-urn of the selected urn, draw a ball and read off its $X_2$ value; this draw is distributed as $P_{x_2}(X_2|Z)$.

Thus, the final observation is:

$(X,Y) = (X_1 + X_2, Y_1)$.
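To make the generative process concrete, here is a minimal sampling sketch in Python. All parameter values (two urns, $X_1 \in \{0,1,2\}$, $Y_1 \in \{0,1\}$, $X_2 \in \{0,1\}$) are hypothetical and chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters (not from the assignment): 2 urns,
# X1 in {0,1,2}, Y1 in {0,1}, X2 in {0,1}.
P_z = np.array([0.6, 0.4])                                    # P_z(Z)
P_x1y1 = rng.dirichlet(np.ones(6), size=2).reshape(2, 3, 2)   # P(X1,Y1|Z)
P_x2 = np.array([[0.7, 0.3], [0.2, 0.8]])                     # P(X2|Z)

def draw():
    """One draw from the shift-invariant mixture: pick an urn z,
    draw (X1, Y1) and X2 from its two sub-urns, observe (X1+X2, Y1)."""
    z = rng.choice(2, p=P_z)
    flat = rng.choice(6, p=P_x1y1[z].ravel())
    x1, y1 = divmod(flat, 2)
    x2 = rng.choice(2, p=P_x2[z])
    return x1 + x2, y1

samples = [draw() for _ in range(5)]
```

Note that only the sum $X_1 + X_2$ is observed, never $X_1$ and $X_2$ individually; this is what makes EM necessary in the problems below.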

Representing the output random variable as $(x,y)$, the probability that it takes a value $(X,Y)$ is given by $P_{x,y}(X,Y)$.

Problem 1.1

Give the expression for $P_{x,y}(X,Y)$ in terms of $P_z(Z)$, $P_{x_1,y_1}(X_1,Y_1|Z)$ and $P_{x_2}(X_2|Z)$.

Problem 1.2

You are given a histogram of counts $H(X,Y)$ obtained from a large number of observations. $H(X,Y)$ represents the number of times $(X,Y)$ was observed. Give the EM update rules to estimate $P_z(Z)$, $P_{x_1,y_1}(X_1,Y_1|Z)$ and $P_{x_2}(X_2|Z)$.

Problem 1.3

In this problem we will try to deblur a picture that has become blurry due to a slight left-to-right shake of the camera. You can download the actual picture from this link:

We model the picture as a histogram (the value of any pixel at a position $(X,Y)$, which ranges from 0-255, is viewed as the count of ``light elements'' at that position). We model this distribution as a shift-invariant mixture of one component (i.e. one large urn).

Assuming a very slight, strictly horizontal 20-pixel shake, we model that within the $X_2$ sub-urn $X_2$ can take integer values 0-19 (i.e. 20 values wide). The $X_1$ value in the $(X_1,Y_1)$ sub-urn can range from 0 to (width-of-picture - 20). $Y_1$ can take values in the range 0 to (height-of-picture - 1).
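Under this one-component model, the observed (normalized) picture is the sharp-image distribution convolved along $X$ with the shake kernel. A sketch of this forward model, with array names of our own choosing and a toy point-source image for illustration:

```python
import numpy as np

def forward_blur(P_x1y1, P_x2):
    """Forward model: P(X, Y) = sum_{X2} P_x2(X2) * P_x1y1(X - X2, Y).
    P_x1y1 has shape (W - K + 1, H); P_x2 has length K (here K = 20).
    Returns the blurred distribution of shape (W, H)."""
    W1, H = P_x1y1.shape
    K = len(P_x2)
    P_xy = np.zeros((W1 + K - 1, H))
    for x2 in range(K):
        # Each shake offset x2 shifts the whole sharp image right by x2.
        P_xy[x2:x2 + W1] += P_x2[x2] * P_x1y1
    return P_xy

# Toy check: a single point of light spread by a uniform 20-wide shake.
P_sharp = np.zeros((50, 10)); P_sharp[25, 5] = 1.0
P_shake = np.ones(20) / 20
blurred = forward_blur(P_sharp, P_shake)
```

The EM algorithm from Problem 1.2 inverts exactly this operation: given only `blurred` (the histogram $H(X,Y)$, after normalization), it recovers estimates of both factors.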

Estimate and plot $P_{x_2}(X_2)$ and $P_{x_1,y_1}(X_1, Y_1)$. You will need the solution to Problem 1.2 for this problem. If the solution to Problem 1.2 is incorrect, the solution to Problem 1.3 will not be considered or given any points.


Part II: Predicting the Election

In this problem we will track a number of opinion polls and try to estimate the true support for the candidates in a recent election.

The election is between four candidates. Public sentiment about the candidates fluctuates all the time. A number of opinion polls try to gauge public sentiment. However, since opinion polls are fundamentally noisy procedures (affected by factors such as the specific subset of people they poll, or the number of samples in their poll), each of them can be viewed as a noisy measurement of the true public sentiment. We will try to obtain a better estimate of the true sentiment, as well as the uncertainty of the estimate (which a pollster could use to establish a margin of error).

We will model the polls as the output of a linear Gaussian process as follows:
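A standard linear Gaussian state-space model (our assumption; the matrices below are placeholders, and the actual parameters are defined by the data provided for the problem) takes the form:

$$ S_t = A\,S_{t-1} + \epsilon_t, \qquad \epsilon_t \sim \mathcal{N}(0, E) $$

$$ O_t = B\,S_t + \nu_t, \qquad \nu_t \sim \mathcal{N}(0, D) $$

Here $S_t$ is the true (hidden) public sentiment at time $t$, $O_t$ is the poll measurement, $\epsilon_t$ is the Gaussian innovation, and $\nu_t$ is the Gaussian measurement noise.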

Our objective is to use the measurements $O_{0:t}$ (i.e. all measurements from time 0 to $t$) to estimate the true sentiment $S_t$ at time $t$.

Problem 2.1

Write out the Kalman filtering equations to estimate $S_t$ at each time $t$.

Problem 2.2

Implement the Kalman filter (you must submit the code). Run it on the provided data series (which comprises only the sequence of observations $O_0,\cdots,O_T$) and predict the true $S_t$ at each $t$. Plot the estimated $S_t$ as a function of time (this will be a single plot with 3 curves). Submit both the plot and the estimated state at every time. Also submit the final state uncertainty (i.e. the covariance matrix of the state).
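For reference, the generic Kalman prediction/update recursions can be sketched as follows. The transition matrix `A`, observation matrix `B`, and covariances `E` and `D` are placeholders (the actual values come from the data description), and the synthetic random-walk data at the bottom is purely illustrative:

```python
import numpy as np

def kalman_filter(O, A, B, E, D, s0, P0):
    """Standard Kalman recursions. O: (T, m) observations; A: state
    transition; B: observation matrix; E: innovation covariance;
    D: observation-noise covariance; s0, P0: initial state mean and
    covariance. Returns filtered state means and final covariance."""
    s, P = s0, P0
    means = []
    for o in O:
        # Predict: propagate the state estimate through the dynamics.
        s_pred = A @ s
        P_pred = A @ P @ A.T + E
        # Update: correct with the new observation via the Kalman gain.
        K = P_pred @ B.T @ np.linalg.inv(B @ P_pred @ B.T + D)
        s = s_pred + K @ (o - B @ s_pred)
        P = (np.eye(len(s)) - K @ B) @ P_pred
        means.append(s)
    return np.array(means), P

# Illustrative run on synthetic data: random-walk state, identity observation.
rng = np.random.default_rng(1)
T, n = 50, 3
A = B = np.eye(n)
E = 0.01 * np.eye(n); D = 0.1 * np.eye(n)
truth = np.cumsum(rng.normal(0, 0.1, (T, n)), axis=0)
O = truth + rng.normal(0, 0.3, (T, n))
est, P_final = kalman_filter(O, A, B, E, D, np.zeros(n), np.eye(n))
```

`P_final` here is the final state uncertainty the problem asks you to report.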

Problem 2.3

You won't be scored on this, but compare the final estimate (at the final instant) with the true voting percentages in the 2016 presidential election. You can get these from the RealClearPolitics webpage on 25th November or later for the final count (or close to it).

Data for the problem

The data here includes the following:

N.B.: Please remember that this is only a homework problem and may not in any way be indicative of reality. Our model is unrealistic -- it's unlikely that either the noise or the innovation is Gaussian. We're also not explicitly handling other factors that affect the polling, or the constraint that the samples are strictly non-negative (you can't have a negative percent of the population voting for anyone). Various other factors are being ignored (although, in principle, all of these could be included in the model). Nonetheless, we believe the computational exercise itself is interesting and should tell you something of the power of MLSP techniques.


Due date

The assignment is due on 30 Nov 2016. The solutions must be emailed to Bhiksha, Chiyu and Anurag. Please use the format given here for your submissions.