In this problem we will consider shift-invariant mixtures of multi-variate multinomial distributions.
Consider data that have multiple discrete attributes. "Discrete" attributes are attributes that can take only one of a countable set of values. We will consider discrete attributes of a particular kind -- integers that have not only a natural rank ordering, but also a definite notion of distance.
Let (X,Y) be the pair of discrete attributes defining any data instance. Since both X and Y are discrete, the probability distribution of (X,Y) is a bi-variate multinomial.
We describe (X,Y) as the outcome of generation by the following process:
The process has at its disposal several urns. Each urn has three sub-urns inside it. The first sub-urn represents a bi-variate multinomial: it contains balls, such that each ball has an (X,Y) value marked on it. The second sub-urn represents a uni-variate multinomial -- it contains balls, such that each ball has a Y value marked on it. The third sub-urn also represents a uni-variate multinomial -- it contains balls, such that each ball has an X value marked on it.
Drawing procedure: At each draw the drawing process performs the following operations.
The final observation is:
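A minimal sketch of one possible reading of this generative process, in Python with NumPy. It assumes that each draw picks an urn Z, draws one ball from each of its three sub-urns, and reports the (X,Y) ball shifted by the values on the X ball and the Y ball (that is, the sum). This additive reading is an assumption suggested by the term "shift-invariant"; it is not taken from the text above. The function name, the array names, and the toy sizes are all hypothetical.

```python
import numpy as np


def sample_observation(rng, p_z, p_xy_given_z, p_x_given_z, p_y_given_z):
    """Draw one (X, Y) under the ASSUMED additive (shift) reading."""
    # Pick an urn Z according to P(Z).
    z = rng.choice(len(p_z), p=p_z)
    # First sub-urn: draw a joint (X, Y) ball from P(X,Y|Z).
    kernel = p_xy_given_z[z]
    flat = rng.choice(kernel.size, p=kernel.ravel())
    kx, ky = np.unravel_index(flat, kernel.shape)
    # Third sub-urn: draw an X ball; second sub-urn: draw a Y ball.
    sx = rng.choice(p_x_given_z.shape[1], p=p_x_given_z[z])
    sy = rng.choice(p_y_given_z.shape[1], p=p_y_given_z[z])
    # Assumption: the observation is the joint ball shifted by (sx, sy).
    return int(kx + sx), int(ky + sy)


# Toy parameters (hypothetical sizes, random probabilities).
rng = np.random.default_rng(0)
n_z, k, n_shift = 2, 3, 4
p_z = np.full(n_z, 1.0 / n_z)
p_xy_given_z = rng.dirichlet(np.ones(k * k), size=n_z).reshape(n_z, k, k)
p_x_given_z = rng.dirichlet(np.ones(n_shift), size=n_z)
p_y_given_z = rng.dirichlet(np.ones(n_shift), size=n_z)

x, y = sample_observation(rng, p_z, p_xy_given_z, p_x_given_z, p_y_given_z)
```

Under this reading, every observed coordinate lies between 0 and (k - 1) + (n_shift - 1), which is why the shift ranges and the sub-urn ranges together determine the support of (X,Y).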
1. Give the expression for P(X,Y) in terms of P(Z), P(X,Y|Z), P(X|Z), and P(Y|Z).
2. You are given a histogram of counts H(X,Y) obtained from a large number of observations. H(X,Y) represents the number of times (X,Y) was observed. Give the EM update rules to estimate P(Z), P(X,Y|Z), P(X|Z), and P(Y|Z).
3. The picture represents a histogram (the value of any pixel at a position (X,Y), which ranges from 0 to 255, is viewed as the count of "light elements" at that position). We model this distribution as a shift-invariant mixture of 4 components (large urns). Specifically, we also assume that within each (X,Y) sub-urn, X can take integer values from 0 to 90, and Y can take integer values from 0 to 90. The X values in the X sub-urns can range from 0 to (width-of-picture - 90), and the Y values in the Y sub-urns can range from 0 to (height-of-picture - 90).
Estimate and plot P(X,Y|Z). You will need the solution to part 2 for this problem. If the solution to part 2 is incorrect, the solution of part 3 will not be considered or given any points.
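The setup for part 3 can be sketched as follows, in Python with NumPy. This only builds the histogram H(X,Y) from the pixel values and allocates randomly initialised, normalised distributions with the sizes stated above; it does not implement the EM updates. The picture is replaced here by a synthetic random array. The names `init_model` and `normalize`, the random initialisation, and the convention that X indexes columns and Y indexes rows are assumptions.

```python
import numpy as np

N_COMPONENTS = 4   # number of large urns Z, from the problem statement
KERNEL_MAX = 90    # X and Y inside the (X,Y) sub-urn take values 0..90


def normalize(a, axis):
    """Scale non-negative entries so they sum to 1 along the given axis/axes."""
    return a / a.sum(axis=axis, keepdims=True)


def init_model(img, rng):
    """Build H(X,Y) from a grayscale image and randomly initialise the model."""
    height, width = img.shape
    # Assumption: X indexes columns and Y indexes rows, so hist[x, y] is the
    # count of light elements at position (x, y).
    hist = img.T.astype(np.float64)
    k = KERNEL_MAX + 1                   # 91 possible values, 0..90
    n_shift_x = width - KERNEL_MAX + 1   # shift values 0..(width - 90)
    n_shift_y = height - KERNEL_MAX + 1  # shift values 0..(height - 90)
    p_z = np.full(N_COMPONENTS, 1.0 / N_COMPONENTS)
    p_xy_given_z = normalize(rng.random((N_COMPONENTS, k, k)), axis=(1, 2))
    p_x_given_z = normalize(rng.random((N_COMPONENTS, n_shift_x)), axis=1)
    p_y_given_z = normalize(rng.random((N_COMPONENTS, n_shift_y)), axis=1)
    return hist, p_z, p_xy_given_z, p_x_given_z, p_y_given_z


rng = np.random.default_rng(0)
# Synthetic stand-in for the picture: 120 rows (height) by 150 columns (width).
img = rng.integers(0, 256, size=(120, 150), dtype=np.uint8)
hist, p_z, p_xy_given_z, p_x_given_z, p_y_given_z = init_model(img, rng)
```

Taken literally, the stated ranges allow a largest shifted coordinate of 90 + (width - 90) = width, which is one past the last column if pixels are indexed from 0. Whether to trim one value from a range is left as a judgment call here.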
The solutions must be emailed to me, Anoop and Manuel. Please use "MLSP Homework 3" as the subject line.