[SOLUTIONS] Homework 3 - Due: Tuesday April 24, 1:30pm, in class.
Apply the DFT to the following datasets.
One strong frequency (but notice the symmetry/reflection over .5).
Sum of squares of the time values: 520.7175
Sum of squares of the DFT amplitudes: 5.3321e+005
C.f. Parseval's theorem:
sum_{n=0}^{N-1} |x[n]|^2 = (1/N) * sum_{k=0}^{N-1} |X[k]|^2,
"where X[k] is the DFT of x[n], both of length N."
Note that Matlab does not do the normalization by N for you, so you
must normalize the numbers Matlab gives you in order to see
Parseval's equivalence:
sum of squares of the time values = (1/# samples) * sum of squares of the DFT amplitudes
In this case: 520.7175 = (1/1024) * 5.3321e+005.
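This check is easy to reproduce; a minimal sketch in Python/NumPy (whose fft, like Matlab's, leaves out the 1/N factor), using a made-up two-tone signal rather than the assignment's dataset:

```python
import numpy as np

N = 1024
n = np.arange(N)
# made-up test signal: one strong tone plus a weaker one
x = np.sin(2 * np.pi * 5 * n / N) + 0.1 * np.cos(2 * np.pi * 40 * n / N)

X = np.fft.fft(x)                          # unnormalized, like Matlab's fft
time_energy = np.sum(x ** 2)               # sum of squares of the time values
freq_energy = np.sum(np.abs(X) ** 2) / N   # divide by N by hand

# the two energies agree, confirming Parseval's relation
```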
One strong frequency, plus a second, weaker one.
Sum of squares of the time values: 1.7409e+004
Sum of squares of the DFT amplitudes: 1.7827e+007
No strong frequency stands out -- just noise.
Sum of squares of the time values: 350.7012
Sum of squares of the DFT amplitudes: 3.5912e+005
Apply DWT using Haar wavelets (code available here) and DFT to this simulated heart beat.
What to hand in:
There seem to be a mixture of a few strong, regular frequencies of
varying strength.
The top line of each plot shows the original signal. This plot shows the approximation coefficients for each scale 1 through 9:
This plot again shows the original signal on top, followed by the detail coefficients for scales 1 through 9:
There are a few dominant frequencies, but no real spikes.
If we don't count the symmetric coefficients twice, we can exactly recreate the signal using only 5 DFT coefficients, and thus have zero RMSE.
Wavelets are not as good at compressing this kind of periodic
signal, and so we get a higher RMSE of .2165 (though the difference
is hard to see in the plot). The exact value will depend on which
coefficients you chose, but should be higher than the value for the
DFT reconstruction.
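The top-k reconstruction used throughout this problem can be sketched as follows (Python/NumPy rather than Matlab, and on a made-up two-tone signal, not the heart-beat data; keeping every nonzero bin gives zero RMSE, while dropping one does not):

```python
import numpy as np

def dft_compress(x, k):
    """Keep the k largest-magnitude DFT coefficients, zero out the rest,
    and invert. Conjugate-symmetric bins are counted separately here."""
    X = np.fft.fft(x)
    keep = np.argsort(np.abs(X))[-k:]      # indices of the k biggest bins
    Xc = np.zeros_like(X)
    Xc[keep] = X[keep]
    return np.real(np.fft.ifft(Xc))

N = 256
n = np.arange(N)
# two pure tones -> exactly 4 nonzero DFT bins (each tone fills 2 symmetric bins)
x = np.sin(2 * np.pi * 3 * n / N) + 0.5 * np.sin(2 * np.pi * 17 * n / N)

xr = dft_compress(x, 4)
rmse = np.sqrt(np.mean((x - xr) ** 2))     # essentially zero: all energy kept
```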
Again, your exact image will depend on the coefficients you used.
One common mistake here was to only do the transform on one dimension of the image (x- or y-axis), leaving you with a reconstruction that looked "stripey" instead of "blocky."
Using this formula:
and the top 3 real coefficients, the RMSE is .2288. Again, your RMSE may vary (depending on whether you used a different DFT formula or a different number of DOF), but it should be significantly higher than the one you get using DWT (since, for this signal, wavelets are better suited for compression than Fourier methods). In general, nicely periodic signals will be compressed more efficiently by the DFT, while signals with big discontinuities will do better with the DWT.
Your results may vary, but the reconstruction using the DWT should look better than the one using the DFT.
Using the above formula and reconstruction, the RMSE is .1849. Again, your numbers may vary, but the RMSE should be lower than with the DFT (even zero, depending on which coefficients you chose).
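To see why the DWT wins on this kind of signal, here is a sketch with a hand-rolled orthonormal Haar transform (Python/NumPy; the step signal is made up, not the assignment's data). A single discontinuity that would cost the DFT many coefficients is captured by just two Haar coefficients:

```python
import numpy as np

def haar_fwd(x):
    """Full orthonormal Haar decomposition of a length-2^k signal."""
    x = np.asarray(x, dtype=float)
    details = []
    while len(x) > 1:
        a = (x[0::2] + x[1::2]) / np.sqrt(2)   # approximation (averages)
        d = (x[0::2] - x[1::2]) / np.sqrt(2)   # detail (differences)
        details.append(d)
        x = a
    details.append(x)                          # final 1-point approximation
    return np.concatenate(details[::-1])       # [approx, coarse ... fine details]

def haar_inv(c):
    """Inverse of haar_fwd."""
    a, pos = c[:1], 1
    while pos < len(c):
        d = c[pos:pos + len(a)]
        x = np.empty(2 * len(a))
        x[0::2] = (a + d) / np.sqrt(2)
        x[1::2] = (a - d) / np.sqrt(2)
        a, pos = x, pos + len(d)
    return a

# a step: the kind of discontinuous signal wavelets compress well
x = np.concatenate([np.zeros(32), np.ones(32)])
c = haar_fwd(x)
# only 2 of the 64 Haar coefficients are nonzero, so keeping the top 2
# reconstructs the step exactly
keep = np.argsort(np.abs(c))[-2:]
cc = np.zeros_like(c)
cc[keep] = c[keep]
rmse = np.sqrt(np.mean((x - haar_inv(cc)) ** 2))
```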
The fractal dimension for iterated function systems can be
calculated as
fractal dimension = log(# resulting shapes after split) / log(magnification)
For this shape, each single line becomes two lines, so the numerator is log(2); each resulting line is the hypotenuse of a right triangle with legs of length 1/2, so its length is sqrt(2)/2 and the magnification is sqrt(2). So we get
fractal dimension = log(2)/log(sqrt(2)) = 2.
Code may vary, but in general should produce output like this
Depends on your sample, but should look something like this.
Using the formula from before, we have:
log(5)/log(3) ~ 1.465
Depends on your sample, but should be near the expected value of log(5)/log(3).
Modify your transformation matrix to look something like:
double a[] = {0.333, 0.333, 0.333, 0.333, 0.333};
double b[] = {0.0, 0.0, 0.0, 0.0, 0.0};
double c[] = {0.0, 0.0, 0.0, 0.0, 0.0};
double d[] = {0.333, 0.333, 0.333, 0.333, 0.333};
double e[] = {0.0, 0.667, 0.333, 0.0, 0.667};
double f[] = {0.0, 0.0, 0.333, 0.667, 0.667};
double p[] = {0.2, 0.2, 0.2, 0.2, 0.2};
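Running that matrix through the usual chaos game traces out the attractor; a minimal Python sketch (since b = c = 0, each map is just a scale by 0.333 plus a translation, and every map is drawn with probability p = 0.2):

```python
import random

# coefficients copied from the transformation matrix above (b = c = 0 throughout)
a = [0.333] * 5
d = [0.333] * 5
e = [0.0, 0.667, 0.333, 0.0, 0.667]
f = [0.0, 0.0, 0.333, 0.667, 0.667]

random.seed(0)
x, y = 0.0, 0.0
pts = []
for i in range(20000):
    j = random.randrange(5)               # each map has probability 0.2
    x, y = a[j] * x + e[j], d[j] * y + f[j]
    if i >= 100:                          # discard a short burn-in
        pts.append((x, y))
# plotting pts should show the 5-piece attractor whose dimension is
# log(5)/log(3), matching the calculation above
```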
Consider the following Document by Term matrix: DocTerm
Hand in:
The key thing to notice here is that all but the first three singular values in the lambda matrix are effectively zero:
lambda = [ 42.4   0.0   0.0
            0.0  21.2   0.0
            0.0   0.0   7.1 ]
Based on the above analysis of the lambda matrix, estimate three
topics.
The key here was to recognize that each document only had weight in
one of the three topics. The clusters were:
Topic 1 contains documents: 2 3 4 5 7 9 10 11 12 16 19 22 23 25 26 30 31 33 34 35 36 37 38 39 41 42 44 48 50 51 52 53 55 56 57 59 60 62 63 68 69 70 71 72 73 74 75 76 78 79 80 81 82 83 88 90 91 94 96 99
Topic 2 contains documents: 1 8 13 15 17 18 21 27 28 29 32 43 45 47 49 54 58 61 65 66 67 77 85 87 89 92 93 95 97 98
Topic 3 contains documents: 6 14 20 24 40 46 64 84 86 100
The key was to realize the translation could be undone by first subtracting the mean of each dimension away from each point, thus recentering the points around the origin. You could then perform SVD to recover the scales: 21.5, 7.7, and 2.2. Exact results may differ depending on how you do normalization.
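A sketch of that recipe on synthetic data (Python/NumPy; the rotation, translation, and sample size are made up -- only the three scales come from the solution above):

```python
import numpy as np

rng = np.random.default_rng(0)
scales = np.array([21.5, 7.7, 2.2])          # the scales quoted above
n = 5000

# build a hidden-structure dataset: scale, rotate, then translate
pts = rng.standard_normal((n, 3)) * scales
q, _ = np.linalg.qr(rng.standard_normal((3, 3)))   # random rotation
data = pts @ q.T + np.array([100.0, -40.0, 7.0])   # translation off the origin

centered = data - data.mean(axis=0)          # undo the translation
s = np.linalg.svd(centered, compute_uv=False)
recovered = s / np.sqrt(n - 1)               # per-direction standard deviations
# recovered is close to [21.5, 7.7, 2.2], up to sampling error
```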
What to hand in:
[For each of the four runs, plots of: W1 vs time, W2 vs time, and W1 vs W2 (across time).]
The changes begin near week 375. This is most clear in the plots
with forgetting.
The exact running times will depend on your machine and implementation details, but in general your OLS code should run at least an order of magnitude more SLOWLY than RLS. The main idea is to see that RLS can do almost as well as OLS in significantly shorter time.
For example, on an Intel Dual Core 2 GHz machine running Windows XP, with 2 GB of RAM:
OLS takes 7.093 seconds
while RLS takes only .078 seconds on the same machine (~100 times
faster)
RLS with and without forgetting: as in the
paper
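The update itself is deferred to the paper; for reference, a standard RLS-with-forgetting sketch (Python/NumPy; the initialization P = delta * I and the synthetic test data are assumptions, not taken from the paper):

```python
import numpy as np

def rls(X, Y, alpha=1.0, delta=1e3):
    """Recursive least squares with forgetting factor alpha (alpha = 1
    means no forgetting). Each step is O(d^2) work -- no refit from scratch."""
    n, d = X.shape
    w = np.zeros(d)
    P = np.eye(d) * delta                    # ~ inverse of the (damped) Gram matrix
    W = np.zeros((n, d))
    for t in range(n):
        x, y = X[t], Y[t]
        k = P @ x / (alpha + x @ P @ x)      # gain vector
        w = w + k * (y - x @ w)              # correct by the prediction error
        P = (P - np.outer(k, x @ P)) / alpha
        W[t] = w
    return W

rng = np.random.default_rng(1)
X = rng.standard_normal((500, 2))
true_w = np.array([2.0, -1.0])
Y = X @ true_w + 0.01 * rng.standard_normal(500)
W = rls(X, Y, alpha=0.98)                    # W[-1] converges to true_w
```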
OLS without forgetting: Essentially, you do ordinary least squares regression on each growing prefix of the full dataset:
Input: full data X and Y
For each t in 0 to T
    X_t = X[0...t]
    Y_t = Y[0...t]
    W_t = inv(X_t' * X_t) * X_t' * Y_t
end for
OLS with forgetting: Same as above, but this time you discount older observations (closer to t=0) with exponential decay alpha, as in RLS. This makes mistakes on older observations less costly:
Input: full data X and Y and forgetting factor alpha
For each t in 0 to T
    X_t = X[0...t]
    Y_t = Y[0...t]
    For each t_alpha in 0 to t
        L[t_alpha, t_alpha] = alpha^(t - t_alpha)
    end for
    W_t = inv(X_t' * L * X_t) * X_t' * L * Y_t
end for
Note that L is a diagonal matrix holding the decay weights.
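That pseudocode translates almost line-for-line into Python/NumPy. This sketch solves the weighted least-squares problem via lstsq on sqrt-weighted rows (equivalent to the inv(X_t'*L*X_t)*X_t'*L*Y_t formula when X_t has full rank, but it also handles the first few rank-deficient steps); the data, with a weight change halfway through, is made up:

```python
import numpy as np

def ols_with_forgetting(X, Y, alpha):
    """Refit weighted OLS at every step t, down-weighting older samples
    by alpha^(t - t_alpha). O(t) work per step -- this repeated refitting
    is what makes OLS so much slower than RLS."""
    n, d = X.shape
    W = np.zeros((n, d))
    for t in range(n):
        Xt, Yt = X[:t + 1], Y[:t + 1]
        sw = np.sqrt(alpha ** (t - np.arange(t + 1)))    # sqrt of decay weights
        W[t], *_ = np.linalg.lstsq(Xt * sw[:, None], Yt * sw, rcond=None)
    return W

rng = np.random.default_rng(2)
X = rng.standard_normal((200, 2))
w_old, w_new = np.array([1.0, 1.0]), np.array([3.0, -2.0])
Y = np.concatenate([X[:100] @ w_old, X[100:] @ w_new])   # weights change at t=100
W = ols_with_forgetting(X, Y, alpha=0.9)
# with forgetting, the final estimate tracks w_new, not a blend of both
```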