15-463 Computational Photography
Final Project: Video Textures
Lindley French
lms@andrew.cmu.edu
The objective of this project was to enable the synthesis of arbitrary amounts of similar video given a small input video, with a minimum of visual discontinuity. This was accomplished by jumping and blending between frame sequences at optimal points.
I began by computing the sum of squared differences (SSD) between every pair of frames in the input video. This was a time-consuming process, taking several minutes even when dealing with only about 100 frames of input.
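For reference, this step can be sketched in Python/NumPy as follows (an illustrative sketch, not my actual code; it assumes the frames have already been loaded as equally sized arrays):

    import numpy as np

    def frame_distances(frames):
        """SSD between every pair of frames; `frames` is a list of
        equally sized image arrays."""
        flat = np.stack([np.asarray(f, dtype=np.float64).ravel() for f in frames])
        n = len(flat)
        D = np.zeros((n, n))
        for i in range(n):
            # distance from frame i to every frame j in one vectorized pass
            D[i] = np.sum((flat - flat[i]) ** 2, axis=1)
        return D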
I then modified the distance matrix by averaging the costs over short sequences of frames, thus preserving the dynamics of the input video; after this step, the video texture is unlikely to make jumps that match frame to frame but not in the type of motion occurring.
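A sketch of this dynamics-preserving step, in the style of the original video textures paper (Schödl et al.); the window size and binomial weights are assumed parameters, and border frames are simply made expensive:

    import numpy as np

    def preserve_dynamics(D, weights=(1.0, 3.0, 3.0, 1.0)):
        """Average each transition cost over a short diagonal run of
        frames, so a jump is cheap only when the motion matches, not
        just a single frame pair."""
        n = D.shape[0]
        m = len(weights) // 2
        w = np.asarray(weights) / np.sum(weights)
        ks = np.arange(-m, m)                      # offsets -m .. m-1 around the jump
        D2 = np.zeros_like(D)
        for i in range(m, n - m + 1):
            for j in range(m, n - m + 1):
                D2[i, j] = np.sum(w * D[i + ks, j + ks])
        # frames too close to either end have no full window; make any
        # transition involving them unattractive
        big = D2[m:n - m + 1, m:n - m + 1].max()
        border = np.ones(n, dtype=bool)
        border[m:n - m + 1] = False
        D2[border, :] = big
        D2[:, border] = big
        return D2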
At this point, there were two options available, depending on the type
of video texture desired.
RANDOM PLAY
For a random play texture (generated on the fly), the next step was to
do Q-learning on the matrix, to discourage the video texture from
encountering "dead ends" (states from which no good transition exists)
unexpectedly. This step uses an algorithm which always converges, but
it can take several minutes to do so. Unfortunately, I cannot be
certain I have implemented this portion correctly, since my results
have been mixed when I incorporate this step.
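The intended computation can be sketched as a simple fixed-point iteration (again an illustrative sketch rather than my actual code; p, alpha, and the tolerance are assumed values):

    import numpy as np

    def anticipate_future_cost(D2, p=2.0, alpha=0.999, tol=1e-6, max_iter=500):
        """The "Q-learning" step: fold the cost of the best continuation
        into every transition, D''[i, j] = D2[i, j]**p + alpha * min_k D''[j, k].
        Iterating this update converges and steers the texture away from
        dead ends."""
        base = D2 ** p
        Dpp = base.copy()
        for _ in range(max_iter):
            best_future = Dpp.min(axis=1)              # best continuation from each landing frame
            new = base + alpha * best_future[np.newaxis, :]
            if np.max(np.abs(new - Dpp)) < tol * np.max(base):
                return new
            Dpp = new
        return Dpp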
After that, I compute the Markov transition matrix P, exponentially
related to the distance matrix found above. Finally, I prune the
transitions which are not local maxima in P, and renormalize all
probabilities. I then simply step through the frames randomly as
specified by the pruned matrix, loading images from the disk as needed.
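A sketch of this step (sigma, the row-wise pruning rule, and the random number setup are illustrative assumptions; the starting frame should be one that has surviving transitions):

    import numpy as np

    def transition_matrix(Dpp, sigma):
        """P[i, j] is the probability of showing frame j after frame i,
        with P[i, j] proportional to exp(-D''[i+1, j] / sigma)."""
        shifted = np.vstack([Dpp[1:, :], Dpp[-1:, :]])   # cost of landing on j after playing frame i
        P = np.exp(-shifted / sigma)
        P /= np.maximum(P.sum(axis=1, keepdims=True), 1e-300)
        return P

    def prune_and_play(P, n_out, start, seed=0):
        """Zero out transitions that are not local maxima of probability
        within their row, renormalize, and step through frames at random."""
        rng = np.random.default_rng(seed)
        padded = np.pad(P, ((0, 0), (1, 1)), constant_values=-np.inf)
        keep = (P >= padded[:, :-2]) & (P >= padded[:, 2:])  # beats both row neighbours
        Pp = np.where(keep, P, 0.0)
        Pp /= np.maximum(Pp.sum(axis=1, keepdims=True), 1e-300)
        frames, f = [], start
        for _ in range(n_out):
            frames.append(f)
            f = rng.choice(len(P), p=Pp[f])
        return frames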
Here is an example of a random play video texture. I input a set of
frames fading in and out onto each member of the class (photos taken
from project 2), and got the following matrices. The first is the
simple distance, the second preserves dynamics, and the third is after
Q-learning.
Unfortunately, Q-learning did not appear to be beneficial in this
special case. This is not terribly surprising, since the end of the
sequence is not a "dead end".
Here is a sample of the results of random play on this input.
VIDEO LOOPS
Sometimes, a sequence does not lend itself well to random play for one
reason or another. Usually, this is the case when the sequence contains
significant nonrandom movement which repeats at regular intervals. In
this case, we would prefer to find a video loop: a sequence of transitions that eventually returns to the frame on which it started and can be played on a continuous loop in a media player. Since this
method does not have to provide frames in real time, we can take the
time to determine an optimal video loop, and smooth the transitions
using cross-fading as well.
Determining an optimal video loop is a nontrivial task. First we must
select a small number of transitions with low average cost; that is,
D(i,j)/(i - j) for i > j. Unfortunately, we can only consider
back-jumps with the current video loop algorithm; forward jumps i->j
for i < j are not supported. The transitions we select are called primitive loops. We then run a
dynamic programming algorithm to find the best compound loop, or combination of
primitive loops, for each total loop length. If L is the maximum loop
length being considered, and N is the number of primitive loops we have
selected, then the running time for this algorithm is O(L²N²).
This can take some time, if we do not select L and N appropriately
small.
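The compound-loop table is too involved to reproduce here, but the primitive-loop selection can be sketched as follows (the number of loops kept is an arbitrary choice of N):

    import numpy as np

    def select_primitive_loops(D2, count=20):
        """Pick the `count` back-jumps i -> j (i > j) with the lowest
        average cost per frame of loop length, D2[i, j] / (i - j)."""
        n = D2.shape[0]
        i, j = np.tril_indices(n, k=-1)            # every back-jump i > j
        avg = D2[i, j] / (i - j)
        best = np.argsort(avg)[:count]
        return [(int(i[k]), int(j[k])) for k in best]   # (jump point, landing point) pairs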
Once the table is filled in, we walk through it and find the loop such
that cost/length is minimized. We then need to sequence the primitive
loops of this compound loop. Due to the way in which the table was
filled in, this is always possible to do. In my algorithm, I simply
started at the primitive loop with the "jump point" closest to the end
of the sequence, and then proceeded forward from the "land point",
taking (and then removing from consideration) each subsequent jump as
it was encountered.
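That scheduling pass can be sketched roughly as follows (a sketch of the description above, not my actual code; it only terminates for a legal compound loop whose jumps chain together, which the table construction guarantees):

    def schedule_loops(primitive_loops):
        """Order a multiset of back-jumps (jump point i, landing point j)
        into one playable frame sequence: start with the jump whose jump
        point is latest, then walk forward, taking each remaining jump the
        first time its jump point is reached."""
        remaining = list(primitive_loops)
        start = max(remaining, key=lambda t: t[0])
        remaining.remove(start)
        sequence, frame = [], start[1]             # begin at the first landing point
        while True:
            sequence.append(frame)
            pending = [t for t in remaining if t[0] == frame]
            if pending:                            # a jump leaves from this frame: take it
                remaining.remove(pending[0])
                frame = pending[0][1]
            elif frame == start[0]:                # back at the first jump point: loop closes
                break
            else:
                frame += 1
        return sequence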
In both of the examples I was able to prepare, the optimal compound
loop turned out to itself be a primitive loop. This is unfortunate, but
it turns out that finding good inputs for this algorithm is somewhat
difficult.
Once a sequence of frames has been determined for the loop, we can
augment it by cross-fading at transition points. This can introduce
blur, but frequently looks better than simple jump-cuts.
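Cross-fading at a single jump can be sketched like this (the window width and the linear ramp are assumptions, and the jump point is assumed not to fall within the window of the end of the input):

    import numpy as np

    def crossfade_at_jump(frames, i, j, width=4):
        """Replace the seam of a jump i -> j with `width` blended frames,
        fading out of frames i, i+1, ... while fading into frames j, j+1, ..."""
        blended = []
        for k in range(width):
            a = (k + 1) / (width + 1)              # blend weight moves from mostly-old to mostly-new
            mix = (1.0 - a) * frames[i + k].astype(np.float64) \
                  + a * frames[j + k].astype(np.float64)
            blended.append(mix.astype(frames[0].dtype))
        return blended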
Here are the D and D2 (dynamics preserved) matrices of the first
example, a rotating ball:
And the movie computed from them (play on loop).
Here are the D and D2 matrices of the second example, a snow globe:
And the movie computed from them (play on loop).
Here are the D, D2, and D3 matrices of the third example, a burning
candle. I included a D3 matrix because this video texture is better
suited to random play. However, I have also included a video loop below.
Here are the corresponding P and P2 (pruned) matrices for random play.
I used a high sigma (30) when computing P from D3, to encourage random
jumps.
And here is the promised video loop.