15-463 Computational Photography

Final Project: Video Textures

Lindley French                lms@andrew.cmu.edu


The objective of this project was to enable the synthesis of arbitrary amounts of similar video from a short input video, with a minimum of visual discontinuity. This was accomplished by jumping and blending between frame sequences at optimal points.

I began by computing the sum-of-squared-differences (SSD) between every pair of frames in the input video. This was a time-consuming process, taking several minutes even for an input of only about 100 frames.
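A rough sketch of this step in Python with numpy, assuming the frames are already loaded as a list of equally sized grayscale arrays (vectorizing over one index, as below, is one way to speed up the naive double loop):

    import numpy as np

    def ssd_matrix(frames):
        # D[i, j] = sum of squared pixel differences between frames i and j
        stack = np.stack([f.astype(np.float64).ravel() for f in frames])
        n = len(frames)
        D = np.zeros((n, n))
        for i in range(n):
            diff = stack - stack[i]                   # frame i against all frames
            D[i] = np.einsum('ij,ij->i', diff, diff)  # row-wise squared norms
        return D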

I then modified the distance matrix by averaging costs over short runs of frames, thus preserving the dynamics of the input video; after this step, the video texture is unlikely to make jumps which match frame-to-frame but not in the type of motion occurring.
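A sketch of this filtering, assuming a window of 2m frames with binomial weights (the window size and weighting are tunable choices, not necessarily the ones I used):

    import numpy as np
    from math import comb

    def preserve_dynamics(D, m=2):
        # D2[i, j] averages D along a short diagonal run, so a cheap jump
        # must match several consecutive frames, not just a single pair
        w = np.array([comb(2 * m - 1, k) for k in range(2 * m)], dtype=float)
        w /= w.sum()
        n = D.shape[0]
        ks = np.arange(-m, m)
        D2 = np.full_like(D, np.inf)    # border frames lack a full window
        for i in range(m, n - m + 1):
            for j in range(m, n - m + 1):
                D2[i, j] = np.dot(w, D[i + ks, j + ks])
        return D2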

At this point, there were two options available, depending on the type of video texture desired.

RANDOM PLAY

For a random play texture (generated on the fly), the next step was to run Q-learning on the matrix, to discourage the video texture from unexpectedly encountering "dead ends" (states from which no good transition exists). This step uses an algorithm which always converges, but it can take several minutes to do so. Unfortunately, I cannot be certain I have implemented this portion correctly, since my results have been mixed when I incorporate this step.
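One standard formulation of this future-cost propagation (the recurrence from Schödl et al.'s video-textures work) is sketched below; the exponent p and discount alpha are assumed parameter values:

    import numpy as np

    def future_cost(D2, p=2.0, alpha=0.995, tol=1e-3):
        # Iterate D3[i, j] = D2[i, j]**p + alpha * min_k D3[j, k] until it
        # converges (guaranteed for alpha < 1); jumps leading toward dead
        # ends accumulate cost. Assumes D2 is finite (crop inf borders first).
        base = D2 ** p
        D3 = base.copy()
        while True:
            m = D3.min(axis=1)                     # best continuation per frame
            new = base + alpha * m[np.newaxis, :]  # add cost of landing on j
            if np.abs(new - D3).max() < tol:
                return new
            D3 = new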

After that, I compute the Markov transition matrix P, exponentially related to the distance matrix found above. Finally, I prune the transitions which are not local maxima in P, and renormalize all probabilities. I then simply step through the frames randomly as specified by the pruned matrix, loading images from disk as needed.
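A sketch of these last steps; sigma, the starting frame, and the exact indexing of jump costs are simplified or assumed here:

    import numpy as np

    def random_play(D3, sigma, num_frames, seed=0):
        P = np.exp(-D3 / sigma)           # small distance -> high probability
        np.fill_diagonal(P, 0.0)          # disallow self-transitions
        # prune transitions that are not local maxima within their row
        P2 = P.copy()
        for i in range(P.shape[0]):
            for j in range(P.shape[1]):
                lo, hi = max(0, j - 1), min(P.shape[1], j + 2)
                if P[i, j] < P[i, lo:hi].max():
                    P2[i, j] = 0.0
        P2 /= P2.sum(axis=1, keepdims=True)   # renormalize each row
        rng = np.random.default_rng(seed)
        seq = [0]                             # assumed starting frame
        for _ in range(num_frames - 1):
            seq.append(int(rng.choice(len(P2), p=P2[seq[-1]])))
        return seq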

Here is an example of a random play video texture. I input a set of frames fading in and out onto each member of the class (photos taken from project 2), and got the following matrices. The first is the simple distance, the second preserves dynamics, and the third is after Q-learning.

[Images: D, D2, and D3 matrices]

Unfortunately, Q-learning did not appear to be beneficial in this special case. This is not terribly surprising, since the end of the sequence is not a "dead end".


Here is a sample of the results of random play on this input.

VIDEO LOOPS

Sometimes, a sequence does not lend itself well to random play for one reason or another. Usually, this is the case when the sequence contains significant nonrandom movement which repeats at regular intervals. In this case, we would prefer to find a video loop: a sequence of transitions which eventually returns to the frame on which it started, and can be played on continuous loop in a media player. Since this method does not have to provide frames in real time, we can take the time to determine an optimal video loop, and to smooth the transitions using cross-fading as well.

Determining an optimal video loop is a nontrivial task. First we must select a small number of transitions with low average cost, D(i,j)/(i - j) for i > j. Unfortunately, the current video loop algorithm can only consider back-jumps; forward jumps i -> j with i < j are not supported. The selected transitions are called primitive loops. We then run a dynamic programming algorithm to find the best compound loop (combination of primitive loops) for each total loop length. If L is the maximum loop length considered and N is the number of primitive loops selected, the running time of this algorithm is O(L²N²), which can take some time unless L and N are chosen suitably small.
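The selection of primitive loops is simple enough to sketch directly (N, the number of transitions to keep, is an assumed budget; the dynamic programming table itself is omitted here):

    import numpy as np

    def select_primitive_loops(D, N):
        # consider only back-jumps i -> j with i > j, ranked by average cost
        i, j = np.tril_indices(D.shape[0], k=-1)
        avg = D[i, j] / (i - j)
        best = np.argsort(avg)[:N]
        return list(zip(i[best].tolist(), j[best].tolist()))  # (jump, land)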

Once the table is filled in, we walk through it and find the loop such that cost/length is minimized. We then need to sequence the primitive loops of this compound loop. Due to the way in which the table was filled in, this is always possible to do. In my algorithm, I simply started at the primitive loop with the "jump point" closest to the end of the sequence, and then proceeded forward from the "land point", taking (and then removing from consideration) each subsequent jump as it was encountered.
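A sketch of this sequencing heuristic, assuming the primitive loops genuinely form a valid (transitively overlapping) compound loop:

    def schedule_loop(primitive_loops):
        # each primitive loop is a (jump_from, land_at) pair, jump_from > land_at
        pending = sorted(primitive_loops)   # ascending by jump point
        start, pos = pending.pop()          # take the latest jump first
        frames = []
        while True:
            frames.append(pos)
            taken = next((l for l in pending if l[0] == pos), None)
            if taken is not None:
                pending.remove(taken)       # each jump is taken exactly once
                pos = taken[1]
            elif pos == start and not pending:
                return frames               # back at the starting frame
            else:
                pos += 1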

In both of the examples I was able to prepare, the optimal compound loop turned out to itself be a primitive loop. This is unfortunate, but it turns out that finding good inputs for this algorithm is somewhat difficult.

Once a sequence of frames has been determined for the loop, we can augment it by cross-fading at transition points. This can introduce blur, but frequently looks better than simple jump-cuts.
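A sketch of the cross-fade, assuming frames are numpy arrays and an assumed fade width of four frames (alpha ramps linearly across the seam):

    import numpy as np

    def crossfade(outgoing, incoming, width=4):
        # blend the `width` frames straddling a jump
        blended = []
        for k in range(width):
            alpha = (k + 1) / (width + 1)
            mix = ((1 - alpha) * outgoing[k].astype(np.float64)
                   + alpha * incoming[k].astype(np.float64))
            blended.append(mix.astype(outgoing[k].dtype))
        return blended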

Here are the D and D2 (dynamics preserved) matrices of the first example, a rotating ball:

[Images: D and D2 matrices]

And the movie computed from them (play on loop).

Here are the D and D2 matrices of the second example, a snow globe:

[Images: D and D2 matrices]

And the movie computed from them (play on loop).

Here are the D, D2, and D3 matrices of the third example, a burning candle. I included a D3 matrix because this video texture is better suited to random play. However, I have also included a video loop below.

[Images: D, D2, and D3 matrices]

Here are the corresponding P and P2 (pruned) matrices for random play. I used a high sigma (30) when computing P from D3, to encourage random jumps.

[Images: P and P2 matrices]

And here is the promised video loop.