Stereo Matching with Linear Superposition of Layers

Yanghai Tsin, Sing Bing Kang and Richard Szeliski

Problem

Given a sequence of images with known camera motion, how do we separate effects due to linear superposition of layers for arbitrarily shaped objects? Examples: 

Reflection example (MPEG)

Translucency example (MPEG)

The setup for the doll example is shown below:

 

Our approach

We use a variant of the plane sweep we call the nested plane sweep in order to estimate both frontal and rear depths (assuming two layers for now). The resulting matching costs are used in a graph cut to estimate the initial layer depths. This is followed by iterative component color and depth estimation (see paper for details).

- Nested plane sweep for reflection sequence above, shown as MPEG of error distribution (higher intensity = larger error)


Nested Plane Sweep (MPEG)


(Random Dot Sequence)

- Graph cut result (image)
 

Frontal Depth

Rear Depth

 

- Iterative component color estimation algorithm 

Observed view

Forward warped frontal color

Difference image

Min-composite initialization (quite noisy)

Output of color update algorithm (with regularization)

Color update without regularization (streaking quite obvious)

 

Results 

Random dot sequence


Original sequence (MPEG)

Reference frame

Frontal depth

Rear depth

Frontal color

Rear color

Marble wall sequence


Original sequence (MPEG)

Reference frame

Frontal depth

Rear depth

Frontal color

Rear color

Mona Lisa sequence


Original sequence (MPEG)

Reference frame

Frontal depth

Rear depth

Frontal color

Rear color

Translucency sequence


Original sequence (MPEG)

Reference frame

Frontal depth

Rear depth

Frontal color

Rear color

Reflection sequence


Original sequence (MPEG)

Reference frame

Frontal depth

Rear depth

Frontal color

Rear color

 

Application: View synthesis

Once we have separated the layers with their respective depths (currently at a chosen reference view), we can then synthesize the original images using these layers. An example is given below. Note that the synthesized sequence is generated using only the layers extracted at one reference view (hence the observed cracks due to depth discontinuities at different viewpoints).
 


Original (MPEG)


Synthesized (MPEG)

Original (MPEG)

Synthesized (MPEG)