16-822 Geometry-based Methods in Vision

Assignment 3: 3D Reconstruction

Zihan Wang (zihanwa3), Fall 2024

Q1: 8-point and 7-point algorithm

(A1) F matrix using 8-point algorithm

We have 2 sets of normalized corresponding points x^=Tx where T=[s0sx00ssy001]. Therefore, we can contruct a matrix form of Af^=0, where each row of A is Ai=[uiui,uivi,ui,viui,vivi,vi,ui,vi]. With this matrix A, we can apply SVD on it and solve the f^ using the last row of VT. We can also enforce the rank 2 constraint by decomposing F^ again and setting the last singular value to 0. Finally, with such F^, we can denormalize it by F=TTF^T.

For the bench example, the estimated F and visualization of epipolar lines are as follows:

F: [[-1.19265833e-07 -2.53255522e-06 2.07584109e-04] [-3.96465204e-07 2.27096267e-07 2.48867750e-02] [-1.06890748e-03 -2.30052117e-02 1.00000000e+00]]

q1_1 q1_2

For the remote example, the estimated F and visualization of epipolar lines are as follows:

F: [[ 7.19006698e-07 -1.90583125e-07 2.36215166e-03] [-1.88159055e-06 -8.60510729e-07 -8.45685523e-03] [-3.98058321e-03 9.46500248e-03 1.00000000e+00]]

q1_3 q1_4

 

(A2) E matrix using 8-point algorithm (5 points)

We can calculate E using the intrinsic matrices and F by E=KTFK.

For the bench example, the E is:

E: [[ -0.41524274 -8.81748894 -2.15055808] [ -1.3803559 0.79067134 69.08265036] [ -3.58890876 -68.47438701 1. ]]

For the remote example, the E is:

E: [[ 2.5298064 -0.67056178 7.33094837] [ -6.62032749 -3.02768466 -28.29861665] [-14.50071092 26.41824781 1. ]]

 

(B) 7-point algorithm

Similar to 8-point algorithm, we can still adopt the same way to construct of Af=0. The difference here is after SVD we should get two F by selecting the last two rows of VT. With F1 and F2, to find λ that satisfies det(λF1+(1λF2))=0, need to solve a cubic equation by substituting λ=1,0,1,2. After solving for the real value roots of such a equation, we can get all the possible F. We choose the F that has the minimize the error.

For the ball example, the estimated F and visualization of epipolar lines are as follows:

F: [[-3.35078059e-09 -2.91545005e-06 -6.74891733e-03] [ 3.51682683e-06 -8.84237458e-07 -1.50044163e-02] [ 9.09398538e-03 1.48748649e-02 1.00000000e+00]]

q1_5q1_6

For the hydrant example, the estimated F and visualization of epipolar lines are as follows:

F: [[ 1.06836663e-07 -3.30854241e-06 -1.37397509e-03] [ 4.79268901e-06 1.63305018e-07 -1.67456794e-02] [ 1.12974017e-03 1.64136613e-02 1.00000000e+00]]

q1_7q1_8

Q2: RANSAC with 7-point and 8-point algorithm

Steps in the Algorithm:

  1. Random Sampling:

    • Randomly sample 7 or 8 point correspondences to compute a candidate fundamental matrix.

  2. Reprojection and Inlier Counting:

    • Use fundamental matrix to reproject points from the second image onto the first to get the epipolar line.

    • Measure the distance of each point in the second image to the epipolar line and count inliers whose sum distances below threshold.

  3. Inlier Ratio Evaluation:

    • Compute the ratio of inliers to total correspondences. If the ratio is higher than any previous iteration, update best estimated fundamental matrix.

  4. Iterate and Select Best Model:

    • Repeat this process, retaining fundamental matrix that maximizes the inlier ratio.

For the bench example, the visualization (graph plot) of % of inliers vs. # of RANSAC iterations:

q2_1_5

RANSAC with 8-point results are as follows: F: [[-1.92395063e-07 -3.98637277e-06 3.58390838e-04] [ 9.76470194e-07 -2.19440493e-07 2.51422815e-02] [-1.15659998e-03 -2.30886805e-02 1.00000000e+00]]

q2_1_1q2_1_2

RANSAC with 7-point results are as follows:

F: [[-1.06745741e-07 -2.56784447e-06 7.88289386e-05] [-2.31905535e-07 2.51495527e-07 2.41805627e-02] [-9.37432897e-04 -2.24758871e-02 1.00000000e+00]]

q2_1_3q2_1_4

  1. For the remote example, the visualization (graph plot) of % of inliers vs. # of RANSAC iterations:

q2_2_5

RANSAC with 8-point results are as follows:

F: [[ 4.94228952e-07 -1.78309087e-07 2.07870586e-03] [-1.44717993e-06 -3.86418454e-07 -8.04693202e-03] [-3.48114568e-03 8.50582289e-03 1.00000000e+00]]

q2_2_1q2_2_2

RANSAC with 7-point results are as follows:

F: [[ 2.24009193e-06 -3.99495480e-06 3.98001941e-03] [-9.01801147e-07 -8.38163830e-07 -1.13465404e-02] [-5.93169224e-03 1.34755145e-02 1.00000000e+00]]

q2_2_3q2_2_4

  1. For the ball example, the visualization (graph plot) of % of inliers vs. # of RANSAC iterations:

q2_3_5

RANSAC with 8-point results are as follows:

F: [[-3.85293031e-08 -3.15784357e-06 -7.03554893e-03] [ 3.72620577e-06 -1.00245191e-06 -1.58804967e-02] [ 9.61780541e-03 1.58897996e-02 1.00000000e+00]]

q2_3_1q2_3_2

RANSAC with 7-point results are as follows:

F: [[-2.77460041e-08 -2.89765184e-06 -6.55468330e-03] [ 3.47679767e-06 -8.49770567e-07 -1.45957327e-02] [ 8.83170495e-03 1.44185603e-02 1.00000000e+00]]

q2_3_3q2_3_4

  1. For the hydrant example, the visualization (graph plot) of % of inliers vs. # of RANSAC iterations:

q2_4_5

RANSAC with 8-point results are as follows:

F: [[ 6.06381990e-08 -5.38367018e-06 -8.90646029e-04] [ 6.92565753e-06 -1.84686785e-07 -1.76920320e-02] [ 5.54842010e-04 1.76686648e-02 1.00000000e+00]]

q2_4_1q2_4_2

RANSAC with 7-point results are as follows:

F: [[ 7.60374410e-08 -3.93180816e-06 -1.21370129e-03] [ 5.38148705e-06 5.75852498e-08 -1.73749242e-02] [ 9.99048841e-04 1.71740235e-02 1.00000000e+00]]

q2_4_3q2_4_4

Q3: Triangulation!

With correponding points and projection matrices, we can construct the A by [[x]×P[x]×P]. With this matrix, we can calculate the 3D points coordinates by applying SVD, and select the last row of VT.

The output colored 3D point cloud is as follows:

q3_1

 

q3_1

 

q3_1

 

Q4: Reconstruct your own scene!

Loading animation

Loading animation

Q5: Bonus 1

I use SIFT feature extractor and cv2.BFMatcher to find the potential matches. After finding those corresponding points, I use the RANSAC with 8-point algorithm as illustrated in Q2.

The estimated F and visualization of epipolar lines are as follows:

F: [[ 5.74791039e-07 -6.61753809e-06 -5.51920392e-03] [ 1.05967229e-05 7.37569928e-07 -3.84149414e-03] [ 2.71904791e-03 2.67848847e-03 1.00000000e+00]]

q5_1q5_2

Q6: Bonus 2

What happens if we reduce number of input images? Here is the result I downsample 25% of the original images used for last question, as we can observe, as view reduces, as long as they still have good view coverage, a decent result can be observered. However, less image causes a quality drop and we can observe the reconstruction collapse to the bottom as scarity of top views, thus hampering the overall geomery.

Loading animation

Under what scenario and conditions does the reconstruction pipeline breaks? Here I further reduce the images to around 10, we can see that the reconstruction completely fails and due to insufficent observation there is camera very far away which is failed to be placed correctly.

Loading animation