The following is an actual student's pset that received full marks, and demonstrates everything we were looking for. Special thanks to Ji Lin for letting us use their pset!
Posted 9/19/2019. Due Thursday, 9/26/2019 @ 11:59 PM
To submit, create a zip file containing this .ipynb file, and any resources your code needs to execute (like a data directory).
Name this zip yourkerberos.zip, where yourkerberos is replaced with YOUR kerberos, and upload it to Stellar/LMOD.
For our late policy, see http://6.869.csail.mit.edu/fa19/policy.html
You may discuss problems with other students, but all code / writing must be your own. Please list these collaborators below.
PSET 2 Collaboration Policy: You may ALSO choose to work with 1 other person. Your writeups must be submitted individually, however, you may use the same camera and same images from that camera.
This Jupyter notebook contains the instructions for completing the problem set.
We ALSO expect you to write your results, and code, in this notebook for your submission!
We will do our best to preface blocks of code or text that we think you'll need to change with the Your Work Here block. However, it is up to you to read the PSet carefully and answer all questions!
You are welcome to make edits anywhere in the pset, change our helper functions, restructure your code etc. However, please keep problem statements intact.
A Jupyter notebook has two types of cells: "code" cells, which contain executable Python code, and "Markdown" cells, which contain rich text. Markdown cells also support in-line LaTeX for equations.
See the markdown documentation, describing the syntax of the language here: https://daringfireball.net/projects/markdown/
my_name = "Ji Lin"
These code blocks simply initialize the notebook, performing imports, setting up file paths, making some formatting changes, and defining a convenience function imshow.
import os
import numpy as np
import matplotlib.pyplot as plt
import cv2
import requests
import sys
import json
%config InlineBackend.print_figure_kwargs = {'bbox_inches':None}
%config InlineBackend.rc = {'font.size': 10, 'figure.figsize': (10.0, 6.0), 'figure.facecolor': (1, 1, 1, 0), 'figure.subplot.bottom': 0.125, 'figure.edgecolor': (1, 1, 1, 0), 'figure.dpi': 72}
%matplotlib inline
rootpath = os.path.abspath(".")
print("Root path is set to: %s" % rootpath)
# NO COLAB SUPPORT FOR THIS PSET
def imshow(image, show_axes = False, quiet = False):
    if len(image.shape) == 3:
        # Height, width, channels
        # Assume BGR (OpenCV's default ordering), so convert to RGB before plotting
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    else:
        # Height, width - must be grayscale
        # convert to RGB, since matplotlib will plot in a weird colormap (instead of black = 0, white = 1)
        image = cv2.cvtColor(image, cv2.COLOR_GRAY2RGB)
    # Draw the image
    plt.imshow(image)
    if not show_axes:
        # We'll also disable drawing the axes and tick marks in the plot, since it's actually an image
        plt.axis('off')
    if not quiet:
        # Make sure it outputs
        plt.show()
The first part of this PSET is to build a camera obscura, or pinhole camera.
A camera obscura is a dark chamber with a single, tiny hole (the pinhole) in one side that lets light in. Rays of light from the outside enter the pinhole, and hit the wall opposite the pinhole, forming a projected image.
We will use a digital camera to capture these projected images in our camera obscura. Since a camera obscura lets in so little light (to keep the image sharp), we'll need to use a camera with a long exposure time, and take pictures IN THE DAYTIME! A phone camera will work fine - many phones include a "manual mode" where you can set the exposure time yourself. You may need to download a free app if you find your camera's maximum exposure time is still too short.
Please start this PSET early!! Since you need a lot of light to take pictures, you won't be able to complete this PSet on Thursday night!
See below for an example of what our camera looks like.
Below, you can see some images of our construction (default is instructor example - replace with yours!):
Take some pictures using your new camera! Use a digital camera to take photos of the white screen in the box (which should have something projected onto it).
You'll need a camera which lets you set the exposure time. Smartphones generally have a manual mode that will let you do this - you may need an app to give you further control though. You'll need to set your exposure time quite high.
You'll need to hold your camera still for the entire exposure time (otherwise your images will be blurry).
Think about the geometry of the scene, and why the images appear distorted in the camera obscura.
The higher the exposure time, the more light your digital camera will sense (and the longer you'll have to hold it still). If your pictures are too dark, try increasing the exposure time. If they're too bright, try decreasing it.
If the images are blurry, you can increase the sharpness by reducing the size of the pinhole. However, this means you'll have less light, so you may need to increase the exposure time.
You'll need to take pictures in bright daylight, so there's enough light! If it's not working though, try shining a flashlight through your pinhole - if you can't see it, there might be something wrong.
If your box is letting in more light than it should, the images will be washed out in white. If your box lets in light from anywhere except the pinhole, the camera will no longer work correctly.
Add some images taken from your camera obscura here! We've prepopulated with some images from our camera - replace with yours (and add more, using the same syntax, if you'd like)
One interesting aspect of pinhole cameras is that you can build different types of cameras by playing with the shape of the pinhole. So, let's modify our pinhole camera to be an anaglyph camera - a camera that produces anaglyph (stereo) images with a single shot! We'll do this by using 2 pinholes, instead of just one.
There are several ways to capture and view stereo images. The most common mechanism is that two pictures are taken independently (with two cameras), and then later they are combined in software. Then, special hardware is used so that each of a viewer's eyes sees one image of the stereo pair (just like how you do in the physical world). How this image is transmitted and split depends on the technique - movie theaters often use polarized 3d glasses - the screen transmits the two images with only vertical and only horizontal waves respectively, and the glasses only let one orientation of light pass through to each eye.
Anaglyph images are a particular method to produce 3d images, that don't require a special screen to display. The image is formed by superimposing a pair of stereo images, with each image encoded in a separate color channel (generally red for the left image, and blue for the right image). Then, when viewed through colored glasses like the ones below, each eye will only see one image.
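For concreteness, the same encoding can be reproduced in software. Below is a minimal sketch (assuming a hypothetical stereo pair left.jpg / right.jpg of identical size, and using the imports and imshow helper defined above) that puts the left view in the red channel and the right view in the blue channel:

# Sketch: compose a red/blue anaglyph from a hypothetical stereo pair in software.
# left.jpg / right.jpg are placeholder filenames; both images must have the same dimensions.
left = cv2.imread("left.jpg")
right = cv2.imread("right.jpg")

anaglyph = np.zeros_like(left)
# OpenCV loads images as BGR, so the red channel is index 2 and the blue channel is index 0.
anaglyph[:, :, 2] = cv2.cvtColor(left, cv2.COLOR_BGR2GRAY)   # left view -> red
anaglyph[:, :, 0] = cv2.cvtColor(right, cv2.COLOR_BGR2GRAY)  # right view -> blue
imshow(anaglyph)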
We will turn our camera obscura into an anaglyph camera by using 2 pinholes, and adding a color filter in front of each pinhole (which will tint the light that goes through it). This will produce an image that is the superposition of two images, with slightly different viewpoints, and with each image tinted a different color. These colors will correspond to those in the glasses' filter, meaning we can simply use another pair of identical 3d glasses to view the images.
We used Red/Blue 3d glasses. Use one pair of glasses to get the 2 color filters, which we place in front of the two pinholes (one each).
We will wear another pair when viewing images.
For this to work, you might need to play around with the distance between the two pinholes. We found 1.5 inches worked well for us. Try to take good quality images. With some light being filtered, each image will be rather dark, so you may need to increase your camera's exposure time.
If the images are too blurry, it'll be hard to perceive 3d!
As you increase the distance between the 2 pinholes, you get more stereo information. However, the two images will be farther apart, and may thus be harder to perceive as 3d. If the images are too far apart, your visual system won't be able to fuse the images, and you won't see the image as 3d anymore.
Lastly, the size at which you're seeing the image matters. Try viewing your image at different sizes.
To build an anaglyph camera, we open another pinhole to the left of the first one. The distance between the two pinholes is 1.5 inches. We then add a color filter over each pinhole (blue filter on the left, red filter on the right).
Since the color filters reduce the light through the pinholes, we need to increase the exposure time. With our Canon 6D camera, we set the exposure time to 8 seconds, with ISO 1600 and aperture f/4. To get a more obvious 3D effect, we need a closer distance to the object. Specifically, the distance between the camera and the object is about 1.5-2 meters (for a vehicle).
Take a few anaglyph pictures! Since the image is still distorted, you will not be able to view them with 3D perception yet. We'll solve this soon with calibration.
Add some raw images taken from your anaglyph here! They won't feel 3D just yet, so you don't need to show many. However, you'll want several pictures for later, when you're trying to calibrate.
![Raw Anaglyph] | ![Anaglyph Picture 2]
:---------------------:|:----------------------:

The images we've captured thus far contain distortion. This is because our camera-hole has a different viewpoint of the image than the pinhole does. More precisely, the camera is angled with respect to the "principal axis" of the camera (straight through the pinhole, normal to the cardboard).
If we knew the exact angle, and the distance from the camera to the image plane, we could compute an exact homography to transform the view from the camera into the view from the pinhole (the pinhole view should resemble reality).
However, since we can't measure this accurately in our setup, let's write a calibration script to estimate this homography.
To get a picture of what your pinhole sees, shine a bright light through your camera-hole, and take a picture through the pinhole. If this doesn't work (like if you're using a professional camera instead of a phone), then take a picture of an identical sheet of paper, at the same height and focal length as your camera obscura's pinhole (focal length = distance from pinhole to sheet). You'll only have to do this once.
Since I used a professional camera, I could not directly take a picture through the pinhole. Instead, I took a picture of an identical sheet of paper at the same height and focal length.
Whenever you set up / move your camera, take a calibration picture - this picture should have a strong light source that makes the 4 corners of the white sheet apparent.
Note: If the 4 corners of your white paper are already clearly visible in your picture, you may skip this step!
Now that we have 2 images, one from the viewpoint of the pinhole, and one from the viewpoint of the digital camera, we can calculate a homography from matching keypoints.
The first step is to find the positions of matching keypoints. We'll do this manually for now. (It's easier if you get these positions using image editing software like MS Paint, GIMP, Paint.NET, or Photoshop)
NOTE: The provided get_correspondences.py script can help you input correspondences between 2 images! It can only run in a standard Python interpreter (not the notebook), though - run it to get the lists below for pinhole_keypoints and camera_keypoints, or any other correspondences you'll need in the pset!
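If you would rather stay inside Python, one rough alternative (a sketch, not the provided script) is to click the points with matplotlib's ginput. This needs an interactive backend (e.g. running as a plain script, or %matplotlib qt), so it may not work with the inline plotting used in this notebook; the paths below are placeholders:

# Rough sketch: collect correspondences by clicking on an image with matplotlib's ginput.
# Requires an interactive backend; the filenames passed in are placeholders.
def click_keypoints(image_fp, n_points=4):
    image = cv2.imread(image_fp)
    plt.figure()
    plt.imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
    plt.title("Click %d points: TL, TR, BL, BR" % n_points)
    points = plt.ginput(n_points, timeout=0)  # list of (x, y) tuples
    plt.close()
    return np.array(points)

# Example usage (placeholder paths):
# pinhole_keypoints = click_keypoints("data/my_images/raw_images/flashlight.jpg")
# camera_keypoints = click_keypoints("data/my_images/raw_images/img1.jpg")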
# TODO: Replace with your images
pinhole_viewpoint_fp = os.path.join(rootpath,"data", "my_images", "raw_images", "flashlight.jpg")
camera_viewpoint_fp = os.path.join(rootpath, "data", "my_images", "raw_images", "img1.jpg")
# Read both images
pinhole_viewpoint = cv2.imread(pinhole_viewpoint_fp)
camera_viewpoint = cv2.imread(camera_viewpoint_fp)
pv_h, pv_w, pv_c = pinhole_viewpoint.shape
cv_h, cv_w, cv_c = camera_viewpoint.shape
# Change both of these to match the white paper in your image
pinhole_keypoints = np.array(
[
# Top-left
(165,77),
# Top-right
(803, 80),
# Bottom-left
(161, 580),
# Bottom-right
(804, 578)
]
)
camera_keypoints = np.array(
[
# Top-left
(43,85),
# Top-right
(726, 50),
# Bottom-left
(4, 610),
# Bottom-right
(734, 661)
]
)
imshow(pinhole_viewpoint, show_axes=True, quiet=True)
# Mark keypoints
plt.scatter(pinhole_keypoints[:,0], pinhole_keypoints[:,1], marker="x", s=100, c=[(1.0,0,0)])
plt.show()
imshow(camera_viewpoint, show_axes=True, quiet=True)
# Mark keypoints
plt.scatter(camera_keypoints[:,0], camera_keypoints[:,1], marker="x", s=100, c=[(1.0,0,0)])
plt.show()
# Let's crop the camera viewpoint to just the image
# Compute the corners of the region of interest
minX, minY = np.min(camera_keypoints, axis=0)
maxX, maxY = np.max(camera_keypoints, axis=0)
camera_keypoints[:,0] -= minX
camera_keypoints[:,1] -= minY
# Crop
# camera_viewpoint = camera_viewpoint[minX:maxX+1, minY:maxY+1]
camera_viewpoint = camera_viewpoint[minY:maxY+1, minX:maxX+1]
imshow(camera_viewpoint, show_axes=True, quiet=True)
# Mark keypoints
plt.scatter(camera_keypoints[:,0], camera_keypoints[:,1], marker="x", s=100, c=[(1.0,0,0)])
plt.show()
Now that we have a set of 4 matching keypoints, we can compute a homography! You also probably want to add a horizontal mirror (flip across the x-axis).
I used the linear system written out in the calibration methodology section below to calculate the homography matrix $H$. The vector $[h_1, h_2, \ldots, h_9]$ is the matrix $H$ we are trying to find (after reshaping). After building the left-hand matrix $A$, I compute the SVD of $A$ and take the right singular vector with the smallest singular value (the eigenvector of $A^TA$ with the smallest eigenvalue). The normalized vector gives the matrix $H$.
def flip_camera_keypoints(camera_keypoints):
    # Top left, Top right, Bottom Left, Bottom Right
    # Flipped version is: (-> means "becomes")
    # BL -> TL, BR -> TR, TL -> BL, TR -> BR
    flipped_keypoints = camera_keypoints[np.array([2, 3, 0, 1], dtype=int)]
    return flipped_keypoints
def get_homography_matrix(camera_keypoints, pinhole_keypoints):
    # FILL THIS IN
    # You may find np.linalg.lstsq (least squares solver) useful
    A = np.zeros([8, 9])
    for i in range(4):
        x1, y1 = camera_keypoints[i]
        x1p, y1p = pinhole_keypoints[i]
        A[2*i] = np.array([-x1, -y1, -1, 0, 0, 0, x1 * x1p, y1 * x1p, x1p])
        A[2*i+1] = np.array([0, 0, 0, -x1, -y1, -1, x1 * y1p, y1 * y1p, y1p])
    _, _, vh = np.linalg.svd(A)
    H = vh[-1] / vh[-1, -1]  # normalize so that h9 = 1
    H = H.reshape(3, 3)
    return H
def get_homography_matrix_cv(camera_keypoints, pinhole_keypoints):
    # This is a working version of the above, you can use it for debugging.
    # You MAY NOT use cv2.findHomography in your above implementation!
    my_calibration_homography, _ = cv2.findHomography(camera_keypoints, pinhole_keypoints)
    return my_calibration_homography
def apply_homography(camera_image, pinhole_image, homography_matrix):
    pv_h, pv_w = pinhole_image.shape[:2]
    # Apply the homography
    dest_size = (pv_w, pv_h)
    outim = cv2.warpPerspective(camera_image, homography_matrix, dest_size)
    return outim

def calibrate_image(camera_image, pinhole_image, camera_image_keypoints, pinhole_image_keypoints):
    # H is the homography matrix
    # TODO: Replace this call with one to get_homography_matrix
    H = get_homography_matrix(flip_camera_keypoints(camera_image_keypoints), pinhole_image_keypoints)
    # print(get_homography_matrix(flip_camera_keypoints(camera_image_keypoints), pinhole_image_keypoints))
    # print(get_homography_matrix_cv(flip_camera_keypoints(camera_image_keypoints), pinhole_image_keypoints))
    calibrated = apply_homography(camera_image, pinhole_image, H)
    return calibrated
'''
Note, This error is okay:
FutureWarning: `rcond` parameter will change to the default of machine precision times ``max(M, N)`` where M and N are the input matrix dimensions.
To use the future default and silence this warning we advise to pass `rcond=None`, to keep using the old, explicitly pass `rcond=-1`.
'''
imshow(calibrate_image(camera_viewpoint, pinhole_viewpoint, camera_keypoints, pinhole_keypoints))
known_calibrations = {
"flashlight.jpg": np.array([
# Top-left
(165,77),
# Top-right
(803, 80),
# Bottom-left
(161, 580),
# Bottom-right
(804, 578)
]),
"img1.jpg": np.array([
# Top-left
(43,85),
# Top-right
(726, 50),
# Bottom-left
(4, 610),
# Bottom-right
(734, 661)
]),
"img2.jpg": np.array([
# Top-left
(43,85),
# Top-right
(726, 50),
# Bottom-left
(4, 610),
# Bottom-right
(734, 661)
]),
"img3.jpg": np.array([
# Top-left
(43,85),
# Top-right
(726, 50),
# Bottom-left
(4, 610),
# Bottom-right
(734, 661)
]),
"img4.jpg": np.array([
# Top-left
(43,85),
# Top-right
(726, 50),
# Bottom-left
(4, 610),
# Bottom-right
(734, 661)
]),
"threed1.jpg": np.array([
# Top-left
(43,85),
# Top-right
(726, 50),
# Bottom-left
(4, 610),
# Bottom-right
(734, 661)
]),
"threed2.jpg": np.array([
# Top-left
(43,85),
# Top-right
(726, 50),
# Bottom-left
(4, 610),
# Bottom-right
(734, 661)
]),
"threed3.jpg": np.array([
# Top-left
(43,85),
# Top-right
(726, 50),
# Bottom-left
(4, 610),
# Bottom-right
(734, 661)
]),
}
Place all your raw images (that you would like us to see) in a folder, specified below - we'll recalibrate them using the above data when grading. Make sure to set calibration_pinhole_viewpoint_fname to the name of your calibration "from-pinhole" view. We'll calculate a homography to transform the raw picture to the "from-pinhole" view.
Also, make sure to add calibration data above for all the images in the raw_images folder!
You can use the cells above to test calibration out interactively.
# Will be joined to the rootpath
# Change these paths to find your data.
raw_image_folder_pathtuple = ("data","my_images","raw_images")
output_calibrated_image_folder_pathtuple = ("output","calibrated_images")
calibration_pinhole_viewpoint_fname = "flashlight.jpg"
# Will only search for images that end with these extensions
image_extensions = set(['.png', '.jpeg', '.jpg'])
The script below scans your directory, and calibrates all the images in that directory.
# Get these as filepaths
raw_image_folder_fp = os.path.join(rootpath, *raw_image_folder_pathtuple)
calibrated_image_folder_fp = os.path.join(rootpath, *output_calibrated_image_folder_pathtuple)
# Make the calibrated dir if necessary
os.makedirs(calibrated_image_folder_fp, exist_ok=True)
# Scan the raw images folder, including only images with known extensions
raw_image_fnames = [fname for fname in os.listdir(raw_image_folder_fp) if os.path.splitext(fname)[1] in image_extensions]
print("Found images:",raw_image_fnames)
NO_CALIBRATION_IM_ERR_MESSAGE = "ERROR: Could not find calibration from-pinhole viewpoint image: %s" % (calibration_pinhole_viewpoint_fname,)
assert calibration_pinhole_viewpoint_fname in raw_image_fnames, NO_CALIBRATION_IM_ERR_MESSAGE
def calibrate_from_fname(camera_fname, pinhole_fname):
    camera_image_fp = os.path.join(raw_image_folder_fp, camera_fname)
    pinhole_image_fp = os.path.join(raw_image_folder_fp, pinhole_fname)
    camera_image = cv2.imread(camera_image_fp)
    pinhole_image = cv2.imread(pinhole_image_fp)
    camera_kp = known_calibrations[camera_fname]
    pinhole_kp = known_calibrations[pinhole_fname]
    return calibrate_image(camera_image, pinhole_image, camera_kp, pinhole_kp)
desired_image_fnames = set(raw_image_fnames) - set([calibration_pinhole_viewpoint_fname])
unknown_images = desired_image_fnames - set(known_calibrations.keys())
if len(unknown_images) > 0:
    print("Skipping images without known_calibration entries:", sorted(unknown_images))
desired_image_fnames = sorted(desired_image_fnames - unknown_images)
for each_fn in desired_image_fnames:
    calib_img = calibrate_from_fname(each_fn, calibration_pinhole_viewpoint_fname)
    cv2.imwrite(os.path.join(calibrated_image_folder_fp, each_fn), calib_img)
    imshow(calib_img)
Describe your methodology for camera calibration
To perform camera calibration, we need to map coordinates from the camera viewpoint $(x_i, y_i)$ to the pinhole viewpoint $(x_i', y_i')$ with a homography transform. Suppose the homography matrix $H$ is
$$H=\begin{bmatrix}h_1 & h_2 & h_3\\h_4 & h_5 & h_6 \\ h_7 & h_8 & h_9\end{bmatrix}$$
We can compute $H$ from the 4 coordinate pairs using:
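$$\underbrace{\begin{bmatrix} -x_1 & -y_1 & -1 & 0 & 0 & 0 & x_1 x_1' & y_1 x_1' & x_1' \\ 0 & 0 & 0 & -x_1 & -y_1 & -1 & x_1 y_1' & y_1 y_1' & y_1' \\ & & & & \vdots & & & & \\ -x_4 & -y_4 & -1 & 0 & 0 & 0 & x_4 x_4' & y_4 x_4' & x_4' \\ 0 & 0 & 0 & -x_4 & -y_4 & -1 & x_4 y_4' & y_4 y_4' & y_4' \end{bmatrix}}_{A}\begin{bmatrix} h_1 \\ h_2 \\ \vdots \\ h_9 \end{bmatrix} = \mathbf{0}$$
(each correspondence contributes two rows; this is the $8\times 9$ matrix $A$ assembled in get_homography_matrix above)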
where $(x_i, y_i)$ is a coordinate in the camera view and $(x_i', y_i')$ is the corresponding coordinate in the pinhole view. After building the left-hand matrix $A$, we compute the SVD of $A$ and take the right singular vector with the smallest singular value (the eigenvector of $A^TA$ with the smallest eigenvalue). We then normalize this vector by dividing it by its last element (so that $h_9=1$). Reshaping the normalized vector gives the matrix $H$.
We all know how to measure distances to an object using a ruler. However, using a ruler is an intrusive procedure that involves placing one end of the ruler at the edge of the object. Some objects do not like to be constantly approached by rulers. Stereo vision provides a way to measure the distance between the camera and another object remotely (without touching the object).
Below, you can see an anaglyph picture of a flashlight, taken from two different distances. A single point of light produces 2 separate spots in the image, one from each pinhole, each with a different color from the color filter on the respective pinhole. The distance between the two spots, $d$, changes with respect to the distance, $Z$, between the camera and the light.
Therefore, we can measure the distance $Z$ to the light just by measuring $d$, assuming we have a perfect model of the camera geometry (focal length, distance between pinholes, pixel-to-world scale of the paper)!
Place the light at various known distances from the pinholes, and take a set of pictures.
Next, plot the relationship between $Z$, the distance to the light, and $d$, the real-world distance between the two images of the light.
To make this plot, we'll first need to find $d$ from an image. For now, we'll do everything in pixel space. You'll need to find a way to convert this value to real-space.
To compute $d$, first extract the red and blue channels from the image. Recall that images are a shape [H,W,C] array, where the color channels are ordered BGR.
Then, use np.argmax and np.unravel_index to get the (r, c) index of the brightest (maximum) pixel. Reverse this to (c, r), which maps to our usual (x, y) indexing into an image.
Theoretically, there should only be a difference horizontally (should align vertically). Since our image is not perfect, and our holes may not be exactly aligned, we'll ignore any vertical difference.
# Let's analyze this light image. Replace with yours.
light1 = cv2.imread(os.path.join(rootpath, "data","instruction_images","AnaglyphLight1.png"))
# Extract the blue channel.
blue_channel = light1[:, :, 0]
# Get the brightest-blue (x, y)
brightest_blue_flat = np.argmax(blue_channel)
brightest_blue = list(np.unravel_index(brightest_blue_flat, blue_channel.shape[:2])[::-1])
# Extract the red channel
red_channel = light1[:, :, 2]
# Get the brightest red (x,y)
brightest_red_flat = np.argmax(red_channel)
brightest_red = list(np.unravel_index(brightest_red_flat, blue_channel.shape[:2])[::-1])
# We compute the average y to plot it
# Uncomment this block once you've implemented above
average_y = int((brightest_blue[1] + brightest_red[1]) / 2)
brightest_blue[1] = average_y
brightest_red[1] = average_y
imshow(light1, quiet=True)
plt.scatter([brightest_blue[0], brightest_red[0]], [brightest_blue[1], brightest_red[1]], marker="x", s=200, c=[(0.0,1.0,0)])
plt.show()
pass
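As an aside, a slightly more noise-tolerant way to localize each spot (an optional sketch, not required for the pset) is to take the centroid of the near-maximum pixels rather than a single argmax:

# Optional sketch: localize a spot as the centroid of near-maximum pixels, which is
# more robust to sensor noise than picking one brightest pixel.
def spot_centroid(channel, rel_threshold=0.9):
    # Keep pixels within rel_threshold of the channel maximum
    mask = channel >= rel_threshold * channel.max()
    rows, cols = np.nonzero(mask)
    # Return (x, y) to match the (column, row) convention used above
    return (cols.mean(), rows.mean())

# Example (using the light image loaded above):
# blue_xy = spot_centroid(light1[:, :, 0])
# red_xy = spot_centroid(light1[:, :, 2])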
Use the above to make a function to compute d (always positive) given an image, then make a plot of Z vs d (Z on y-axis, d on x-axis)
We took some images with a phone flashlight at different distances. To get the transform from pixel distance $d_p$ to world distance $d$, we need to first calibrate the image, so that we can calculate the world distance according to the size of the screen (a letter-size sheet of paper). Attached are our images after calibration:
![dist1] | ![dist2] | ![dist3] | ![dist4]
:---------------------:|:----------------------:|:----------------------:|:----------------------:
![dist5] | ![dist6] | ![dist7] | ![dist8]

def get_d_for_image(light_image):
    # Extract the blue channel.
    blue_channel = light_image[:, :, 0]
    # Get the brightest-blue (x, y)
    brightest_blue_flat = np.argmax(blue_channel)
    brightest_blue = list(np.unravel_index(brightest_blue_flat, blue_channel.shape[:2])[::-1])
    # Extract the red channel
    red_channel = light_image[:, :, 2]
    # Get the brightest red (x, y)
    brightest_red_flat = np.argmax(red_channel)
    brightest_red = list(np.unravel_index(brightest_red_flat, red_channel.shape[:2])[::-1])
    return np.abs(brightest_blue[0] - brightest_red[0])
# Replace with your path
light_directory = os.path.join(rootpath, "data", "my_images", "distance")
# light_directory = os.path.join(rootpath, "output", "calibrated_images")
def fn_to_im(fn):
    # Changes a filename to a path, then reads / returns that image
    return cv2.imread(os.path.join(light_directory, fn))
# Replace with your image names
images = [
fn_to_im('3.jpg'),
fn_to_im('5.jpg'),
fn_to_im('7.jpg'),
fn_to_im('10.jpg'),
fn_to_im('15.jpg'),
fn_to_im('20.jpg'),
fn_to_im('25.jpg'),
fn_to_im('30.jpg'),
]
# Replace with your Z
known_Z = [
3, 5, 7, 10, 15, 20, 25, 30
]
# Use the above function to compute d
computed_d = [
get_d_for_image(img) for img in images
]
# Plots D vs Z
plt.plot(computed_d, known_Z)
plt.xlabel("pixel-length d")
plt.ylabel("Distance from camera Z")
plt.title("Measuring distances")
plt.show()
Derive an analytical expression relating $d$ and $Z$, using the parameters of the camera (distance to back wall, separation between the two pinholes, etc). Do your predictions fit the experimental results?
Describe your derivation, including plots and diagrams as necessary (markdown supports images, see above for examples). You can use $$LaTeX$$ for equations like $a^2+b^2=c^2$
The light path of an anaglyph pinhole camera can be described below:
In the figure, we have $\triangle OBD\sim\triangle OAC$ and $\triangle ODF\sim\triangle OCE$, therefore, we can derive the relationship:
$$\frac{d_0}{d} = \frac{OD}{OC} = \frac{OF}{OE}=\frac{Z}{Z+f}$$
Therefore, we can calculate $Z$ using:
$$Z=\frac{fd_0}{d-d_0}$$
where $d_0$ is the distance between two pinholes, and $d$ is the world distance between the two light spots (i.e., measured using a ruler). For our camera, $d_0=1.5 \text{ inches}$, $f=12.6 \text{ inches}$. Therefore, the relationship between $d$ and $Z$ is:
$$Z=\frac{18.9}{d-1.5}\text{ (inches)}$$
We need to further convert the world distance $d$ into the pixel distance $d_p$. In the pinhole-view image, the longer side of the paper is 572 pixels, which is 11 inches in reality. Therefore,
$$d = \frac{11d_p}{572}\text{ inches}$$
Finally,
$$Z=\frac{18.9\times 572}{11d_p-1.5\times572}\text{ inches}$$
We plot our analytical results against the experimental results below. The two curves are quite close, which suggests that our analysis is correct. There are still some minor gaps between the two curves (~1 inch), which could be due to experimental error. For example, we may not have held the phone perfectly vertical, which would result in an inaccurate actual distance $Z$.
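For a quick numerical sanity check of this formula, it can be wrapped in a small helper (a sketch using the same $d_0$, $f$, and paper scale as above; the 100-pixel separation below is just an illustrative value, not a measurement):

# Sanity-check helper for the analytical formula (same camera parameters as derived above).
def z_from_pixel_distance(d_p, d0=1.5, f=12.6, paper_inches=11.0, paper_pixels=572):
    # Convert the pixel separation on the screen to a world distance in inches,
    # then apply Z = f * d0 / (d - d0).
    d_world = paper_inches * d_p / paper_pixels
    return f * d0 / (d_world - d0)

# e.g. a spot separation of ~100 pixels would correspond to roughly 45 inches from the camera
print(z_from_pixel_distance(100))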
# Plots D vs Z
plt.plot(computed_d, known_Z)
paper_length = 572
analytical_x = np.linspace(computed_d[0], computed_d[-1], 1000)
analytical_result = [18.9*paper_length/(11*d-1.5*paper_length) for d in analytical_x]
plt.plot(analytical_x, analytical_result)
plt.legend(['experimental', 'analytical'])
plt.xlabel("pixel-length d")
plt.ylabel("Distance from camera Z")
plt.title("Experimental vs. analytical")
plt.show()
In this part, the goal is to manually apply a stereo reconstruction algorithm: given two images, recover the full 3D structure of the scene (we will only produce a coarse approximation to it). Automatic algorithms for measuring depth from stereo can be challenging. However, we will do it manually by following the steps below:
The following figure (top) shows one example of the red and blue channels of the stereo picture. We used the top image as the reference image, and then computed depth at a set of selected points on the top image. Once you have a depth map (the value of the distance at each pixel in the image), you can do different things like trying to rotate the image in 3D. The bottom shows two different viewpoints of the car.
Add the picture you took, the point correspondences, and different viewpoints of the scene in the following cell.
Firstly, we show the anaglyph image we took, and its corresponding blue and red channels.
The corresponding points are marked in the figure.
#Add the picture you took, the point correspondences, and different viewpoints of the scene.
image3d = cv2.imread('output/calibrated_images/threed1.jpg')
blue_channel = image3d[:, :, 0]
red_channel = image3d[:, :, 2]
# Show the anaglyph alongside its blue and red channels
plt.figure(figsize=(20,5))
plt.subplot(131)
imshow(image3d, quiet=True)
plt.title('Anaglyph')
plt.subplot(132)
imshow(blue_channel, quiet=True)
plt.title('Blue')
plt.subplot(133)
imshow(red_channel, quiet=True)
plt.title('Red')
plt.show()
corr_left = [(444, 114), (581, 116), (567, 231), (429, 292), (308, 288), (324, 191), (385, 102), (573, 378),
(491, 432), (631, 129), (634, 205), (552, 321), (665, 210), (665, 235), (663, 127)] # , (691, 294)]
corr_right = [(542, 117), (684, 118), (668, 234), (528, 297), (401, 293), (417, 195), (480, 103), (672, 380),
(594, 434), (734, 133), (734, 205), (655, 324), (762, 211), (763, 239), (764, 132)] # , (791, 297)]
plt.figure(figsize=(20,5))
plt.subplot(121)
imshow(blue_channel, quiet=True)
plt.scatter([x for x, _ in corr_left], [y for _, y in corr_left], marker="x", s=30, c=[(0.0,1.0,0)])
for i, (x, y) in enumerate(corr_left):
    plt.text(x-3, y+30, str(i), fontsize=8, color=[1, 1, 1])
plt.subplot(122)
imshow(red_channel, quiet=True)
plt.scatter([x for x, _ in corr_right], [y for _, y in corr_right], marker="x", s=30, c=[(0.0,1.0,0)])
for i, (x, y) in enumerate(corr_right):
    plt.text(x-3, y+30, str(i), fontsize=8, color=[1, 1, 1])
plt.show()
Next, we can calculate the depth using pixel distance, and then interpolate over the space to reconstruct the depth map for the region.
pixel_dist = [abs(x-y) for (x, _), (y, _) in zip(corr_left, corr_right)]
depth = [18.9*paper_length/(11*d-1.5*paper_length) for d in pixel_dist]
left_x = [p[0] for p in corr_left]
left_y = [p[1] for p in corr_left]
grid_x, grid_y = np.mgrid[0:1000, 0:667]
import scipy.interpolate
interp_depth = scipy.interpolate.griddata(np.array(corr_left), depth, (grid_x, grid_y), method='linear').T
plt.imshow(interp_depth)
plt.show()
It is clear that the depth map shows the front and side of the vehicle: the farther to the left, the more distant.
Next, we would like to rotate the reconstructed result to get a new view of the object. We plot the surface in 3D coordinates and set a novel viewpoint to get a new view. Below are three views of the reconstructed result.
(Note: to get a detailed view, the plot_surface call runs very slowly.)
from mpl_toolkits.mplot3d import Axes3D
ax = plt.figure().gca(projection='3d')
ax.view_init(180, -80)
# ax.set_xlim([0, blue_channel.shape[0]])
# ax.set_ylim([0, blue_channel.shape[1]])
bt = blue_channel.T.reshape(blue_channel.shape[1], blue_channel.shape[0], 1)
blue_channel3 = np.concatenate((bt, bt, bt), axis=2)
ax.plot_surface(grid_x, interp_depth.T, grid_y, facecolors=blue_channel3 / 255., shade=False,
antialiased=True, rstride=2, cstride=2)
plt.show()
ax = plt.figure().gca(projection='3d')
ax.view_init(150, -80)
ax.plot_surface(grid_x, interp_depth.T, grid_y, facecolors=blue_channel3 / 255., shade=False,
antialiased=True, rstride=2, cstride=2)
plt.show()
ax = plt.figure().gca(projection='3d')
ax.view_init(180, -65)
ax.plot_surface(grid_x, interp_depth.T, grid_y, facecolors=blue_channel3 / 255., shade=False,
antialiased=True, rstride=4, cstride=4)
plt.show()