Advanced Computer Vision
I developed and taught this graduate-level course, a follow-on to the introductory course. It covers structure and motion estimation, segmentation, object detection and recognition, and tracking, using both classical and deep learning methods. I use Python, OpenCV, and PyTorch for demonstrations and assignments.
I emphasize hands-on work: during class, students work in pairs on lab assignments, usually one per week. Students also complete programming assignments and an independent final project of their own choosing.
Topics
Review of image formation, transformations, edge and line detection
Mathematical methods: linear and non-linear least squares, singular value decomposition
Direct linear transform
Estimating uncertainties in derived quantities
Essential and fundamental matrix
Structure from motion
Bundle adjustment
Stereo vision
Classification using decision trees, boosting, and SVMs
Classification using convolutional neural networks: architecture and training
CNNs for object detection
Transfer learning
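Several of the mathematical-methods topics (least squares, SVD, and the direct linear transform) reduce to solving a homogeneous system Ax ≈ 0 with the constraint ‖x‖ = 1. As a minimal illustration, not a course artifact, the standard SVD solution in NumPy looks like this:

```python
import numpy as np

def solve_homogeneous(A):
    """Solve A x ~ 0 subject to ||x|| = 1 by taking the right
    singular vector associated with the smallest singular value."""
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1]

# Example: find the null-space direction of a rank-deficient matrix.
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [1.0, 0.0, 1.0]])
x = solve_homogeneous(A)
# A @ x is (numerically) the zero vector, and ||x|| = 1
```

This same building block appears inside the DLT, triangulation, and homography estimation.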
Example Assignments
Essential Matrix
In this assignment, students detected and matched features between two images, computed the essential matrix, and used it to recover the relative pose between the cameras. They also recovered the true distance between the cameras, given the known width of the window in the picture.
First image, with epipolar lines.
Second image, with epipolar lines.
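The pose-recovery step can be sketched in NumPy. This is not the assignment's solution code, just the standard SVD decomposition of an essential matrix into its four (R, t) candidates (the same construction used by OpenCV's `decomposeEssentialMat`); in practice the correct candidate is selected by checking that triangulated points lie in front of both cameras:

```python
import numpy as np

def skew(t):
    """Cross-product matrix: skew(t) @ v == np.cross(t, v)."""
    return np.array([[0, -t[2], t[1]],
                     [t[2], 0, -t[0]],
                     [-t[1], t[0], 0]])

def decompose_essential(E):
    """Return the four (R, t) pose candidates encoded by E."""
    U, _, Vt = np.linalg.svd(E)
    # Force proper rotations (det = +1)
    if np.linalg.det(U) < 0:
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    W = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 1]])
    R1, R2 = U @ W @ Vt, U @ W.T @ Vt
    t = U[:, 2]
    return [(R1, t), (R1, -t), (R2, t), (R2, -t)]

# Sanity check: build E from a known pose and recover it.
theta = 0.1
R_true = np.array([[np.cos(theta), 0, np.sin(theta)],
                   [0, 1, 0],
                   [-np.sin(theta), 0, np.cos(theta)]])
t_true = np.array([1.0, 0.0, 0.0])
E = skew(t_true) @ R_true
candidates = decompose_essential(E)
```

The true pose appears among the four candidates up to the sign of t, reflecting the inherent scale/sign ambiguity of the essential matrix.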
Structure from Motion
This assignment was to reconstruct camera poses from a sequence of six images, and determine the true size of the box, given the known size of the $20 bill in the picture.
One of the images, with the “ground control points” marked.
Reconstructed camera poses and 3D point positions.
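The core geometric step, triangulating a 3D point from two known camera poses, can be sketched with the linear (DLT) method. This is an illustrative sketch with made-up camera parameters, not the assignment solution; the scale-fixing comment at the end mirrors how the known $20 bill size resolves the monocular scale ambiguity:

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one point from two views.
    P1, P2: 3x4 projection matrices; x1, x2: pixel coordinates."""
    A = np.vstack([x1[0] * P1[2] - P1[0],
                   x1[1] * P1[2] - P1[1],
                   x2[0] * P2[2] - P2[0],
                   x2[1] * P2[2] - P2[1]])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]          # dehomogenize

def project(P, X):
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Synthetic setup (hypothetical intrinsics): identity first camera,
# second camera translated 0.5 units along x.
K = np.diag([800.0, 800.0, 1.0]); K[0, 2] = 320; K[1, 2] = 240
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-0.5], [0.0], [0.0]])])
X_true = np.array([0.2, -0.1, 4.0])
X_hat = triangulate(P1, P2, project(P1, X_true), project(P2, X_true))

# A monocular reconstruction is only defined up to scale; a known
# length in the scene (e.g. a $20 bill) fixes the scale factor:
#   scale = known_length / np.linalg.norm(X_a - X_b)
```

With noiseless correspondences the DLT recovers the point exactly; with real feature matches it is the initialization that bundle adjustment then refines.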
Object Detection
This assignment was to train a boosting classifier to recognize room signs in our campus building, using HOG (histogram of oriented gradients) features. The images below show successful detections on two test images.
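To illustrate the boosting half of that pipeline (the HOG feature extraction is omitted), here is a minimal, self-contained AdaBoost with decision stumps on toy 2-D data. This is a teaching sketch, not the assignment's detector:

```python
import numpy as np

def train_adaboost(X, y, n_rounds=10):
    """AdaBoost with decision stumps; X is (n, d), y in {-1, +1}."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)          # sample weights
    stumps = []
    for _ in range(n_rounds):
        best = None
        # Exhaustively pick the stump with lowest weighted error.
        for j in range(d):
            for thr in np.unique(X[:, j]):
                for sign in (1, -1):
                    pred = sign * np.where(X[:, j] >= thr, 1, -1)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, j, thr, sign)
        err, j, thr, sign = best
        alpha = 0.5 * np.log((1 - err) / max(err, 1e-12))
        pred = sign * np.where(X[:, j] >= thr, 1, -1)
        w *= np.exp(-alpha * y * pred)   # upweight mistakes
        w /= w.sum()
        stumps.append((alpha, j, thr, sign))
    return stumps

def predict(stumps, X):
    score = sum(a * s * np.where(X[:, j] >= t, 1, -1)
                for a, j, t, s in stumps)
    return np.sign(score)

# Toy data: two classes separated by a tilted line.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + 0.3 * X[:, 1] > 0, 1, -1)
model = train_adaboost(X, y)
acc = (predict(model, X) == y).mean()
```

In the assignment, the same weak-learner/reweighting loop runs over HOG feature vectors extracted from sliding windows rather than raw 2-D points.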
Example Final Projects
This project by Alexander Dodge and Daniela Machnik had two parts: (1) given an image of an ancient artifact such as a lamp or bowl, find similar objects in a database of artifacts; (2) place the artifact into a scene using correct perspective projection. For the first part, they used a CNN to generate a set of deep descriptors that were matched against the database. For the second part, the method finds vanishing points and computes the vertical planes in the scene. They then warp a planar object to simulate its placement on a wall. See the slides from their class presentation.
Given the query image in the top left, the method finds the closest matching images in the database.
Detected lines are used to find vanishing points.
Artifacts are placed in the image.
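The warp in the placement step is a planar homography: once the wall quadrilateral is known from the vanishing-point geometry, a 3x3 homography maps the flat artifact image onto it. A minimal DLT estimator (a generic sketch, not the students' code, with made-up corner coordinates) looks like this:

```python
import numpy as np

def homography_dlt(src, dst):
    """Estimate the 3x3 homography mapping src[i] -> dst[i]
    from >= 4 point correspondences via the direct linear transform."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

def apply_h(H, p):
    q = H @ np.array([p[0], p[1], 1.0])
    return q[:2] / q[2]

# Map the unit square (the flat artifact image) onto a quadrilateral
# on a wall in the scene (hypothetical corner pixels).
src = [(0, 0), (1, 0), (1, 1), (0, 1)]
dst = [(100, 80), (260, 100), (250, 300), (90, 280)]
H = homography_dlt(src, dst)
```

Warping every pixel of the artifact image through H (e.g. with OpenCV's `warpPerspective`) then renders it on the wall with correct perspective.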