Advanced Computer Vision

I developed and taught this graduate-level course, a follow-on to the introductory course. It covers structure and motion estimation, segmentation, object detection and recognition, and tracking, using both classical and deep learning methods. I use Python, OpenCV, and PyTorch for demonstrations and assignments.

I emphasize hands-on work: students work in pairs during class to complete lab assignments, usually one per week. They also complete programming assignments and an independent final project of their own choosing.

Topics

  • Review of image formation, transformations, edge and line detection

  • Mathematical methods: linear and non-linear least squares, singular value decomposition

  • Direct linear transform

  • Estimating uncertainties in derived quantities

  • Essential and fundamental matrix

  • Structure from motion

  • Bundle adjustment

  • Stereo vision

  • Classification using decision trees, boosting, SVM

  • Classification using convolutional neural nets: architecture, training

  • CNNs for object detection

  • Transfer learning

 

Example Assignments

Essential Matrix

This assignment was to detect and match features between two images, compute the essential matrix, and use it to recover the relative pose between the cameras. The students also computed the true distance between the cameras, given the known width of the window in the picture.

First image, with epipolar lines.

Second image, with epipolar lines.
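
The pose-recovery and scale steps of this assignment can be sketched as follows. This is a minimal illustration on synthetic data, not the course's reference solution: it assumes normalized (calibrated) image coordinates, the convention X2 = R X1 + t, and an essential matrix that is already known. The function names, the synthetic scene, and the 2.0 m "window width" are all hypothetical. In the real assignment the matrix would come from matched features.

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]x, so that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one point from two normalized image points."""
    A = np.vstack([x1[0] * P1[2] - P1[0],
                   x1[1] * P1[2] - P1[1],
                   x2[0] * P2[2] - P2[0],
                   x2[1] * P2[2] - P2[1]])
    X = np.linalg.svd(A)[2][-1]          # null vector of A (homogeneous point)
    return X[:3] / X[3]

def pose_from_essential(E, pts1, pts2):
    """Return (R, unit t, 3D points) for the candidate pose that puts the most
    triangulated points in front of both cameras (cheirality test)."""
    U, _, Vt = np.linalg.svd(E)
    if np.linalg.det(U) < 0:
        U = -U                           # E is defined up to sign
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    W = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
    P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
    best = None
    for R in (U @ W @ Vt, U @ W.T @ Vt):
        for t in (U[:, 2], -U[:, 2]):
            P2 = np.hstack([R, t[:, None]])
            X = np.array([triangulate(P1, P2, a, b) for a, b in zip(pts1, pts2)])
            in_front = np.sum((X[:, 2] > 0) & ((X @ R.T + t)[:, 2] > 0))
            if best is None or in_front > best[0]:
                best = (in_front, R, t, X)
    return best[1], best[2], best[3]

# Hypothetical synthetic scene; the first two points play the role of the
# window corners, which are 2.0 m apart.
c, s = np.cos(0.2), np.sin(0.2)
R_true = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
t_true = np.array([-1.5, 0.1, 0.2])
X_true = np.array([[-1.0, -0.5, 6.0], [1.0, -0.5, 6.0], [0.5, 1.0, 8.0],
                   [-0.8, 0.7, 5.0], [0.2, -1.0, 7.0]])
X_cam2 = X_true @ R_true.T + t_true
pts1 = X_true[:, :2] / X_true[:, 2:]     # normalized image points, camera 1
pts2 = X_cam2[:, :2] / X_cam2[:, 2:]     # normalized image points, camera 2

E = skew(t_true) @ R_true                # stand-in for an estimated E
R_est, t_est, X_est = pose_from_essential(E, pts1, pts2)

# The reconstruction has a unit baseline; a known length fixes the true scale.
window_width = 2.0
scale = window_width / np.linalg.norm(X_est[0] - X_est[1])
baseline = scale * np.linalg.norm(t_est)
```

The translation recovered from an essential matrix has unit length, so the reconstruction is only defined up to scale. Dividing a known real-world length by the same length in the reconstruction gives the factor that converts the unit baseline into a metric camera separation.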

 

Structure from Motion

This assignment was to reconstruct camera poses from a sequence of six images and determine the true size of the box, given the known size of the $20 bill in the picture.

One of the images, with the “ground control points” marked.

Reconstructed camera poses and 3D point positions.

 

Object Detection

This assignment was to train a boosting classifier to recognize room signs in our campus building, using HoG features. The images below show successful detections on two test images.

sign1.png
sign3.png
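
To illustrate the kind of classifier involved in this assignment, here is a minimal sketch of discrete AdaBoost with decision stumps. The page only says "boosting classifier", so the choice of AdaBoost and stumps is an assumption, as are all function names. The feature vectors are synthetic Gaussian stand-ins for HoG descriptors. In practice the descriptors might come from something like OpenCV's `HOGDescriptor`.

```python
import numpy as np

def stump_predict(X, j, thr, polarity):
    """Decision stump: +polarity where feature j exceeds thr, -polarity elsewhere."""
    return polarity * np.where(X[:, j] > thr, 1, -1)

def fit_stump(X, y, w):
    """Exhaustively find the stump with the lowest weighted error."""
    best = (np.inf, 0, 0.0, 1)
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            err = np.sum(w[stump_predict(X, j, thr, 1) != y])
            # Weights sum to 1, so flipping the polarity turns error e into 1 - e.
            for polarity, e in ((1, err), (-1, 1.0 - err)):
                if e < best[0]:
                    best = (e, j, thr, polarity)
    return best

def adaboost_fit(X, y, rounds=30):
    """Train a weighted ensemble of stumps; labels y must be +1 or -1."""
    w = np.full(len(y), 1.0 / len(y))
    model = []
    for _ in range(rounds):
        err, j, thr, pol = fit_stump(X, y, w)
        err = np.clip(err, 1e-10, 1 - 1e-10)      # avoid log(0) on a perfect stump
        alpha = 0.5 * np.log((1 - err) / err)     # vote weight of this stump
        w *= np.exp(-alpha * y * stump_predict(X, j, thr, pol))
        w /= w.sum()                              # misclassified samples gain weight
        model.append((alpha, j, thr, pol))
    return model

def adaboost_predict(model, X):
    """Sign of the weighted vote of all stumps."""
    score = sum(a * stump_predict(X, j, thr, pol) for a, j, thr, pol in model)
    return np.where(score >= 0, 1, -1)

# Synthetic stand-in for descriptors: 8-D vectors, "sign" (+1) vs "background" (-1).
rng = np.random.default_rng(0)

def make_data(n):
    y = rng.choice([-1, 1], size=n)
    X = y[:, None] * 1.0 + rng.normal(size=(n, 8))
    return X, y

X_train, y_train = make_data(300)
X_test, y_test = make_data(200)
model = adaboost_fit(X_train, y_train, rounds=30)
train_acc = np.mean(adaboost_predict(model, X_train) == y_train)
test_acc = np.mean(adaboost_predict(model, X_test) == y_test)
```

Each round reweights the training set so that the next stump concentrates on the examples the ensemble currently gets wrong. For detection, a classifier like this would be evaluated on descriptors computed over a sliding window across the test image.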
 

Example Final Projects

This project by Alexander Dodge and Daniela Machnik had two parts: (1) given an image of an ancient artifact such as a lamp or bowl, find similar objects in a database of artifacts; (2) place the artifact into a scene using correct perspective projection. For the first part, they used a CNN to generate a set of deep descriptors that were matched against a database. For the second part, their method finds vanishing points and computes the vertical planes in the scene; it can then warp a planar object to simulate its placement on a wall. See their slides for their class presentation.

Given the query image in the top left, the method finds the closest matching images in the database.

Detected lines are used to find vanishing points.

Artifacts are placed in the image.
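
The vanishing-point step can be illustrated with a small least-squares sketch. This is not the students' implementation. It assumes the detected line segments have already been grouped by direction, and the function names and example coordinates are hypothetical.

```python
import numpy as np

def line_through(p, q):
    """Homogeneous line through two image points (cross product of the points)."""
    return np.cross([p[0], p[1], 1.0], [q[0], q[1], 1.0])

def vanishing_point(segments):
    """Least-squares intersection of the lines through the given segments.

    Each line l should satisfy l . v = 0 at the vanishing point v, so v is
    taken as the right singular vector with the smallest singular value.
    Assumes a finite vanishing point (v[2] is not close to zero).
    """
    L = np.array([line_through(p, q) for p, q in segments])
    # Normalize each line so that all segments contribute comparably.
    L /= np.linalg.norm(L[:, :2], axis=1, keepdims=True)
    v = np.linalg.svd(L)[2][-1]
    return v[:2] / v[2]

# Hypothetical segments lying on lines that all pass through (400, -120).
vp_true = np.array([400.0, -120.0])
segments = []
for angle in (0.3, 1.0, 1.4, 2.2):
    d = np.array([np.cos(angle), np.sin(angle)])
    segments.append((vp_true + 50.0 * d, vp_true + 300.0 * d))

vp_est = vanishing_point(segments)
```

With noisy real detections, the lines would not meet exactly, and a robust step such as RANSAC over pairs of segments is a common way to group lines before a fit like this.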