Landmark Recognition for Visual SLAM
The goal of this project was to develop a Visual SLAM (Simultaneous Localization and Mapping) system for a vehicle moving through an unstructured outdoor environment. Specifically, a vehicle equipped with cameras and other sensors would detect and map landmarks in the scene. The map is then used for relocalization when another vehicle moves through the same environment later.
I did this work as a consultant, during a sabbatical from my university. At that time, many of the software tools we take for granted today did not exist, so I developed the entire end-to-end system from scratch! This included:
Camera calibration for multiple cameras on the vehicle, both intrinsic and extrinsic.
Feature keypoint detection.
Feature matching using a novel “eigenimage” approach (described below).
Sensor fusion, combining visual data with odometry and GPS data.
Bundle adjustment, using Levenberg-Marquardt optimization.
Explicit calculation of the uncertainty in the estimated results (a sketch of this and the bundle adjustment step follows this list).
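None of today's off-the-shelf solvers existed at the time, but the last two steps are easy to illustrate with modern tools. Below is a minimal sketch, not the original implementation: it uses SciPy's Levenberg-Marquardt solver on a toy two-camera reprojection problem, then estimates the uncertainty of the result from the Gauss-Newton covariance approximation cov ≈ σ²(JᵀJ)⁻¹. The scene, noise levels, and all names are hypothetical.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

F = 500.0  # focal length in pixels (illustrative)

def project(rvec, t, X):
    """Pinhole projection of 3D points X (n, 3) into a camera at (rvec, t)."""
    Xc = Rotation.from_rotvec(rvec).apply(X) + t
    return F * Xc[:, :2] / Xc[:, 2:3]

def residuals(params, n_pts, obs):
    """Reprojection residuals. Camera 0 is held fixed at the origin to
    remove most of the gauge freedom; camera 1 and the points are refined."""
    pose1, X = params[:6], params[6:].reshape(n_pts, 3)
    r0 = project(np.zeros(3), np.zeros(3), X) - obs[0]
    r1 = project(pose1[:3], pose1[3:], X) - obs[1]
    return np.concatenate([r0.ravel(), r1.ravel()])

# Synthetic scene: 20 landmarks seen by two cameras one meter apart.
rng = np.random.default_rng(0)
n_pts = 20
X_true = rng.uniform([-2, -2, 8], [2, 2, 12], (n_pts, 3))
pose1_true = np.array([0.0, 0.05, 0.0, -1.0, 0.0, 0.0])   # [rvec | t]
obs = np.stack([project(np.zeros(3), np.zeros(3), X_true),
                project(pose1_true[:3], pose1_true[3:], X_true)])
obs += rng.normal(0, 0.5, obs.shape)  # half a pixel of image noise

# Levenberg-Marquardt refinement from a perturbed initial guess.
x0 = np.concatenate([pose1_true + rng.normal(0, 0.01, 6),
                     (X_true + rng.normal(0, 0.05, X_true.shape)).ravel()])
res = least_squares(residuals, x0, args=(n_pts, obs), method='lm')

# Uncertainty: cov ~ sigma^2 (J^T J)^-1 at the solution. pinv is used
# because the remaining scale gauge freedom makes J^T J rank-deficient.
J = res.jac
sigma2 = 2 * res.cost / (len(res.fun) - len(res.x))  # res.cost = 0.5*sum(r^2)
cov = sigma2 * np.linalg.pinv(J.T @ J)
print("1-sigma uncertainty of camera-1 translation (m):",
      np.sqrt(np.diag(cov))[3:6])
```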
I performed simulations to analyze the accuracy of pose estimation for various camera configurations. A four-camera configuration provided good pose accuracy, with the cameras spaced at 90-degree intervals around the vehicle's periphery.
I mounted four cameras on a truck and captured data while driving through a variety of urban and rural environments. Each environment required two runs: one to detect and map landmark points, and a second to recognize those points and determine the pose of the vehicle.
Map showing the path of the vehicle.
In the video, RED points are tracked in 2D only. YELLOW points are tracked points whose 3D locations have also been estimated. Note that the tracker sometimes tries to track points in the sky or on the vehicle. Although such points can be tracked in 2D, they do not yield a consistent 3D solution and are eventually dropped.
Feature matching using an “eigenimage” approach
In outdoor environments, matching a feature using a standard descriptor such as SIFT doesn't work well. Why? The scene contains non-planar surfaces and many occlusions, so the appearance of a small image patch changes with the viewpoint. I developed a novel method that represents the possible variation in appearance of an image patch using “eigenimages”.
The method works by tracking an image patch over a range of viewpoints and recording the images. In this video, image patch #63 is a feature centered on the trunk of a small tree. The patch is tracked across 68 viewpoints, and the resulting image patches are shown below. Note how the appearance changes with the viewpoint. The main reason for the change is that the trunk of the tree is an occluding edge: as the viewpoint changes, so does the background visible to the right of the trunk.
By computing a principal component analysis (PCA) decomposition of these image patches, we can find a small number of “eigenimages” that compactly represent the series. Any image in the sequence can be reconstructed as a linear combination of only five eigenimages.
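As an illustration, the decomposition can be sketched with a plain SVD. The patch stack, patch size, and variable names below are hypothetical stand-ins, not the original code:

```python
import numpy as np

def eigenimages(patches, k=5):
    """PCA of a (n_views, h, w) stack of tracked patches.
    Returns the mean patch and the top-k eigenimages, as flat vectors."""
    n = patches.shape[0]
    X = patches.reshape(n, -1).astype(float)
    mean = X.mean(axis=0)
    # Rows of Vt are the principal directions (the "eigenimages").
    U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]

# Stand-in for the 68 tracked views of a patch (e.g. patch #63), 16x16 each.
patches = np.random.default_rng(0).random((68, 16, 16))
mean, basis = eigenimages(patches, k=5)

# Any view in the sequence is approximately the mean patch plus a
# linear combination of the five eigenimages:
v = patches[10].ravel() - mean
coeffs = basis @ v             # five coefficients
recon = mean + coeffs @ basis  # reconstructed patch (flattened)
```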
Now we can use this representation to do better matching. Given a new image patch, we can project the patch onto the space of eigenimages and evaluate whether it belongs to this subspace.
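Continuing the sketch above, the matching test can be written as a reconstruction residual: a patch that genuinely shows this landmark should be almost fully explained by the eigenimages, while an unrelated patch leaves a large residual. The threshold here is illustrative:

```python
def subspace_distance(patch, mean, basis):
    """Fraction of a patch's (mean-subtracted) energy that the eigenimage
    subspace cannot explain: near 0 = good fit, near 1 = no fit."""
    v = patch.ravel().astype(float) - mean
    coeffs = basis @ v                # projection onto the eigenimages
    residual = v - coeffs @ basis     # component outside the subspace
    return np.linalg.norm(residual) / (np.linalg.norm(v) + 1e-12)

candidate = patches[33]               # a patch extracted from a new image
if subspace_distance(candidate, mean, basis) < 0.3:  # hypothetical threshold
    print("candidate matches this landmark")
```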
Results
Once the scene was mapped by the first vehicle, the second vehicle moved through the environment and used the mapped features to compute its pose. I compared the pose error of the second vehicle using the landmark recognition method against the pose error using GPS and the vehicle's odometry system. For one of the runs (total length 4.2 km), the RMS error was 0.37 m for the landmark-based method, versus 5.1 m for the odometry-only method. The “lateral” positioning error (the most critical dimension for a vehicle trying to follow the path of another vehicle) is shown below as a function of image frame number. As can be seen, with odometry only, the lateral error grows over time.
Lateral positioning error vs time, for the landmark recognition method (left) and odometry-only method (right).
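For reference, the lateral (cross-track) error plotted above can be computed as the component of the position error perpendicular to the reference path's local heading. A minimal sketch, with hypothetical names and inputs:

```python
import numpy as np

def lateral_errors(est_xy, ref_xy):
    """Signed cross-track error per frame, given (n, 2) arrays of
    estimated and reference vehicle positions."""
    heading = np.gradient(ref_xy, axis=0)  # local direction of travel
    heading /= np.linalg.norm(heading, axis=1, keepdims=True) + 1e-12
    normal = np.stack([-heading[:, 1], heading[:, 0]], axis=1)  # left normal
    return np.einsum('ij,ij->i', est_xy - ref_xy, normal)

def rms(errors):
    return float(np.sqrt(np.mean(np.square(errors))))
```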