Object Detection for Augmented Reality “Tagging”

motor.png

I developed this technology while working at DAQRI Corp. in Vienna, Austria. DAQRI wanted object detection capability integrated into their augmented reality “smart glasses”. Specifically, they wanted to be able to “tag” a real-world object with virtual annotations, such as labels and instructions.

This is useful in many applications! For example, a factory supervisor wearing an AR headset could tag a leaking seal on a machine with a work order for a technician. Later, a technician wearing an AR headset could easily locate the leaking seal and understand the tasks to be performed by viewing the annotations, which are displayed registered to the object.

Most object detection systems require large amounts of training images in order to recognize an object. However, it is often impractical and expensive to capture that much training data for a new object. Instead, I developed a method that trains the system on a very small number of images taken while the user tags the object. Tagging is usually a very short (~1 minute) process, as shown in the video below, so the method can be quickly and easily applied to new objects.
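To give a flavor of how a short tagging session can yield a usable training set, here is a minimal sketch of one plausible keyframe-selection step: keep a captured frame only if the camera has moved far enough from every frame already kept, so a one-minute walk around the object produces a small but view-diverse set. The function name, the baseline threshold, and the pose representation are illustrative assumptions, not details from the patent.

```python
import math

def select_keyframes(frames, min_baseline=0.10):
    """Keep only frames whose camera position is at least `min_baseline`
    meters from every keyframe already kept, so a short tagging session
    yields a small but view-diverse training set. Each frame is a
    (camera_position_xyz, image) pair; both names are hypothetical."""
    keyframes = []
    for pose, image in frames:
        if all(math.dist(pose, kept_pose) >= min_baseline
               for kept_pose, _ in keyframes):
            keyframes.append((pose, image))
    return keyframes

# Simulated short session: the user walks around the object.
session = [((0.00, 0.00, 0.0), "img0"), ((0.02, 0.00, 0.0), "img1"),
           ((0.15, 0.00, 0.0), "img2"), ((0.15, 0.12, 0.0), "img3")]
print([img for _, img in select_keyframes(session)])  # → ['img0', 'img2', 'img3']
```

Frame `img1` is dropped because the camera barely moved, which is why only a handful of images are needed per session.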

features.png

The method uses a convolutional neural network (CNN) to extract dense features from images. The CNN uses pre-trained weights, so that it doesn’t have to be retrained for each new object. Details are in my patent application (see https://patents.google.com/patent/US20200074672A1/en).
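The patent application has the full details, but the downstream use of those dense features can be sketched independently of the backbone. Assuming the pretrained CNN has already produced an H×W×C feature map per image, matching descriptors between two views by cosine similarity might look like the following; the function name and similarity threshold are assumptions for illustration.

```python
import numpy as np

def match_dense_features(feat_a, feat_b, min_sim=0.8):
    """Match each descriptor in feature map `feat_a` (H, W, C) to its most
    cosine-similar descriptor in `feat_b`, keeping matches above `min_sim`.
    The maps stand in for the dense output of a pretrained CNN backbone."""
    ha, wa, c = feat_a.shape
    hb, wb, _ = feat_b.shape
    a = feat_a.reshape(-1, c)
    b = feat_b.reshape(-1, c)
    # L2-normalize so that dot products become cosine similarities.
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    sim = a @ b.T                      # (Ha*Wa, Hb*Wb) similarity matrix
    best = sim.argmax(axis=1)
    matches = []
    for i, j in enumerate(best):
        if sim[i, j] >= min_sim:
            # Recover (row, col) pixel coordinates in each feature map.
            matches.append(((i // wa, i % wa), (j // wb, j % wb)))
    return matches
```

Because the backbone weights are pretrained and frozen, only this matching stage runs per object; no per-object retraining is required.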

clusters.png

The system gradually learns a 3D model of the object, even though there is no explicit labeling of which points belong to the object and which belong to the background. As more and more people use the system, the object appears in different scenes with different backgrounds. Thus, the system learns which points are part of the object and learns to ignore the background points, so the recognition model improves over time.
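One simple way to realize this idea, sketched here as an assumption rather than the exact method from the patent, is a co-occurrence filter: 3D points that reappear across many tagging sessions (same object, different backgrounds) are kept as object points, while points seen in only a few sessions are treated as background.

```python
from collections import Counter

def filter_object_points(sessions, min_fraction=0.6):
    """Keep point IDs observed in at least `min_fraction` of all sessions.
    `sessions` is a list of sets of point IDs seen per session; the object's
    points recur across sessions while background points do not. The
    function name and threshold are illustrative, not from the patent."""
    counts = Counter(pid for session in sessions for pid in session)
    threshold = min_fraction * len(sessions)
    return {pid for pid, n in counts.items() if n >= threshold}

# Points 1-3 (the object) show up in every session; the backgrounds differ.
sessions = [{1, 2, 3, 10, 11}, {1, 2, 3, 20}, {1, 2, 3, 30, 31}]
print(sorted(filter_object_points(sessions)))  # → [1, 2, 3]
```

As more sessions accumulate, background points fall further below the threshold, which matches the write-up's claim that the model improves with use.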

Most of the object detection code is in Python, and runs in its own process. The CNN was developed using TensorFlow. The object detection process communicates with the DAQRI Smart Glasses over a TCP socket. The object detection process can run on the smart glasses, or on an external server.
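The write-up doesn't specify the wire protocol on that TCP socket; a common choice for a link like this is length-prefixed JSON messages, since TCP is a byte stream with no built-in message boundaries. The framing below is a sketch under that assumption, and the message fields (`object`, `pose`) are hypothetical.

```python
import json
import struct

def pack_message(msg):
    """Serialize a message dict as UTF-8 JSON, prefixed with a 4-byte
    big-endian length so the receiver knows where each message ends."""
    payload = json.dumps(msg).encode("utf-8")
    return struct.pack(">I", len(payload)) + payload

def unpack_messages(buffer):
    """Split received bytes into complete messages, returning
    (messages, leftover_bytes) so partial TCP reads can be resumed."""
    messages = []
    while len(buffer) >= 4:
        (length,) = struct.unpack(">I", buffer[:4])
        if len(buffer) < 4 + length:
            break  # wait for the rest of this message to arrive
        messages.append(json.loads(buffer[4:4 + length]))
        buffer = buffer[4 + length:]
    return messages, buffer

# Hypothetical detection result sent from the detector to the glasses.
frame = pack_message({"object": "motor", "pose": [0.1, 0.2, 0.3]})
msgs, rest = unpack_messages(frame)
```

Because the framing is transport-agnostic, the same detection process can serve the glasses locally or from an external server without changes.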