Augmented Reality Task Guidance
Comparison of Hand-held and Head-mounted Displays
Intuitively, a head-mounted display should be better than a hand-held display when performing a hands-on task: with a head-mounted display, the user's hands are free to work. The question is how much better, and whether the cost of a head-mounted display is worth it.
While at the University of South Australia, I developed a testbed to compare a hand-held versus a head-mounted display for AR-assisted hands-on tasks. The task was to connect a set of wires between pairs of screw posts. The app was developed using Unity and Vuforia, and was implemented on both an Android phone and the Microsoft HoloLens.
Overlays show which posts to connect with a wire.
A formal user study has not been completed, but a preliminary informal study showed that task completion with the head-mounted display was only about 10% faster than with the hand-held display. This improvement may justify the cost if the task must be done frequently, but if the task is done infrequently it may not be worth it.
The testbed is being used for research in other areas of user interaction, including intelligent tutoring systems. See B. Herbert, W. Hoff, and M. Billinghurst, "Usability Considerations of Hand Held Augmented Reality Wiring Tutors (Poster)," International Symposium on Mixed and Augmented Reality (ISMAR), 2020.
Learning Object and State Models for AR Task Guidance
When doing a task such as maintenance or assembly, the appearance of objects changes. For example, a printer may have its front cover closed or open. Sometimes the visual differences between states can be subtle, such as the presence or absence of a screw, or the position of a switch.
We need computer vision to recognize not only the object but also its state, in order to give the user guidance about what to do next. Learning to recognize objects with multiple states must be fast and easy, because users don't have time to train recognition systems by capturing large numbers of images.
I developed an approach for automatically learning object states from examples of experts performing the task. The approach exploits the fact that the key features of the object are consistently present across multiple viewing instances, whereas features from the background or from irrelevant objects are not. Using information theory, we automatically identify the features that best discriminate between object states.
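To make the idea concrete, here is a minimal sketch of the information-theoretic step: each candidate feature is scored by the mutual information between its presence in a view and the object's state label. The binary presence/absence representation and the toy data are illustrative assumptions, not the published pipeline.

    # Illustrative sketch (not the published pipeline): score candidate visual
    # features by the mutual information between a feature's presence in a
    # training view and the object's state label.
    import numpy as np

    def mutual_information(feature_present, state_labels):
        """Mutual information (bits) between a binary feature and state labels."""
        mi = 0.0
        for f in (0, 1):
            for s in np.unique(state_labels):
                p_fs = np.mean((feature_present == f) & (state_labels == s))
                p_f = np.mean(feature_present == f)
                p_s = np.mean(state_labels == s)
                if p_fs > 0:
                    mi += p_fs * np.log2(p_fs / (p_f * p_s))
        return mi

    # Rows are training views, columns are candidate features (1 = detected).
    X = np.array([[1, 1, 0, 1],
                  [1, 0, 0, 1],
                  [1, 1, 1, 0],
                  [1, 0, 1, 0]])
    y = np.array([0, 0, 1, 1])  # state label per view (e.g., cover closed/open)

    scores = [mutual_information(X[:, j], y) for j in range(X.shape[1])]
    ranking = np.argsort(scores)[::-1]  # most state-discriminative features first

In this toy data, a feature whose presence flips exactly with the state earns a full bit of mutual information, while a feature present in every view, or one that varies independently of the state, scores zero.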
This work was supported by DAQRI through a grant to my university. For more details, see W. Hoff and H. Zhang, "Learning Object and State Models for AR Task Guidance," Proc of International Symposium on Mixed and Augmented Reality (ISMAR), pp. 272-273, September 2016, Merida, Mexico. (pdf)
Workflow Modeling for AR Task Guidance
The previous work was about automatically learning the appearance of object states and how they vary during a maintenance or assembly task. We also need a way to quickly create a model of the workflow, or sequence of steps, in a task. With this model, we can detect whether the user has made a mistake or skipped a step in the workflow. We can also give appropriate guidance when the user asks, "What do I do next?"
A colleague and I at the Colorado School of Mines developed a method to construct a workflow model automatically from demonstrations by domain experts. The task is represented as a Partially Observable Markov Decision Process (POMDP). The parameters of the model are learned from example sequences of steps performed by experts. As a result, workflow models can be quickly developed for new tasks.
For more details, see F. Han, J. Liu, W. Hoff and H. Zhang, "Planning-based Workflow Modeling for AR-enabled Automated Task Guidance," Proc of International Symposium on Mixed and Augmented Reality (ISMAR), pp. 58-62, October 2017, Nantes, France. (pdf)
The task of unjamming a copier is modeled as a state machine with actions, observations, and rewards.
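To illustrate the workflow-modeling idea, the sketch below learns step-to-step transition probabilities from expert demonstration sequences and uses them to suggest the most likely next step or to flag a transition the experts never made. This is a heavily simplified Markov-chain stand-in for the full POMDP, which also models actions, observations, and rewards; the step names, loosely echoing the copier example, are hypothetical.

    # Simplified sketch: learn a workflow transition model from expert
    # demonstrations. The published method learns a POMDP; this shows only
    # the idea of estimating transition probabilities from example sequences.
    from collections import defaultdict

    def learn_transitions(demonstrations):
        """Estimate P(next step | current step) from expert step sequences."""
        counts = defaultdict(lambda: defaultdict(int))
        for seq in demonstrations:
            for cur, nxt in zip(seq, seq[1:]):
                counts[cur][nxt] += 1
        return {cur: {nxt: c / sum(nxts.values()) for nxt, c in nxts.items()}
                for cur, nxts in counts.items()}

    # Hypothetical expert demonstrations of unjamming a copier.
    demos = [
        ["open_cover", "remove_paper", "close_cover", "press_resume"],
        ["open_cover", "remove_paper", "check_tray", "close_cover", "press_resume"],
        ["open_cover", "remove_paper", "close_cover", "press_resume"],
    ]
    model = learn_transitions(demos)

    def suggest_next(model, current_step):
        """Answer "what do I do next?" with the most likely next step."""
        return max(model[current_step], key=model[current_step].get)

    def is_mistake(model, current_step, observed_step, threshold=0.05):
        """Flag a transition rarely or never seen in the demonstrations."""
        return model.get(current_step, {}).get(observed_step, 0.0) < threshold

    print(suggest_next(model, "remove_paper"))              # close_cover
    print(is_mistake(model, "open_cover", "press_resume"))  # True: steps skipped

Because the model is estimated directly from demonstration sequences, adding a new task only requires recording a few expert run-throughs, which is what allows workflow models to be developed quickly.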