While many people have speculated that virtual environments are a more efficient way of interacting with three-dimensional, computer-generated scenes, few empirical studies have been performed. Most existing work has been point design, in which a proof-of-concept application is constructed. These applications typically expose the limitations of the current technology, and are not rigorously evaluated. Our approach is to systematically subdivide the design space of potential virtual environment applications and attempt to discern for which types of tasks these new interaction devices are most helpful. This knowledge should be of general use to designers of virtual environment applications. In this way, our general approach is similar to early work measuring the effectiveness of the mouse as a generic pointing device [Card 87].
As our first piece of work in this area, we have set out to measure the effectiveness of a head-mounted display in a generic searching task: locating twenty targets in a synthetic room. The central user operation in this task is controlling the orientation of a synthetic camera, which determines the viewpoint within a computer-generated scene. The fundamental question is whether searching can be performed more quickly by using head motion to control the synthetic camera, or by using a traditional, fixed-location monitor and a hand-held input device to control the camera.
We attempted to design a simple task in order to avoid potential confounding factors. In this task, no input devices were used beyond those controlling the camera orientation, and the only metric of interest was task completion time. Our first challenge was to select the hardware configuration to compare against the head-mounted display and attached tracker. We rejected the familiar configuration of desktop monitor and mouse on the grounds that it would introduce too many confounding variables into the study. In the interest of keeping as many things constant as possible, we decided to use the head-mounted display as the stationary display device as well, placing the orientation tracker in the user's hand rather than attaching it to his or her head.
The study produced two major results. First, users controlling the camera via head tracking completed the search task almost twice as fast (a 42% reduction in task completion time). Second, we observed a training phenomenon: practice with head tracking significantly improved the performance of users who subsequently used hand tracking to control the camera. We saw no carryover in the opposite direction: practice with hand tracking did not improve performance using head tracking.
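The relationship between the reported reduction and the "almost twice as fast" claim is simple arithmetic; the 42% figure is from the study, the rest is a sketch of the conversion:

```python
# A 42% reduction in task completion time means the head-tracked
# group needed only 58% of the hand-tracked group's time, which
# corresponds to a speedup factor of 1 / 0.58, just under 2x.
reduction = 0.42
speedup = 1.0 / (1.0 - reduction)
print(round(speedup, 2))  # 1.72
```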
User studies have been performed to determine the effectiveness of various real-time graphics techniques for depth cuing and the like [Liu 91, Liu 92]. The work we are aware of that is closest in spirit to this work is J. Chung's research on the task of targeting radiation treatment beams [Chung 92]. Another interesting study is the work by Tom Piantanida and his group at SRI, who are examining trade-offs in field of view and resolution in head-mounted displays [Piantanida 92].
In addition to the display, the input devices used to control the virtual camera introduced another unwanted variable. Six degree-of-freedom trackers, such as the Polhemus Isotrak [Polhemus], have significant lag [Liang 91, Adelstein], low accuracy, and high noise when compared to desktop devices such as the mouse or Spaceball™ [Spaceball]. We therefore designed the study to eliminate these variables by using identical hardware and software for both the head-tracked and hand-tracked conditions. Figure 1 shows the configuration of our head-mounted display subjects. Each wore a VPL Eyephone™ and used a Polhemus 3Space™ tracker. The graphics software, running on a pair of Silicon Graphics VGX™ machines, presented an environment in which the subjects stood in the middle of a room 6 meters long, 6 meters wide, and 3.4 meters tall. Inside this room were 20 targets, each a two-digit number roughly 0.3 meters tall. The targets were large enough to be easily viewed in the display.
The hand-tracked group used the same physical display (the VPL Eyephone), but it was held at a fixed location in space, much like the equipment one "sticks one's face up to" at an eye doctor's office. In this way, the VPL Eyephone became a stationary monitor, just as a desktop display is relative to the user. Figure 1 shows a rigid ceiling mount for demonstration purposes; in the study we used human helpers who held the display with their hands. The tracking device was removed from the subject's head and manipulated by the subject with one or both hands. Thus, the only difference between the two groups was whether the virtual camera was controlled by the muscles of the head or the muscles of the hand. Each user's base position within the graphics environment was fixed. Although we used all six degrees of freedom from the physical tracker, the (x, y, z) translations were small (less than one foot); the basic task was controlling the orientation of the camera.
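The key property of this design is that both conditions share one code path from tracker pose to camera orientation. A minimal sketch of that path, assuming the tracker reports a unit quaternion (the `Pose` type and field names are hypothetical, not the actual Polhemus or SGI API):

```python
from dataclasses import dataclass

@dataclass
class Pose:
    # Position in meters; in this study translations were small.
    x: float; y: float; z: float
    # Orientation as a unit quaternion (w, x, y, z).
    qw: float; qx: float; qy: float; qz: float

def camera_rotation(p: Pose):
    """Standard quaternion-to-rotation-matrix conversion: the 3x3
    matrix orienting the synthetic camera. The same function is
    called whether the sensor sits on the head or in the hand."""
    w, x, y, z = p.qw, p.qx, p.qy, p.qz
    return [
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ]
```

With the identity quaternion (1, 0, 0, 0) this yields the identity matrix, i.e. an unrotated camera; the experimental manipulation changes only which muscles drive the quaternion, not the code.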
Each trial required the subject to identify twenty targets in the room; timing began as soon as the graphics were first presented. When the subject located a two-digit number, he or she called out the number, the experimenter typed it, and that target disappeared from the display. The frame rate was roughly ten frames per second. As an orientation cue, we placed a unique graphical object (a filing cabinet, plant, bookcase, and chair) in each corner of the virtual room.
We randomly divided the subjects, balanced by gender, into the head-tracked (virtual environment) group and the hand-tracked group. Each subject performed ten trials, each consisting of locating twenty targets. The subjects were informed on each trial when they had located all the targets. In advance of the experiment, we generated sets of random target locations within the room. We generated enough sets to allay fears that any given random placement was somehow skewed, and then used those data sets for both groups of subjects; in the psychology community, this is referred to as yoking the input sets. To avoid complications arising from targets that were difficult to read, we oriented the targets so they were always upright and perpendicular to the subject's line of sight; since the subject was effectively stationary, this orientation could be performed once, at the time of data set generation. We also constrained targets to be at least 0.6 meters from the room's surfaces, and not to fall within a 1.0-meter-diameter cylinder centered on the subject. Targets were also kept a minimum of 1.5 meters from each other, to avoid problems with visual overlap.
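The placement constraints above can be satisfied by straightforward rejection sampling. This is a sketch using the dimensions and distances reported in the study; the function and constant names are ours, not from the original software:

```python
import math
import random

ROOM = (6.0, 6.0, 3.4)   # room width, depth, height in meters
WALL_MARGIN = 0.6        # minimum distance from the room's surfaces
SUBJECT_RADIUS = 0.5     # half of the 1.0 m exclusion cylinder diameter
MIN_SEPARATION = 1.5     # minimum target-to-target distance

def generate_targets(n=20, seed=None):
    """Rejection-sample n target positions meeting all constraints."""
    rng = random.Random(seed)
    cx, cy = ROOM[0] / 2, ROOM[1] / 2   # subject stands at room center
    targets = []
    while len(targets) < n:
        x = rng.uniform(WALL_MARGIN, ROOM[0] - WALL_MARGIN)
        y = rng.uniform(WALL_MARGIN, ROOM[1] - WALL_MARGIN)
        z = rng.uniform(WALL_MARGIN, ROOM[2] - WALL_MARGIN)
        # Reject points inside the vertical cylinder around the subject.
        if math.hypot(x - cx, y - cy) < SUBJECT_RADIUS:
            continue
        # Reject points too close to any already-placed target.
        if any(math.dist((x, y, z), t) < MIN_SEPARATION for t in targets):
            continue
        targets.append((x, y, z))
    return targets
```

Generating the yoked data sets ahead of time then amounts to calling this once per trial (with recorded seeds) and replaying the same sets for both groups.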
Subjects were allowed to practice the task until they were comfortable, and then the ten measured trials began. Trials one through three were treated as practice and are not included in our analysis. After completing their trials, subjects rested briefly and then switched modes and performed another ten trials: head-tracked subjects tried hand tracking, and hand-tracked subjects tried head tracking. This led us to discover a training effect, which we discuss in the results section. At the end of the session, which typically took thirty to sixty minutes per subject, we asked each subject a series of questions about the experiment.
Our second major result is a training effect: subjects performed the task 23% faster using hand tracking if they had first used head tracking. Experience with hand tracking, however, did not improve subsequent performance with head tracking.
Although we eliminated the first three of ten trials for each subject from our averages, Figure 3 shows that the practice effect ceased after approximately the first trial, and that the head-tracked subjects effectively required no learning, bolstering the argument that head tracking is a more natural way to control the camera in this simple task.
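The analysis just described, averaging only the measured trials while inspecting per-trial means for a practice effect, can be sketched as follows (a sketch with a hypothetical data layout of one list of ten trial times per subject):

```python
def per_trial_means(times_by_subject):
    """Mean completion time for each of the ten trials across
    subjects: the curve used to see where practice effects stop."""
    n = len(times_by_subject)
    return [sum(subj[t] for subj in times_by_subject) / n
            for t in range(10)]

def subject_average(trial_times, drop=3):
    """A subject's average completion time, excluding the first
    `drop` trials that are treated as practice."""
    kept = trial_times[drop:]
    return sum(kept) / len(kept)
```

A flat per-trial curve after trial one for the head-tracked group is what supports the "effectively no learning" observation above.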
Some previous work has addressed differences between male and female subjects in the performance of spatially-oriented tasks [McGee 79]. As a group, our female subjects did slightly better in all categories than our male subjects, but the differences were not significant. Also, our population was drawn from students in engineering majors, so it was presumably skewed with respect to the general population.
We asked each subject a series of questions after he or she had completed the study. When asked "Which method do you prefer?", 25 of the 28 subjects preferred head tracking. No subject reported the head-mounted display as a major problem, although we suspect that the novelty of the experience explains why so few subjects complained about the head-mounted display, which is certainly cumbersome. None of the subjects reported motion sickness or discomfort during the study, although subjects overwhelmingly cited the lag between when they moved and when the display updated as the major problem with both modes.