
Abstract

In a previous user study, we found that adding voice input to mouse input decreased task completion time for MacDraw™ users by 21%. In that study, we did not allow subjects to use keyboard accelerators. As a follow-up, we now present a user study where the control group used mouse input plus keyboard accelerators and the experimental group used mouse input plus voice input. The task was the same as in the previous experiment. We found that adding voice input reduced task completion time by a larger amount (21%) than adding keyboard accelerators (10%-15%, depending on whether the subject had previously memorized the keyboard mapping). From this we conclude that voice input is a useful addition to the mouse when performing a line-art drawing task.

Introduction

If voice input is to be widely used, user-interface designers need to know under what conditions voice input increases productivity for users of common applications. Most previous work on evaluating voice input has taken the form of "versus studies," where voice input is raced against other input modes or devices. While this is appropriate for applications where the user's hands and/or eyes are busy, these studies provide little information about the effectiveness of combining voice input with other input modalities. Our research goal is to measure how effective voice input can be when used in conjunction with other input devices and modalities.

In a previous user study [Pausch, 1990] we demonstrated that voice used in parallel with mouse input decreased task completion time for users of the popular Macintosh application MacDraw (Claris, version 1.9.6) by 21%. Our conjecture is that we reduced task completion time by reducing the amount of mouse motion required to access menu items.

Another way to reduce menu access time is to provide keyboard accelerators, sometimes called "hot keys" or "menu accelerator keys." In our previous study, we prohibited the use of keyboard accelerators. This paper presents a follow-up study where we test the hypothesis that voice input is faster than using keyboard accelerators. Keyboard accelerators reduced task completion time by 15% if the accelerators were memorized and by 10% if the accelerators were not memorized. Since adding voice input reduced task time by 21%, we conclude that voice input is a more effective input modality than keyboard accelerators, even with a relatively small set of commands.

Related Work

Voice Input as a Replacement for Keyboard Accelerators in a Mouse-Based Graphical Editor: An Empirical Study

James H. Leatherby and Randy Pausch
jhl2f@Virginia.EDU, pausch@Virginia.EDU
Computer Science Department
University of Virginia
Thornton Hall
Charlottesville, VA 22903-2442
(804) 982-2211
leatherby@virginia.edu
pausch@virginia.edu
A great deal of work has been done in evaluating speech as a means of computer input. While several interesting studies have been done to show that users will naturally use voice input in parallel with other input modes [Benbasat 1981, Biermann 1985, Bolt 1980, Hauptmann 1989], these studies provide no quantitative assessment of how much efficiency is added by the voice channel. The majority of quantitative results regarding voice input compare the efficiency, expressed as speed and error rate, of voice input against another input mode for the same task. Instead, we are interested in the question of what happens when voice is used as an additional channel.

Leggett and Williams [1984] performed one such study. Twenty-four subjects, twelve male and twelve female, used either voice or keyboard input to edit program segments. All subjects were novice voice users and experienced keyboard users. The study was broken into two types of tasks: input, where the subjects would read written program text and enter it into the system, and edit, where the subjects would make changes to program text already in the system. The voice system made use of forty vocabulary words, each of which was trained five to six times. Given the same time limit, the keyboard users were able to complete 70% of the task while voice input users were only able to complete 50%-55% of the task. Voice also had a lower error rate. The authors concluded that the main reason for the difference between the input modes was the inexperience of the users in using the voice equipment.

Martin [1989] performed a study using a VLSI chip design package. Since the VLSI design system made use of single-word input, key presses, and mouse clicks, it was chosen for this experiment so that speech input could be compared to a variety of other input modalities. This system would also test the overhead associated with voice input since the system is highly interactive, with between twenty and thirty commands typically executed per minute. Voice was expected to win because the subjects' hands would be free to perform other tasks. Seven subjects were used, but only four were able to contribute data. All had recently completed a graduate-level course in VLSI design. The voice system used was a discrete word recognizer with a head-mounted microphone. The voice system was configured so that it could be turned off and on with a voice command. The system had a vocabulary of one hundred twenty words. Two types of tasks were tested: structured and design. The structured tasks consisted of two tasks given to the subject, who had a fixed amount of time to complete each task. For the first task they were encouraged to use voice input; voice input was not available for the second task. For the design tasks, the subjects picked two tasks from a group of available tasks or came up with their own task. Once again, the subject was given a fixed amount of time to spend on each task. Speech input was found to be roughly equivalent to mouse clicks, but significantly better than keyboard input. When voice was used, the completion time was 24% faster than when single key presses were used and 108% faster than single-word typed commands. Overall, voice users were able to complete 62% of the tasks, while only 38% of the tasks were completed without voice. The error rate for subjects using voice input ranged between 8% and 12%.

Poock [1982] devised an experiment where voice or keyboard were used to supply commands to a computer system to perform simple tasks. Voice input was found to be 17% faster than typed input. Further, keyboard input produced significantly more errors than voice input.

Cochran, Riley, and Stewart [1980] had users enter connections between items in an electrical circuit. In this experiment, voice input took longer to perform the tasks than mouse input, but produced fewer errors.

Nye [1982] devised a baggage routing experiment. Users either entered a three-digit destination code via the keyboard or spoke the name of the destination city. Voice input was found to be faster and produced far fewer errors than the manual input. Errors for speech input were about 1% while errors for typed input ranged from 10% to 40%.

Haller, Mutschler, and Voss [1984] had subjects correct simple typing mistakes and move the cursor to various positions on the screen using either voice input or keyboard input. For cursor movement, voice was found to be worse than the other methods tested, namely light pen, graphic tablet, mouse, and cursor keys. Voice users had to speak the (x, y) co-ordinate of the new cursor position to move it. For error correction, voice was only tested against keyboard input. The keyboard was found to be slightly faster and less error-prone than speech input.

Visick, Johnson, and Long [1984] performed an experiment where users sorted a deck of cards that contained names of a destination city and entered them in sorted order into the computer system. The keyboard used had one key for each destination city and it was marked with the city's name. Since voice input users did not need to use their hands for input, they could sort and provide input at the same time. Voice input decreased the amount of time required for this operation by 37%.

These experiments can only conclude whether voice input was faster than an alternative input mode, such as the mouse, for their particular application domain. The varying results (sometimes voice is faster and other times it is not) further demonstrate this fact. This limitation arises because the applications used were developed for the purposes of the experiment. In order to generalize the result, widely-used applications need to be tested. The preceding studies also limit the user to one input mode or the other. User interface designers know that in order to be practical, voice input will rarely be used as the only mode of input. To really test the practicality of voice input, real systems need to test voice input in addition to other input modes.

A small number of studies have measured the combination of voice and other input modalities. Pausch and Leatherby [1990] used voice input to augment a graphical editor. Sixteen subjects were randomly divided into two groups. The first group used mouse input only and the second group used voice input in conjunction with mouse input. Each subject used their input mode to reproduce eight simple "line art" drawings from hard copy. The voice system used was speaker-dependent discrete word recognition. The system consisted of nineteen vocabulary words, each of which was trained five times by the subject. Voice users had an overall speedup of 21% as compared to mouse-only users. The voice recognition had an error rate of 5%.

Karl, Pettey, and Shneiderman [1992] used voice input to augment a word processor. Sixteen subjects, ten male and six female, were randomly divided into two groups. The first group used mouse input first and then repeated the trials with voice plus mouse. The second group used voice plus mouse first and then repeated the trials using mouse input only. The voice system used was speaker-dependent discrete word recognition. The system consisted of eighteen vocabulary words, each of which was trained at least three times by each of the subjects prior to beginning the study. Four tasks were selected for use with the word processor. In the first task, the subject used either the voice system or the mouse to re-format a document with six pre-defined styles (no typing was required in this task). In the second task, the subjects typed a short scientific formula which contained subscripts, superscripts, bold text, and Greek symbols. In the third task, the subjects built a table using the copy, paste, up, and down functions. In the final task, the subjects typed a short paragraph that consisted of subscripts, superscripts, italic, and bold text. Voice users were found to have a speedup of 19% over mouse-only users. The voice recognition error rate was found to be 5%.

Motivation

In a previous user study, we demonstrated that voice input used in parallel with mouse input decreased task completion time for the popular Macintosh application MacDraw. In the experiment we did not allow the users to use system defined keyboard accelerators, also known as "hot keys" or "menu accelerator keys." A common complaint from some of the users who had previous experience with the drawing package was that they were being forced to perform the task "unnaturally." Without the use of keyboard accelerators, the experienced users felt that they were being unfairly handicapped. We present the following study as an effort to investigate this claim and see if we really hindered our original subjects.

Description of the Study

Keyboard accelerators are presumably most productive for users who have already memorized the mapping of key strokes to application commands. We measured both cases: a "novice" group of users who had not memorized the keyboard commands, and an "advanced" group who had. The novices used a printed sheet listing the available keyboard accelerators. To invoke a command, they searched the printed sheet for the proper keyboard accelerator and then typed it on the keyboard. The advanced group memorized the seventeen keyboard accelerators before starting the experiment, which we confirmed by quizzing them. The "advanced" subjects were informed that if they forgot an accelerator, they could ask the experimenter for the key binding, although this did not happen during the study. We used an experimental group of sixteen subjects, whom we randomly divided into the novice and advanced groups. All subjects were graduate or undergraduate students at the University of Virginia; all were familiar with mouse usage and none were expert MacDraw users. No subject who took part in our previous experiment participated in this study.

The subjects participated in two drawing sessions. In each session the subject first created a practice drawing and was then timed while creating four drawings. We used the same set of eight drawings from our previous study, so that we could compare the results. The drawings were chosen randomly from recent issues of Communications of the Association for Computing Machinery, Science, and the Journal of the American Institute of Chemical Engineers. We randomly selected drawings, instead of devising drawings specifically for the study, in order to avoid biasing the task. For each drawing, the subject started with a blank MacDraw screen and a printed copy of the artwork. The subject was allowed to study the artwork as long as desired before beginning the timed task.

The keyboard accelerators were constructed using Macro-Maker (Apple Computer Inc., version 1.0.2) for the Macintosh operating system (Apple Computer Inc., version 6.0.3). Following the standard Macintosh user-interface convention, all keyboard accelerators were invoked by holding down the "clover" key as a shift key, and then pressing a single keyboard key. The keyboard accelerators used in the study are shown in Table 1:

Some of the command names were modified from the earlier study in order to make them more mnemonic. In most cases, the letter used to activate the command is either the first letter of the command name or some letter that distinguishes the command from the others. The commands Cut, Paste, Select All, and Undo violate this convention, but were chosen to match the standard accelerators used by most Macintosh applications. Other commands that had accelerators provided by MacDraw were changed, if possible, to make them more mnemonic.

Results

Table 2 shows combined results from the earlier study and the current study. Figure 1 also gives a graphical display of the data. For six of the eight drawings, "voice plus mouse" input was faster than "accelerator keys plus mouse" input. We define "speedup with input method X" as

    speedup with X = (time with mouse only - time with X) / (time with mouse only), expressed as a percentage

The average speedup per picture was 15% when the "advanced" group was compared to mouse input, and 13% when the "novice" group was compared to mouse input. This calculation ignores the fact that the individual pictures varied widely in complexity; by counting each picture's speedup equally in the average, we bias the result towards the simpler pictures. For example, a picture whose drawing time decreased from 20 seconds to 10 seconds would have a 50% reduction, and a picture whose drawing time decreased from 1000 seconds to 900 seconds would have a 10% reduction. Computing a 30% average reduction for these two drawings is technically correct, but a better measure of time reduction is obtained by dividing the sum of the raw times with the input method by the sum of the raw times with mouse input only. In this example, dividing 910 by 1020 yields 89.2%, or a 10.8% overall reduction in task time. When we perform this calculation, we find an overall time reduction of 15% when the "advanced" group is compared to mouse input only and 10% when the "novice" group is compared to mouse input.
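The difference between the two averaging methods can be illustrated with a short sketch. The function names below are hypothetical and are not taken from the study; the only data used is the two-picture example from the preceding paragraph (20 to 10 seconds and 1000 to 900 seconds), not the measured drawing times.

```python
def speedup(baseline_seconds, method_seconds):
    """Fractional reduction in time relative to the mouse-only baseline."""
    return (baseline_seconds - method_seconds) / baseline_seconds


def mean_per_picture_speedup(baseline_times, method_times):
    """Average the per-picture speedups, weighting every picture equally."""
    per_picture = [speedup(b, m) for b, m in zip(baseline_times, method_times)]
    return sum(per_picture) / len(per_picture)


def aggregate_speedup(baseline_times, method_times):
    """Speedup computed from summed raw times, so long pictures count more."""
    return speedup(sum(baseline_times), sum(method_times))


# Illustrative two-picture example from the text (times in seconds).
baseline = [20.0, 1000.0]
method = [10.0, 900.0]

print(round(100 * mean_per_picture_speedup(baseline, method), 1))  # 30.0
print(round(100 * aggregate_speedup(baseline, method), 1))         # 10.8
```

The per-picture mean lets the short drawing dominate, while the aggregate figure weights each drawing by the time it actually took, which is why the two numbers differ so much for this pair.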

There were two drawings for which voice input did not yield a speedup when compared to one of the two groups. In both cases, there was a relatively large amount of text in the drawings, so the typing speed of the individuals became the dominant issue.

Discussion

We believe that keyboard accelerators take longer than voice input because they require a cognitive context switch. With keyboard accelerators, the user must perform a mapping from the command name to a key binding, whether or not he or she has memorized that binding. With voice, speaking the name of the desired command does not cause the user to perform a context switch.

For the keyboard accelerators in this study, the user needed to hold the "clover" key down while pressing another key. For most keys, this was accomplished with one hand while the user kept his or her other hand on the mouse. For some accelerators, the user needed to use both hands, which required homing to the keyboard and then back to the mouse. While shifting can be avoided by using dedicated function keys, shifted keys are the standard mechanism for the Macintosh, so we used them. We also used a very small number of accelerator keys in this study; we expect that as the number of accelerator keys grows and the key strokes become less obvious, the advantage that voice input provides will increase.

A final observation is that although we had expected memorization of the keyboard accelerators to be a large issue, it was not. The novice and advanced groups performed similarly. Although the command set contains seventeen distinct commands, only a small number of these were used frequently during the study. For the most part the novices learned these keys during the course of a single drawing.

Conclusions

Our previous study showed that when MacDraw was augmented with voice input, task completion time was reduced by 21%. In that study, the control group used keyboard and mouse, but was prohibited from using accelerator keys. This study questioned whether the speedup achieved with voice (presumably by reducing mouse travel time to menus) could also have been achieved with accelerator keys. We found that the speedup obtained via "voice plus mouse" (21%) was greater than that of "accelerator keys plus mouse," which was 15% for advanced users and 10% for novices. On the basis of this evidence, we conclude voice input provides a significant reduction in task completion time for a graphical editor when compared to the traditional alternatives provided by the Macintosh.

References

Benbasat, I., Dexter, A. S. and Masulis, P. S., An Experimental Study of the Human/Computer Interface, Communications of the Association for Computing Machinery 24, 11 (November 1981), pages 752 - 762.
Biermann, A., Rodman, R., Rubin, D., and Heidlage, J., Natural Language with Discrete Speech as a Mode for Human-to-Machine Communication, Communications of the Association for Computing Machinery 28, 6 (June 1985), pages 628 - 636.
Bolt, R., Put-That-There: Voice & Gesture at the Graphics Interface, Computer Graphics, 14, 3 (1980), pages 262 - 270.
Cochran, D. J., Riley, M. W., and Stewart, L. A., An Evaluation of the Strengths, Weaknesses, and Uses of Voice Input Devices. Proceedings of the Human Factors Society -- 24th Annual Meeting, Los Angeles, 1980.
Haller, R., Mutschler, H., and Voss, M., Comparison of Input Devices for Correction of Typing Errors in Office Systems. Proceedings of INTERACT '84, First IFIP Conference on Human-Computer Interaction, London, 1984.
Hauptmann, A. Speech and Gestures for Graphic Image Manipulation, Human Factors in Computer Systems (SIGCHI), 1989, pages 241 - 245.
Karl, L., Pettey, M., and Shneiderman, B., Speech-Activated versus Mouse-Activated Commands for Word Processing Applications: An Empirical Evaluation. Currently submitted for publication. Available as a technical report from the University of Maryland Computer Science Department.
Leggett, J. and Williams G., An Empirical Investigation of Voice as an Input Modality for Computer Programming, International Journal of Man-Machine Studies 21 (1984), pages 493 - 520.
Martin, G. F. The Utility of Speech Input in User-Computer Interfaces, International Journal of Man-Machine Studies, Volume 30, 1989, pages 355 - 375.
Nye, J. M., Human Factors Analysis of Speech Recognition Systems, Speech Technology, Volume 1, 1982, pages 50 - 57.
Pausch, R. and Leatherby, J. H., A Study Comparing Mouse-Only Input vs. Mouse-Plus-Voice Input for a Graphical Editor, Proceedings of the AVIOS '90 Voice I/O Systems Applications Conference, September 1990, pages 227 - 231.
Poock, G. K., Voice Recognition Boosts Command Terminal Throughput, Speech Technology, 1,2 (April, 1982), pages 36 - 39.
Visick, D., Johnson, P., and Long, J., The Use of Simple Speech Recognisers in Industrial Applications. Proceedings of INTERACT '84, First IFIP Conference on Human-Computer Interaction, London, 1984.

Author Biographies

Randy Pausch is an Assistant Professor of Computer Science at the University of Virginia. He received his Ph.D. in Computer Science from Carnegie-Mellon University in 1988 and his Sc.B. in Computer Science from Brown University in 1982. His research interests include human-computer interaction, computer graphics, and software architectures.

Jim Leatherby is a Master's student in Computer Science at the University of Virginia, and is also a member of the technical staff at GE/Fanuc Automation, Inc. in Charlottesville, VA. He is a member of the Association for Computing Machinery, and his research interests include voice input, software engineering, and the proper construction of user studies involving human-computer interfaces.