The USI (Universal Speech Interface) Manifesto
Speech recognition technology has made spoken interaction with machines feasible. Shrinking form factors point to speech as a crucial component of mobile computing and communication devices. However, no universal interface has yet been proposed for humans to communicate effectively, efficiently and effortlessly with machines via speech.
On the one hand, natural language interfaces (NLI) have been demonstrated in narrow domains. However, such systems require lengthy, data- and labor-intensive development, with heavy involvement by experts who meticulously craft the vocabulary, grammar and semantics for the specific domain. This hampers the widespread adoption of NLI. Furthermore, unconstrained NLI severely challenges recognition accuracy, overburdens computational resources and/or available bandwidth, and fails to communicate the limitations of the application to the user.
On the other hand, telephone-based interactive voice response (IVR) systems use carefully crafted hierarchical menus, which are navigated using DTMF tones or short spoken phrases. These systems are commercially viable for some applications, but are typically loathed by users due to their inefficiency, rigidity, incompleteness and high cognitive demands. These shortcomings prevent them from being deployed more widely.
These two interaction styles are extremes along a continuum. The optimal style for human-machine speech communication arguably lies somewhere in between: more structured than natural language, yet more flexible than simple hierarchical menus. The USI project designs, implements and evaluates such interface styles. In essence, we attempt to do for speech what Palm’s Graffiti™ has done for mobile text entry.
Yet another alternative is the specialized command-and-control language. While such languages are viable for expert users who can invest hours in learning their chosen application, they do not scale to dozens of applications used by millions of occasional users. Our system, on the other hand, is universal: it is application-independent. After spending five minutes learning the interface, a typical user should be able to communicate with applications as diverse as information servers, schedulers, contact managers, message services, cars and home appliances. In essence, we try to do for speech what the Macintosh universal “look and feel” has done for the GUI world.
In the USI project, we analyze human communication with a variety of devices and applications, and we design, implement and test universal interfaces. Each interface consists of a metaphor (analogous to the "desktop" metaphor of graphical user interfaces), a set of universal interaction primitives (help, navigation, confirmation, correction, etc.), and a graphical component for applications that have space for a display. We conduct user studies to evaluate user acceptance and the transfer of user skills across applications, and we create tools for rapid prototyping of compliant applications.
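To make the idea of application-independent primitives concrete, here is a minimal sketch in Python. It is an illustration only, not the USI implementation: the `Application` class, the `handle` method and the `missing_primitives` helper are hypothetical names, and the primitive identifiers simply reuse the four categories named above, since no concrete spoken phrases are specified here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Assumption: one identifier per primitive category named in the text.
# The actual spoken phrases a user would say are not specified there.
UNIVERSAL_PRIMITIVES = ("help", "navigation", "confirmation", "correction")


@dataclass
class Application:
    """A hypothetical application that plugs handlers into shared primitives."""
    name: str
    handlers: Dict[str, Callable[[], str]] = field(default_factory=dict)

    def handle(self, primitive: str) -> str:
        # Every application accepts the same primitive set, so a user who
        # has learned the set once can reuse it with any application.
        if primitive not in UNIVERSAL_PRIMITIVES:
            raise ValueError(f"unknown primitive: {primitive}")
        handler = self.handlers.get(primitive)
        if handler is None:
            return f"{self.name}: '{primitive}' is not available here"
        return handler()


def missing_primitives(app: Application) -> List[str]:
    """List the universal primitives an application has not implemented yet."""
    return [p for p in UNIVERSAL_PRIMITIVES if p not in app.handlers]
```

A prototyping tool of the kind mentioned above could use a check like `missing_primitives` to flag applications that do not yet cover the full primitive set, although the criteria for a "compliant" application are not defined in this text.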