Researchers in the field of spoken language translation are plagued by a device from popular science fiction. Numerous television series and movies, most notably those in the “Star Trek” franchise, have assumed the existence of a Universal Translator, a device that immediately understands any language (human or alien), translates it into the other person's language (always correctly), and speaks it fluently, with appropriate prosody. While this is a very useful plot device, avoiding tedious stretches of translation and the need to invent convincing alien languages, it sets up wildly unrealistic expectations on the part of the public. Measured against it, anything that is actually achievable can only be a disappointment.
Of course, these stories also feature many other plot devices that stretch or violate our current understanding of science but make the storyline work better: faster-than-light travel, teleportation, intelligent aliens that happen to breathe oxygen, and so forth. Yet the public is not disappointed when, for example, it takes years for an actual spacecraft to reach another planet. Perhaps because the Universal Translator involves no obvious violation of physics, most people find it much harder to see that a true Universal Translator (a device that can translate every known language on this planet, or on others for that matter) is unlikely ever to exist, just as humans are unlikely ever to travel faster than light.
A variety of approaches to spoken language translation will be presented in this chapter, but the fundamental problem with the Universal Translator transcends any specific technology. This fundamental problem is the need to match the words of one language to the words of another language (ignoring for the moment all the other knowledge required, regarding syntax, phonology, etc.). In each language, the match between its words and their meanings is arbitrary. There is nothing intrinsic to the letters or sounds of the word “soap” that indicates that it is something you wash with, or of the word “soup” that indicates that it is a liquid you eat. (Worse still, “soap” can also refer to a particular kind of television show, and “soup” can also refer to a particular kind of fog.) For each language, one simply must know what each word can mean. Thus the match between words of different languages is also completely arbitrary. It is worth noting that statistical systems that achieve human-level quality will need to learn these word-meaning correspondences as well, whether explicitly or implicitly.
If one takes the term “universal” seriously, this arbitrary match is an insurmountable problem. There are roughly 6,000 living languages today, so a “universal” translator would need to contain detailed knowledge about all the words in all these languages (many of which lack any significant quantity of written text). Lowering the bar substantially, to languages with at least one million native speakers, still leaves over 300 languages to deal with. Constructing a Universal Translator to handle just these 300 languages would require developing speech recognition, translation, and synthesis for each of the 300. This would still be a massive undertaking.
Although the Universal Translator is not on the horizon, significant progress has been made in recent years toward the more modest goal of acceptable-quality translation between single pairs of languages, given substantial development effort on each language pair. Enough progress has been made to make it clear that useful spoken language translation systems will be developed in the near future, although as of this writing no such systems are, to the authors' knowledge, in mass production or use. This is in contrast to speech recognition and machine translation, each of which is already widely used as a standalone technology.