We consider a random process which produces an output value y, a member of a finite set $\mathcal{Y}$. For the translation example just considered, the process generates a translation of the word in, and the output y can be any word in the set {dans, en, à, au cours de, pendant}. In generating y, the process may be influenced by some contextual information x, a member of a finite set $\mathcal{X}$. In the present example, this information could include the words in the English sentence surrounding in.
Our task is to construct a stochastic model that accurately represents the behavior of the random process. Such a model is a method of estimating the conditional probability that, given a context x, the process will output y.
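To make the idea of a model concrete, the following is a minimal Python sketch of a conditional model stored as a table from contexts to distributions over outputs. The output set is the one given above; the two contexts ("April" and "the", standing for the word that follows in), the probability values, and the names MODEL, p, and is_valid_model are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List

# Output set from the text: candidate French translations of "in".
OUTPUTS: List[str] = ["dans", "en", "à", "au cours de", "pendant"]

# Hypothetical model: each context x maps to a distribution over outputs y.
# Contexts and probabilities are invented for illustration, not estimated
# from any data.
MODEL: Dict[str, Dict[str, float]] = {
    "April": {"dans": 0.1, "en": 0.6, "à": 0.1, "au cours de": 0.1, "pendant": 0.1},
    "the": {"dans": 0.5, "en": 0.1, "à": 0.2, "au cours de": 0.1, "pendant": 0.1},
}


def p(y: str, x: str) -> float:
    """Probability the model assigns to output y given context x."""
    return MODEL[x][y]


def is_valid_model(
    model: Dict[str, Dict[str, float]], outputs: List[str], tol: float = 1e-9
) -> bool:
    """Check that every context maps to a proper distribution over outputs."""
    for dist in model.values():
        if set(dist) != set(outputs):
            return False  # every output must receive a probability
        if any(value < 0.0 for value in dist.values()):
            return False  # probabilities must be non-negative
        if abs(sum(dist.values()) - 1.0) > tol:
            return False  # probabilities must sum to one for each context
    return True
```

Here p("en", "April") looks up the probability the model assigns to the output en in the context April, and is_valid_model(MODEL, OUTPUTS) checks that each context's row is a proper probability distribution.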
A word here on notation: a rigorous protocol requires that we differentiate a random variable from a particular value it may assume. One approach is to write a capital letter for the first and lowercase for the second: X is the random variable (in the case of a six-sided die, $X \in \{1, 2, \ldots, 6\}$), and x is a particular value assumed by X. Furthermore, we should distinguish a probability distribution, say $p(X)$ ($p(X = x) = 1/6$ for every x is appropriate for a fair die), from a particular value assigned by the distribution to a certain event, say $p(X = x)$. Having conceded what we should do, we shall henceforth (when appropriate) dispense with the capitalized letters and let the context disambiguate the meaning of $p(x)$: an entire model or the value assigned by the model to the event X=x. Furthermore, we will denote by $\mathcal{P}$ the set of all conditional probability distributions. Thus a model $p(y|x)$ is, by definition, just an element of $\mathcal{P}$.
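The set of all conditional probability distributions can be written out explicitly. The following is one standard formalization, not a formula taken from the text; it assumes the symbols $\mathcal{X}$ and $\mathcal{Y}$ for the finite sets of contexts and outputs, and $\mathcal{P}$ for the set of conditional distributions.

```latex
\mathcal{P} \;=\; \Bigl\{\, p \;:\; p(y \mid x) \ge 0
  \ \text{ for all } x \in \mathcal{X},\ y \in \mathcal{Y},
  \quad \sum_{y \in \mathcal{Y}} p(y \mid x) = 1
  \ \text{ for all } x \in \mathcal{X} \,\Bigr\}
```

Under this reading, choosing a model amounts to choosing, for each context x, one point on the probability simplex over the outputs.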