To study the process, we observe its behavior for some time, collecting a large number of samples $(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)$. In the example we have been considering, each sample would consist of a phrase $x$ containing the words surrounding "in", together with the translation $y$ of "in" that the process produced. For now we can imagine that these training samples have been generated by a human expert who was presented with a number of random phrases containing "in" and asked to choose a good translation for each.
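We can summarize the training sample in terms of its empirical probability distribution $\tilde{p}$, defined by
$$\tilde{p}(x, y) \equiv \frac{1}{N} \times \text{number of times that } (x, y) \text{ occurs in the sample},$$
where $N$ is the total number of samples.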
Typically, a particular pair $(x, y)$ will either not occur at all in the sample, or will occur at most a few times.
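As a minimal sketch of how such an empirical distribution could be tabulated, the Python snippet below counts each observed pair and divides by the sample size. The function name `empirical_distribution` and the toy phrases and translations are hypothetical, chosen only for illustration; they are not data from the text.

```python
from collections import Counter
from fractions import Fraction


def empirical_distribution(samples):
    """Map each observed (x, y) pair to (number of occurrences) / N."""
    n = len(samples)
    if n == 0:
        raise ValueError("need at least one sample")
    counts = Counter(samples)
    # Fractions keep the relative frequencies exact.
    return {pair: Fraction(c, n) for pair, c in counts.items()}


# Hypothetical toy sample: x is a context phrase, y the translation chosen for "in".
samples = [
    ("in April", "en"),
    ("in the house", "dans"),
    ("in April", "en"),
    ("in the course of", "au cours de"),
]

p_tilde = empirical_distribution(samples)

# Pairs that never occur are simply absent; treat their probability as 0.
unseen = p_tilde.get(("in April", "dans"), Fraction(0))
```

Storing only the observed pairs reflects the sparsity noted above: most conceivable pairs never appear in the sample, so a dictionary keyed by the pairs that do appear is usually far smaller than a full table.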