Footnotes

...entropy

A more common notation for the conditional entropy is $H(Y \mid X)$, where $Y$ and $X$ are random variables with joint distribution $\tilde{p}(x)\,p(y \mid x)$. To emphasize the fact that $H$ is a functional, depending on the probability distribution $p$, we have adopted the alternate notation $H(p)$.
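
To make the functional view concrete, the sketch below evaluates the conditional entropy of two different models under the same empirical marginal. It assumes the usual definition, H(p) = -sum_x p~(x) sum_y p(y|x) log p(y|x), measured in nats. The function name and the toy distributions are hypothetical and chosen only for illustration.

```python
import math


def conditional_entropy(p_tilde_x, p_y_given_x):
    """Return H(p) = -sum_x p~(x) sum_y p(y|x) log p(y|x), in nats.

    p_tilde_x:    dict mapping x to its empirical probability p~(x)
    p_y_given_x:  dict mapping x to a dict {y: p(y|x)}
    """
    total = 0.0
    for x, px in p_tilde_x.items():
        for pyx in p_y_given_x[x].values():
            if pyx > 0.0:  # the term 0 * log 0 is taken to be 0
                total -= px * pyx * math.log(pyx)
    return total


# Hypothetical toy data: two contexts, two outcomes.
p_tilde_x = {"a": 0.5, "b": 0.5}
uniform_model = {"a": {"0": 0.5, "1": 0.5}, "b": {"0": 0.5, "1": 0.5}}
deterministic_model = {"a": {"0": 1.0, "1": 0.0}, "b": {"0": 0.0, "1": 1.0}}

h_uniform = conditional_entropy(p_tilde_x, uniform_model)
h_deterministic = conditional_entropy(p_tilde_x, deterministic_model)
```

With the empirical marginal held fixed, the value changes only through the model p: the uniform model attains log 2 and the deterministic model attains 0.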

...Lagrangian

Ignoring the set of weak inequalities $p(y \mid x) \ge 0$ when forming the Lagrangian doesn't change the problem, since for the solution that emerges, these constraints will not be binding anyway.
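
One way to see why the inequalities are not binding: assuming the solution takes the familiar exponential form, with p(y|x) proportional to exp(sum_i lambda_i f_i(x, y)), every probability is strictly positive for any finite weights. The sketch below checks this numerically. The feature functions, weights, and labels are hypothetical and serve only as an illustration.

```python
import math


def exponential_model(lambdas, features, x, labels):
    """Return {y: p(y|x)} with p(y|x) proportional to exp(sum_i lambda_i * f_i(x, y))."""
    scores = {
        y: math.exp(sum(lam * f(x, y) for lam, f in zip(lambdas, features)))
        for y in labels
    }
    z = sum(scores.values())  # normalizing constant for this context x
    return {y: s / z for y, s in scores.items()}


# Hypothetical binary features and weights, for illustration only.
features = [
    lambda x, y: 1.0 if y == "en" else 0.0,
    lambda x, y: 1.0 if (x == "april" and y == "en") else 0.0,
]
lambdas = [-3.0, 5.0]

p = exponential_model(lambdas, features, "april", ["dans", "en", "pendant"])
```

Because each unnormalized score is an exponential, no entry of `p` can reach zero, so a constraint of the form p(y|x) >= 0 is satisfied with strict inequality.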

...by

We will henceforth abbreviate $L_{\tilde{p}}(p)$ by $L(p)$ when the empirical distribution $\tilde{p}$ is clear from context.
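
As an illustration of why the empirical distribution is part of the full notation, the sketch below computes a log-likelihood that takes both the empirical distribution and the model as arguments. It assumes the standard definition, the sum over (x, y) of p~(x, y) log p(y|x); the function name and the toy data are hypothetical.

```python
import math


def log_likelihood(p_tilde_xy, p_y_given_x):
    """Return sum_{x,y} p~(x, y) * log p(y|x), in nats.

    p_tilde_xy:   dict mapping (x, y) pairs to empirical probabilities
    p_y_given_x:  dict mapping x to a dict {y: p(y|x)}
    """
    return sum(
        weight * math.log(p_y_given_x[x][y])
        for (x, y), weight in p_tilde_xy.items()
        if weight > 0.0  # pairs never observed contribute nothing
    )


# Hypothetical empirical distribution over two observed pairs.
p_tilde_xy = {("a", "0"): 0.5, ("b", "1"): 0.5}
uniform_model = {"a": {"0": 0.5, "1": 0.5}, "b": {"0": 0.5, "1": 0.5}}
matching_model = {"a": {"0": 1.0, "1": 0.0}, "b": {"0": 0.0, "1": 1.0}}

ll_uniform = log_likelihood(p_tilde_xy, uniform_model)
ll_matching = log_likelihood(p_tilde_xy, matching_model)
```

Once `p_tilde_xy` is fixed, the result varies only with the model, which is what dropping the subscript expresses.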

...numbers,

Technically, we require the strong law of large numbers, which asserts that for all 228#228, the event inside the braces in (21) holds almost everywhere. We also require the assumption that the sample distribution is stationary, but discussing either of these details at length would take us too far afield.

Adam Berger
Fri Jul 5 11:43:50 EDT 1996