To the Bayesian inference framework just defined we now add the use of neural network models.
The best single-valued prediction, the one that minimizes the squared error, is the mean of the predictive distribution, which can be written as the mean of the network outputs over the conditional (posterior) distribution of the parameters.
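In a notation that is assumed here rather than taken from the text (network function $f$, input $x$, parameters $\theta$, training data $D$), this predictive mean is the integral
$$\hat{y}(x) \;=\; \int f(x,\theta)\, P(\theta \mid D)\, d\theta .$$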
In Monte Carlo methods, integrals of this kind take the form of expectations of functions of the parameters.
Using a sample of values $\theta^{(1)},\dots,\theta^{(N)}$ drawn from $Q$ (the posterior), we can approximate this integral by
$$\int f(x,\theta)\, Q(\theta)\, d\theta \;\approx\; \frac{1}{N}\sum_{t=1}^{N} f\bigl(x,\theta^{(t)}\bigr),$$
where the $\theta^{(t)}$ are generated by a process that results in each of them having the distribution defined by $Q$.
We can write the posterior for the parameters $\theta$ and hyperparameters $\gamma$ after "receiving" the data $D$ (eq. 1) as
$$P(\theta, \gamma \mid D) \;\propto\; P(D \mid \theta, \gamma)\, P(\theta \mid \gamma)\, P(\gamma).$$
In MCMC we do not try to express this posterior directly; instead, we draw samples from it.
The iterative method starts with a state vector $\theta^{(0)}$ and generates a new random state vector $\theta^{(1)}$ from a probability distribution $q(\theta^{(1)} \mid \theta^{(0)})$; we then obtain $\theta^{(2)}$ by sampling from $q(\theta^{(2)} \mid \theta^{(1)})$, and so forth. The transition probability $q$ is constructed in such a way that an ergodic Markov process is defined whose stationary distribution is equal to the desired posterior distribution.
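A sufficient condition commonly used to guarantee this (stated here as background, not quoted from the text) is detailed balance between any two states,
$$Q(\theta)\, q(\theta' \mid \theta) \;=\; Q(\theta')\, q(\theta \mid \theta'),$$
which the Metropolis and Hybrid Monte Carlo updates described below satisfy.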
Several methods can be used to obtain the sample from the posterior distribution within the MCMC framework: the Metropolis method, Gibbs sampling, stochastic dynamics, and Hybrid Monte Carlo.
The Hybrid Monte Carlo method, developed by Duane [2] and used in this paper, is a merger of the stochastic dynamics and Metropolis algorithms.
Basically, the Metropolis algorithm generates the sequence mentioned above by obtaining $\theta^{(t+1)}$ from $\theta^{(t)}$ as follows: a "candidate state" $\theta^{*}$ is generated from a proposal distribution and is then accepted or rejected based on its probability density, relative to that of the old state, with respect to the desired invariant distribution $Q$. If $\theta^{*}$ is accepted, it becomes the next state in the chain; if it is not accepted, the new state is the same as the old one.
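For a symmetric proposal distribution (an assumption made here for concreteness), this acceptance rule is the familiar Metropolis ratio
$$P_{\text{accept}} \;=\; \min\!\left(1,\; \frac{Q(\theta^{*})}{Q(\theta^{(t)})}\right).$$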
Introducing a fictitious momentum variable $p$ and a fictitious time $\tau$, we can use the Hamiltonian stochastic dynamics formalism and sample the position variable $q$ (previously called $\theta$) at constant energy $H$, sampling for different values of $H$ in a second stage.
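In the usual formulation (assuming unit "masses" for the momentum components), the Hamiltonian is the sum of a potential energy $E(q)$, equal to minus the log of the posterior density up to a constant, and a kinetic energy:
$$H(q,p) \;=\; E(q) + \tfrac{1}{2}\sum_i p_i^{2}.$$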
Since Hamiltonian dynamics cannot be simulated exactly, the leapfrog method is used to discretize it.
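A single leapfrog iteration with step size $\epsilon$ (the symbol is ours) updates momentum and position as
$$p\!\left(\tau+\tfrac{\epsilon}{2}\right) = p(\tau) - \tfrac{\epsilon}{2}\,\frac{\partial E}{\partial q}\bigl(q(\tau)\bigr),\qquad
q(\tau+\epsilon) = q(\tau) + \epsilon\, p\!\left(\tau+\tfrac{\epsilon}{2}\right),\qquad
p(\tau+\epsilon) = p\!\left(\tau+\tfrac{\epsilon}{2}\right) - \tfrac{\epsilon}{2}\,\frac{\partial E}{\partial q}\bigl(q(\tau+\epsilon)\bigr).$$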
Putting these two together, the Hybrid Monte Carlo algorithm does the following:
1. Randomly choose whether the transition goes forward or backward in the fictitious time (i.e. the sign of the step size $\epsilon$).
2. The current state $(q,p)$ evolves through $L$ leapfrog iterations of step size $\epsilon$ to a new state $(q^{*},p^{*})$.
3. Regard $(q^{*},p^{*})$ as the candidate for the next state, accepting it with probability $\min\bigl(1,\exp\bigl(H(q,p)-H(q^{*},p^{*})\bigr)\bigr)$,
where H can be interpreted as an error function being minimized.
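As an illustration only, not the authors' implementation, the following Python sketch carries out one such Hybrid Monte Carlo transition for a generic potential energy `E` and its gradient `grad_E`; the function names, the unit-mass kinetic energy, and the fresh Gaussian momentum drawn at the start of each transition are assumptions, not details taken from the paper.

```python
import numpy as np

def hmc_transition(q, E, grad_E, eps0=0.01, L=20, rng=np.random):
    """One Hybrid Monte Carlo transition for a potential E(q) = -log Q(q) + const."""
    # Draw a fictitious momentum with unit "masses": kinetic energy K(p) = sum(p**2) / 2.
    p = rng.standard_normal(q.shape)
    # Step 1: randomly choose forward or backward evolution in the fictitious time.
    eps = eps0 if rng.random() < 0.5 else -eps0

    H_old = E(q) + 0.5 * np.sum(p ** 2)
    q_new, p_new = q.copy(), p.copy()

    # Step 2: evolve the state through L leapfrog iterations of step size eps.
    p_new -= 0.5 * eps * grad_E(q_new)           # initial half step for the momentum
    for i in range(L):
        q_new += eps * p_new                     # full step for the position
        if i < L - 1:
            p_new -= eps * grad_E(q_new)         # full step for the momentum
    p_new -= 0.5 * eps * grad_E(q_new)           # final half step for the momentum

    # Step 3: accept or reject the candidate based on the change in total energy H.
    H_new = E(q_new) + 0.5 * np.sum(p_new ** 2)
    if rng.random() < min(1.0, np.exp(H_old - H_new)):
        return q_new                             # accepted: the candidate becomes the next state
    return q                                     # rejected: the next state equals the old one
```

Repeating this transition produces the chain $\theta^{(1)}, \theta^{(2)}, \dots$ whose sample averages approximate the integrals above.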
Summarizing, Bayesian learning takes a prior distribution for the parameters, combines it with the information from a training set, and then integrates over the posterior to obtain the desired forecast.
Important features are that the results do not overfit the data, that the prediction accuracy can be improved, and that prediction intervals can be estimated.
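For instance, and again only as an assumed sketch rather than the paper's procedure, given a set of parameter samples from the posterior, the point forecast and an approximate prediction interval can be obtained from the corresponding network outputs:

```python
import numpy as np

def predict_with_interval(x, net, samples, level=0.95):
    """Predictive mean and central interval estimated from posterior parameter samples."""
    outputs = np.array([net(x, theta) for theta in samples])   # one network output per sample
    alpha = 100 * (1 - level) / 2
    lo, hi = np.percentile(outputs, [alpha, 100 - alpha])
    return outputs.mean(), (lo, hi)
```

This interval only propagates the uncertainty in the parameters; a full predictive interval would also include the variance of the noise model.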