next up previous
Next: Data and Methods Up: Introduction Previous: Bayesian Learning and forecasting

 

Markov Chain Monte Carlo

To the Bayesian inference framework just defined we add the use of neural network models, described by their parameters $\theta$.

The best single-valued prediction that minimizes the squared error is the mean of the predictive distribution, which we can write as the mean of the network outputs $f(x, \theta)$ under the posterior over the parameters:

$$\hat{y} = \int f(x, \theta)\, p(\theta \mid D)\, d\theta$$

In Monte Carlo, such integrals take the form of expectations

$$E[a] = \int a(\theta)\, Q(\theta)\, d\theta$$

Using a sample of values drawn from Q (the posterior) we can approximate this integral by

$$E[a] \approx \frac{1}{N} \sum_{t=1}^{N} a(\theta^{(t)})$$

where the $\theta^{(t)}$ are generated by a process that results in each of them having the distribution defined by Q.
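As a concrete illustration (not from the paper), the Monte Carlo estimator above can be sketched in a few lines of Python; the check against a standard normal is our own toy example:

```python
import numpy as np

def mc_expectation(a, samples):
    """Approximate E_Q[a(theta)] by the average of a over samples from Q."""
    return np.mean([a(theta) for theta in samples], axis=0)

# Toy check: with samples from a standard normal, E[theta^2] should be near 1.
rng = np.random.default_rng(0)
samples = rng.standard_normal(100_000)
est = mc_expectation(lambda t: t ** 2, samples)
```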

We can write the posterior for the parameters $\theta$ and hyperparameters $\gamma$ after "receiving" the data D (eq. 1) as

$$p(\theta, \gamma \mid D) \propto p(D \mid \theta)\, p(\theta \mid \gamma)\, p(\gamma)$$
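As an illustration of what such a posterior looks like in code, here is a hedged sketch of an unnormalized log posterior for a hypothetical one-hidden-layer network; the unit-variance Gaussian noise model, the tanh hidden layer, and the single weight-precision hyperparameter `gamma` are assumptions for the sketch, not details taken from the paper:

```python
import numpy as np

def log_posterior(theta, gamma, X, y):
    """Unnormalized log p(theta | gamma, D) = log p(D|theta) + log p(theta|gamma)
    for a hypothetical one-hidden-layer tanh network, Gaussian noise of unit
    variance, and a Gaussian prior of precision gamma on all parameters."""
    W1, b1, w2, b2 = theta                 # hidden weights/biases, output weights/bias
    h = np.tanh(X @ W1 + b1)               # hidden layer activations
    f = h @ w2 + b2                        # network outputs
    log_lik = -0.5 * np.sum((y - f) ** 2)  # Gaussian likelihood (up to a constant)
    log_prior = -0.5 * gamma * sum(np.sum(p ** 2) for p in theta)
    return log_lik + log_prior

# Tiny synthetic example.
rng = np.random.default_rng(0)
X, y = rng.standard_normal((5, 2)), rng.standard_normal(5)
theta = (rng.standard_normal((2, 3)), np.zeros(3), rng.standard_normal(3), 0.0)
```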

In MCMC we do not try to express the posterior directly. The iterative method starts from a state vector $\theta^{(0)}$ and generates a new random state $\theta^{(1)}$ from a probability distribution $q(\theta^{(1)} \mid \theta^{(0)})$; $\theta^{(2)}$ is then obtained by sampling from $q(\theta^{(2)} \mid \theta^{(1)})$, and so forth. The transition probability q is constructed in such a way that an ergodic Markov process is defined, with stationary distribution equal to the desired posterior distribution.

Several methods can be used to obtain the sample from the posterior distribution in the MCMC framework: the Metropolis method, Gibbs sampling, stochastic dynamics, and hybrid Monte Carlo.

The hybrid Monte Carlo method, developed by Duane et al. [2] and used in this paper, combines the stochastic dynamics and Metropolis algorithms.

Basically, the Metropolis algorithm generates the sequence $\theta^{(1)}, \theta^{(2)}, \dots$ mentioned above by first generating a "candidate state" $\theta^*$ from a proposal distribution, and then accepting or rejecting it based on its probability density relative to that of the old state with respect to the desired invariant distribution Q. If $\theta^*$ is accepted, it becomes the next state in the chain; if it is rejected, the next state is the same as the old one.
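A minimal random-walk Metropolis sketch of this accept/reject step, assuming a Gaussian proposal (the paper does not specify one):

```python
import numpy as np

def metropolis(log_q, x0, n_steps, step=0.5, rng=None):
    """Random-walk Metropolis: propose x* = x + noise, accept with
    probability min(1, Q(x*)/Q(x)); on rejection, repeat the old state."""
    rng = rng or np.random.default_rng(0)
    x, chain = x0, []
    for _ in range(n_steps):
        x_new = x + step * rng.standard_normal()
        # Accept based on the density ratio under the invariant distribution Q.
        if np.log(rng.uniform()) < log_q(x_new) - log_q(x):
            x = x_new
        chain.append(x)
    return np.array(chain)

# Demo target: a standard normal, log Q(x) = -x^2/2 up to a constant.
chain = metropolis(lambda x: -0.5 * x ** 2, x0=0.0, n_steps=50_000)
```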

Introducing a fictitious momentum variable p and time $\tau$, we can use the Hamiltonian stochastic dynamics formalism and sample the position q (the parameters previously called $\theta$) at a constant energy H, sampling over different values of H in a second stage.

Since Hamiltonian dynamics cannot be simulated exactly, the leapfrog method is used to discretize them.
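A sketch of the leapfrog discretization for a Hamiltonian $H(q, p) = U(q) + p^2/2$ (the standard kinetic-energy choice, assumed here): a half-step in p, alternating full steps in q and p, and a final half-step in p.

```python
import numpy as np

def leapfrog(q, p, grad_U, eps, L):
    """L leapfrog steps of size eps for H(q, p) = U(q) + p^2/2.
    Reversible and volume-preserving, so energy is nearly conserved."""
    p = p - 0.5 * eps * grad_U(q)      # initial half-step in momentum
    for i in range(L):
        q = q + eps * p                # full step in position
        if i < L - 1:
            p = p - eps * grad_U(q)    # full step in momentum
    p = p - 0.5 * eps * grad_U(q)      # final half-step in momentum
    return q, p

# Demo: harmonic oscillator U(q) = q^2/2, so grad_U(q) = q.
q1, p1 = leapfrog(1.0, 0.0, lambda x: x, eps=0.1, L=10)
```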

Putting these two together, the hybrid Monte Carlo algorithm proceeds as follows:

1. Randomly choose whether the transition goes forward or backward in the fictitious time (i.e. the sign of the leapfrog step size $\epsilon$).

2. The current state $(q, p)$ evolves through L leapfrog iterations of step size $\epsilon$ to a proposed state $(q^*, p^*)$.

3. Regard $(q^*, p^*)$ as the candidate for the next state, accepting it with probability

$$\min\bigl(1,\ \exp\bigl(H(q, p) - H(q^*, p^*)\bigr)\bigr)$$

where H can be interpreted as an error function being minimized.
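Putting the three steps together, a minimal hybrid Monte Carlo transition might look like the following sketch; the quadratic potential U in the demo stands in for the network error function and is our own toy choice:

```python
import numpy as np

def hmc_step(q, U, grad_U, eps, L, rng):
    """One hybrid Monte Carlo transition: draw a fresh momentum, run L
    leapfrog steps with a random sign of eps (forward or backward in
    fictitious time), then accept with probability min(1, exp(H_old - H_new))."""
    p = rng.standard_normal(np.shape(q))
    eps = eps * rng.choice([-1.0, 1.0])        # step 1: random time direction
    q_new, p_new = q, p                        # step 2: leapfrog trajectory
    p_new = p_new - 0.5 * eps * grad_U(q_new)
    for i in range(L):
        q_new = q_new + eps * p_new
        if i < L - 1:
            p_new = p_new - eps * grad_U(q_new)
    p_new = p_new - 0.5 * eps * grad_U(q_new)
    h_old = U(q) + 0.5 * np.sum(p ** 2)        # step 3: Metropolis accept/reject
    h_new = U(q_new) + 0.5 * np.sum(p_new ** 2)
    return q_new if np.log(rng.uniform()) < h_old - h_new else q

# Demo target: a standard normal, i.e. "error function" U(q) = q^2/2.
rng = np.random.default_rng(1)
q, chain = 0.0, []
for _ in range(20_000):
    q = hmc_step(q, lambda x: 0.5 * x ** 2, lambda x: x, eps=0.2, L=10, rng=rng)
    chain.append(q)
chain = np.asarray(chain)
```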

Summarizing, Bayesian learning takes a prior distribution over the parameters $\theta$, combines it with information from a training set, and then integrates over the posterior to obtain the desired forecast. Its important features are that the results do not overfit the data, that prediction accuracy can be improved, and that prediction intervals can be estimated.



Rafael A. Calvo
Fri Apr 18 12:26:35 GMT+1000 1997