To the Bayesian inference framework just defined we now add the use of neural network models.
The best single-valued prediction, the one that minimizes the squared error, is the mean of the predictive distribution, which can be written as the mean of the network outputs over the conditional (posterior) distribution of the parameters.
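In a notation that is assumed here rather than taken from the text (network function $f$, input $x$, parameters $\theta$, training data $D$), this predictive mean is the integral
$$\hat{y}(x) \;=\; \int f(x,\theta)\, P(\theta \mid D)\, d\theta .$$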
In Monte Carlo methods, integrals of this kind take the form of expectations of functions of the parameters.
Using a sample of values $\theta^{(1)},\dots,\theta^{(N)}$ drawn from $Q$ (the posterior), we can approximate this integral by
$$\int f(x,\theta)\, Q(\theta)\, d\theta \;\approx\; \frac{1}{N}\sum_{t=1}^{N} f\bigl(x,\theta^{(t)}\bigr),$$
where the $\theta^{(t)}$ are generated by a process that results in each of them having the distribution defined by $Q$.
We can write the posterior for the parameters $\theta$ and hyperparameters $\gamma$ after "receiving" the data $D$ (eq. 1) as
$$P(\theta, \gamma \mid D) \;\propto\; P(D \mid \theta, \gamma)\, P(\theta \mid \gamma)\, P(\gamma).$$
In MCMC we do not try to express this posterior directly; instead, we draw samples from it.
The iterative method starts with a state vector $\theta^{(0)}$ and generates a new random state vector $\theta^{(1)}$ from a probability distribution $q(\theta^{(1)} \mid \theta^{(0)})$; we then obtain $\theta^{(2)}$ by sampling from $q(\theta^{(2)} \mid \theta^{(1)})$, and so forth. The transition probability $q$ is constructed in such a way that an ergodic Markov process is defined whose stationary distribution is equal to the desired posterior distribution.
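A sufficient condition commonly used to guarantee this (stated here as background, not quoted from the text) is detailed balance between any two states,
$$Q(\theta)\, q(\theta' \mid \theta) \;=\; Q(\theta')\, q(\theta \mid \theta'),$$
which the Metropolis and Hybrid Monte Carlo updates described below satisfy.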
Several methods can be used to obtain the sample from the posterior distribution within the MCMC framework: the Metropolis method, Gibbs sampling, stochastic dynamics, and Hybrid Monte Carlo.
The Hybrid Monte Carlo method, developed by Duane [2] and used in this paper, is a merger of the stochastic dynamics and Metropolis algorithms.
Basically, the Metropolis algorithm generates the sequence mentioned above by obtaining $\theta^{(t+1)}$ from $\theta^{(t)}$ as follows: a "candidate state" $\theta^{*}$ is generated from a proposal distribution and is then accepted or rejected based on its probability density, relative to that of the old state, with respect to the desired invariant distribution $Q$. If $\theta^{*}$ is accepted, it becomes the next state in the chain; if it is not accepted, the new state is the same as the old one.
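For a symmetric proposal distribution (an assumption made here for concreteness), this acceptance rule is the familiar Metropolis ratio
$$P_{\text{accept}} \;=\; \min\!\left(1,\; \frac{Q(\theta^{*})}{Q(\theta^{(t)})}\right).$$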
Introducing a fictitious momentum variable $p$ and a fictitious time $\tau$, we can use the Hamiltonian stochastic dynamics formalism and sample the position variable $q$ (previously called $\theta$) at constant energy $H$, sampling for different values of $H$ in a second stage.
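In the usual formulation (assuming unit "masses" for the momentum components), the Hamiltonian is the sum of a potential energy $E(q)$, equal to minus the log of the posterior density up to a constant, and a kinetic energy:
$$H(q,p) \;=\; E(q) + \tfrac{1}{2}\sum_i p_i^{2}.$$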
Since Hamiltonian dynamics cannot be simulated exactly, the leapfrog method is used to discretize it.
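A single leapfrog iteration with step size $\epsilon$ (the symbol is ours) updates momentum and position as
$$p\!\left(\tau+\tfrac{\epsilon}{2}\right) = p(\tau) - \tfrac{\epsilon}{2}\,\frac{\partial E}{\partial q}\bigl(q(\tau)\bigr),\qquad
q(\tau+\epsilon) = q(\tau) + \epsilon\, p\!\left(\tau+\tfrac{\epsilon}{2}\right),\qquad
p(\tau+\epsilon) = p\!\left(\tau+\tfrac{\epsilon}{2}\right) - \tfrac{\epsilon}{2}\,\frac{\partial E}{\partial q}\bigl(q(\tau+\epsilon)\bigr).$$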
Putting these two together, the Hybrid Monte Carlo algorithm does the following:
1. Randomly choose whether the transition goes forward or backward in the fictitious time (i.e. the sign of the step size $\epsilon$).
2. The current state $(q,p)$ evolves through $L$ leapfrog iterations of step size $\epsilon$ to a new state $(q^{*},p^{*})$.
3. Regard $(q^{*},p^{*})$ as the candidate for the next state, accepting it with probability $\min\bigl(1,\exp\bigl(H(q,p)-H(q^{*},p^{*})\bigr)\bigr)$,
where H can be interpreted as an error function being minimized.
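As an illustration only, not the authors' implementation, the following Python sketch carries out one such Hybrid Monte Carlo transition for a generic potential energy `E` and its gradient `grad_E`; the function names, the unit-mass kinetic energy, and the fresh Gaussian momentum drawn at the start of each transition are assumptions, not details taken from the paper.

```python
import numpy as np

def hmc_transition(q, E, grad_E, eps0=0.01, L=20, rng=np.random):
    """One Hybrid Monte Carlo transition for a potential E(q) = -log Q(q) + const."""
    # Draw a fictitious momentum with unit "masses": kinetic energy K(p) = sum(p**2) / 2.
    p = rng.standard_normal(q.shape)
    # Step 1: randomly choose forward or backward evolution in the fictitious time.
    eps = eps0 if rng.random() < 0.5 else -eps0

    H_old = E(q) + 0.5 * np.sum(p ** 2)
    q_new, p_new = q.copy(), p.copy()

    # Step 2: evolve the state through L leapfrog iterations of step size eps.
    p_new -= 0.5 * eps * grad_E(q_new)           # initial half step for the momentum
    for i in range(L):
        q_new += eps * p_new                     # full step for the position
        if i < L - 1:
            p_new -= eps * grad_E(q_new)         # full step for the momentum
    p_new -= 0.5 * eps * grad_E(q_new)           # final half step for the momentum

    # Step 3: accept or reject the candidate based on the change in total energy H.
    H_new = E(q_new) + 0.5 * np.sum(p_new ** 2)
    if rng.random() < min(1.0, np.exp(H_old - H_new)):
        return q_new                             # accepted: the candidate becomes the next state
    return q                                     # rejected: the next state equals the old one
```

Repeating this transition produces the chain $\theta^{(1)}, \theta^{(2)}, \dots$ whose sample averages approximate the integrals above.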
Summarizing, Bayesian learning takes a prior distribution for the parameters, combines it with the information from a training set, and then integrates over the posterior to obtain the desired forecast.
Important features are that the results do not overfit the data, that the prediction accuracy can be improved, and that prediction intervals can be estimated.
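For instance, and again only as an assumed sketch rather than the paper's procedure, given a set of parameter samples from the posterior, the point forecast and an approximate prediction interval can be obtained from the corresponding network outputs:

```python
import numpy as np

def predict_with_interval(x, net, samples, level=0.95):
    """Predictive mean and central interval estimated from posterior parameter samples."""
    outputs = np.array([net(x, theta) for theta in samples])   # one network output per sample
    alpha = 100 * (1 - level) / 2
    lo, hi = np.percentile(outputs, [alpha, 100 - alpha])
    return outputs.mean(), (lo, hi)
```

This interval only propagates the uncertainty in the parameters; a full predictive interval would also include the variance of the noise model.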