Conjugate Prior
A prior distribution is said to be conjugate to a likelihood when the resulting posterior belongs to the same family as the prior, so Bayesian updating reduces to updating hyperparameters. We consider four standard cases:
- Bernoulli distribution and Beta prior
- Categorical distribution and Dirichlet prior
- Poisson distribution and Gamma prior
- Univariate Gaussian distribution and Normal-Gamma Priors
Bernoulli distribution and Beta prior
The likelihood function for a data set $\mathcal{D} = \{x_1, \dots, x_N\}$ of i.i.d. Bernoulli observations is
$$p(\mathcal{D}\mid\mu) = \prod_{n=1}^{N} \mu^{x_n}(1-\mu)^{1-x_n} = \mu^{N_1}(1-\mu)^{N_0},$$
where $N_1 = \sum_n x_n$ is the number of ones and $N_0 = N - N_1$ the number of zeros.
The beta distribution is parameterized using two hyperparameters $a$ and $b$:
$$\mathrm{Beta}(\mu\mid a, b) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\,\mu^{a-1}(1-\mu)^{b-1},$$
where the normalization factor $\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}$ ensures that the density integrates to one.
Consider in particular the posterior distribution of $\mu$, obtained by multiplying the beta prior by the Bernoulli likelihood:
$$p(\mu\mid\mathcal{D}) \propto \mu^{a + N_1 - 1}(1-\mu)^{b + N_0 - 1}.$$
This is again a beta distribution, now with updated hyperparameters: $p(\mu\mid\mathcal{D}) = \mathrm{Beta}(\mu\mid a + N_1,\ b + N_0)$.
A similar calculation (an integral of the same form as the normalization constant) gives the mean and variance of the beta distribution:
$$\mathbb{E}[\mu] = \frac{a}{a+b}, \qquad \mathrm{var}[\mu] = \frac{ab}{(a+b)^2(a+b+1)}.$$
Applying the results to the posterior $\mathrm{Beta}(\mu\mid a + N_1,\ b + N_0)$ simply amounts to replacing $a$ with $a + N_1$ and $b$ with $b + N_0$.
The predictive probability of a new data point $x$ is obtained by averaging over the posterior:
$$p(x = 1\mid\mathcal{D}) = \int_0^1 \mu\, p(\mu\mid\mathcal{D})\, d\mu = \mathbb{E}[\mu\mid\mathcal{D}] = \frac{a + N_1}{a + b + N}.$$
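As a quick numerical check of these formulas, here is a minimal sketch of the Beta-Bernoulli update with SciPy; the prior values, the simulated data, and the variable names are illustrative choices rather than part of the derivation:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a, b = 2.0, 2.0                         # Beta(a, b) prior hyperparameters (illustrative)
x = rng.binomial(1, 0.7, size=50)       # simulated Bernoulli(0.7) observations
N1 = int(x.sum())                       # number of ones
N0 = len(x) - N1                        # number of zeros

posterior = stats.beta(a + N1, b + N0)  # Beta(a + N1, b + N0) posterior
pred_p1 = (a + N1) / (a + b + N1 + N0)  # predictive p(x = 1 | D)

print("posterior mean     :", posterior.mean())   # equals (a + N1) / (a + b + N)
print("predictive p(x=1|D):", pred_p1)
```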
Categorical distribution and Dirichlet prior
In this case the likelihood takes the form
$$p(\mathcal{D}\mid\boldsymbol{\mu}) = \prod_{k=1}^{K} \mu_k^{N_k},$$
where $N_k$ is the number of observations falling in category $k$ and $\sum_k \mu_k = 1$.
This yields the conjugate prior, the Dirichlet distribution:
$$\mathrm{Dir}(\boldsymbol{\mu}\mid\boldsymbol{\alpha}) = \frac{\Gamma(\alpha_0)}{\Gamma(\alpha_1)\cdots\Gamma(\alpha_K)} \prod_{k=1}^{K} \mu_k^{\alpha_k - 1},$$
where $\alpha_0 = \sum_{k} \alpha_k$.
The posterior distribution of $\boldsymbol{\mu}$ is
$$p(\boldsymbol{\mu}\mid\mathcal{D}) \propto \prod_{k=1}^{K} \mu_k^{\alpha_k + N_k - 1},$$
which is again a Dirichlet distribution, $\mathrm{Dir}(\boldsymbol{\mu}\mid\alpha_1 + N_1, \dots, \alpha_K + N_K)$.
The mean and variance are
$$\mathbb{E}[\mu_k] = \frac{\alpha_k}{\alpha_0}, \qquad \mathrm{var}[\mu_k] = \frac{\alpha_k(\alpha_0 - \alpha_k)}{\alpha_0^2(\alpha_0 + 1)},$$
where these formulas apply to the posterior with $\alpha_k$ replaced by $\alpha_k + N_k$ (and hence $\alpha_0$ by $\alpha_0 + N$).
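A small numerical sketch of the Dirichlet update for $K = 3$ categories; the category probabilities, prior values, and variable names are illustrative:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
alpha = np.array([1.0, 1.0, 1.0])                  # Dirichlet prior over K = 3 categories
x = rng.choice(3, size=100, p=[0.2, 0.5, 0.3])     # simulated categorical observations
N_k = np.bincount(x, minlength=3)                  # category counts N_k

alpha_post = alpha + N_k                           # Dirichlet(alpha_k + N_k) posterior
print("posterior mean:", alpha_post / alpha_post.sum())  # (alpha_k + N_k) / (alpha_0 + N)
print("scipy check   :", stats.dirichlet.mean(alpha_post))
```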
Poisson distribution and Gamma prior
Consider the Poisson distribution:
$$p(x\mid\lambda) = \frac{\lambda^x e^{-\lambda}}{x!}, \qquad x = 0, 1, 2, \dots$$
The corresponding conjugate prior retains the shape of the likelihood: the gamma distribution
$$\mathrm{Gam}(\lambda\mid a, b) = \frac{b^a}{\Gamma(a)}\,\lambda^{a-1} e^{-b\lambda},$$
where the normalization factor $\frac{b^a}{\Gamma(a)}$ ensures that the density integrates to one.
The posterior distribution of $\lambda$ given $\mathcal{D} = \{x_1, \dots, x_N\}$ is
$$p(\lambda\mid\mathcal{D}) \propto \lambda^{a + \sum_n x_n - 1}\, e^{-(b + N)\lambda},$$
which is again a gamma distribution, $\mathrm{Gam}\!\left(\lambda\mid a + \textstyle\sum_n x_n,\ b + N\right)$.
The mean and variance are readily computed as:
$$\mathbb{E}[\lambda] = \frac{a}{b}, \qquad \mathrm{var}[\lambda] = \frac{a}{b^2}.$$
In the distribution $\mathrm{Gam}(\lambda\mid a, b)$, the hyperparameter $b$ can therefore be interpreted as an effective number of prior observations and $a$ as the effective total count they contribute.
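A small numerical sketch of the Gamma-Poisson update; note that SciPy's `gamma` uses a shape/scale parameterization, so the rate $b$ enters as `scale=1/b`. Prior values and variable names are illustrative:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
a, b = 2.0, 1.0                          # Gamma(a, b) prior, b is a rate parameter
x = rng.poisson(4.0, size=30)            # simulated Poisson(4) observations

a_post = a + x.sum()                     # shape update: a + sum_n x_n
b_post = b + len(x)                      # rate update: b + N
posterior = stats.gamma(a_post, scale=1.0 / b_post)   # SciPy gamma uses shape/scale

print("posterior mean    :", posterior.mean())   # a_post / b_post
print("posterior variance:", posterior.var())    # a_post / b_post**2
```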
Univariate Gaussian distribution and Normal-Gamma Priors
Recall that the Gaussian distribution is a two-parameter exponential family of the following form:
$$\mathcal{N}(x\mid\mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right).$$
Conjugacy for the mean
Inspecting the above density form we see that the exponent is the negative of a quadratic form in $\mu$ (for fixed $\sigma^2$), which suggests taking a Gaussian prior for the mean:
$$p(\mu) = \mathcal{N}(\mu\mid\mu_0, \sigma_0^2),$$
where $\mu_0$ and $\sigma_0^2$ are the prior mean and variance.
Before deriving the posterior distribution for a full data set, it helps to start with the simplest case. Consider that we only have one observation $x$, with the variance $\sigma^2$ treated as known.
We can now easily calculate:
$$p(\mu\mid x) \propto \mathcal{N}(x\mid\mu, \sigma^2)\,\mathcal{N}(\mu\mid\mu_0, \sigma_0^2) \propto \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2} - \frac{(\mu-\mu_0)^2}{2\sigma_0^2}\right).$$
Treating every factor that does not involve $\mu$ as a constant and completing the square in the exponent, we find that the posterior is again Gaussian, $p(\mu\mid x) = \mathcal{N}(\mu\mid\mu_1, \sigma_1^2)$, with
$$\frac{1}{\sigma_1^2} = \frac{1}{\sigma_0^2} + \frac{1}{\sigma^2}, \qquad \mu_1 = \sigma_1^2\!\left(\frac{\mu_0}{\sigma_0^2} + \frac{x}{\sigma^2}\right).$$
We can also express the results in terms of the precision $\lambda = 1/\sigma^2$ and prior precision $\lambda_0 = 1/\sigma_0^2$. In particular, plugging these in gives
$$\lambda_1 = \lambda_0 + \lambda, \qquad \mu_1 = \frac{\lambda_0\,\mu_0 + \lambda\,x}{\lambda_0 + \lambda},$$
so precisions add and the posterior mean is a precision-weighted average.
The posterior expectation is a convex combination of the observation $x$ and the prior mean $\mu_0$, with weights that sum to one.
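Written out explicitly in terms of the variances (a restatement of the result above), the single-observation posterior mean is
$$\mu_1 = \frac{\sigma^2}{\sigma^2 + \sigma_0^2}\,\mu_0 + \frac{\sigma_0^2}{\sigma^2 + \sigma_0^2}\,x,$$
so a vague prior ($\sigma_0^2 \to \infty$) lets the observation dominate, while a sharp prior ($\sigma_0^2 \to 0$) keeps the posterior mean close to $\mu_0$.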
Let us now consider the posterior distribution in the case of multiple observed data points $\mathcal{D} = \{x_1, \dots, x_N\}$; we have:
$$p(\mu\mid\mathcal{D}) \propto \exp\!\left(-\frac{1}{2\sigma^2}\sum_{n=1}^{N}(x_n - \mu)^2 - \frac{(\mu-\mu_0)^2}{2\sigma_0^2}\right).$$
We rewrite the exponent using a standard trick:
$$\sum_{n=1}^{N}(x_n - \mu)^2 = \sum_{n=1}^{N}(x_n - \bar{x})^2 + N(\bar{x} - \mu)^2, \qquad \bar{x} = \frac{1}{N}\sum_{n=1}^{N} x_n.$$
The first term yields a constant factor, and we see that the problem reduces to an equivalent problem involving only a single random variable (treating the sample mean $\bar{x}$ as one observation of $\mu$ with variance $\sigma^2/N$),
and we have
$$p(\mu\mid\mathcal{D}) = \mathcal{N}(\mu\mid\mu_N, \sigma_N^2), \qquad \frac{1}{\sigma_N^2} = \frac{1}{\sigma_0^2} + \frac{N}{\sigma^2}, \qquad \mu_N = \sigma_N^2\!\left(\frac{\mu_0}{\sigma_0^2} + \frac{N\bar{x}}{\sigma^2}\right).$$
Now consider an unseen data point $x$. Its predictive distribution is obtained by integrating over the posterior of $\mu$:
$$p(x\mid\mathcal{D}) = \int \mathcal{N}(x\mid\mu, \sigma^2)\,\mathcal{N}(\mu\mid\mu_N, \sigma_N^2)\, d\mu = \mathcal{N}(x\mid\mu_N, \sigma^2 + \sigma_N^2).$$
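A brief numerical sketch of the known-variance case, computing $\mu_N$, $\sigma_N^2$, and the predictive variance with NumPy; all numerical settings are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
sigma2 = 1.0                              # known observation variance
mu0, sigma0_2 = 0.0, 10.0                 # N(mu0, sigma0^2) prior on the mean
x = rng.normal(2.5, np.sqrt(sigma2), size=20)
N, xbar = len(x), x.mean()

sigmaN_2 = 1.0 / (1.0 / sigma0_2 + N / sigma2)          # posterior variance sigma_N^2
muN = sigmaN_2 * (mu0 / sigma0_2 + N * xbar / sigma2)   # posterior mean mu_N

print("posterior  : N(%.3f, %.3f)" % (muN, sigmaN_2))
print("predictive : N(%.3f, %.3f)" % (muN, sigma2 + sigmaN_2))   # variance sigma^2 + sigma_N^2
```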
Conjugacy for the variance
Let us now consider the Gaussian distribution with known mean $\mu$ but unknown variance $\sigma^2$. The conjugate prior for the variance is the inverse gamma distribution,
$$p(\sigma^2) = \mathrm{IG}(\sigma^2\mid a_0, b_0) = \frac{b_0^{a_0}}{\Gamma(a_0)}\,(\sigma^2)^{-a_0 - 1}\, e^{-b_0/\sigma^2}.$$
The posterior distribution:
$$p(\sigma^2\mid\mathcal{D}) \propto (\sigma^2)^{-a_0 - N/2 - 1} \exp\!\left(-\frac{1}{\sigma^2}\Big(b_0 + \tfrac{1}{2}\textstyle\sum_{n}(x_n - \mu)^2\Big)\right),$$
which is an inverse gamma distribution, $\mathrm{IG}\!\left(\sigma^2\mid a_0 + \tfrac{N}{2},\ b_0 + \tfrac{1}{2}\sum_n (x_n - \mu)^2\right)$.
If we derive the posterior in terms of the precision $\lambda = 1/\sigma^2$ instead, we obtain a gamma distribution, $\mathrm{Gam}\!\left(\lambda\mid a_0 + \tfrac{N}{2},\ b_0 + \tfrac{1}{2}\sum_n (x_n - \mu)^2\right)$.
The predictive distribution for an unseen data point $x$ is obtained by integrating out $\sigma^2$ against the posterior,
$$p(x\mid\mathcal{D}) = \int \mathcal{N}(x\mid\mu, \sigma^2)\, p(\sigma^2\mid\mathcal{D})\, d\sigma^2,$$
which turns out to be a t-distribution: a Student's t with $2a_N$ degrees of freedom, location $\mu$, and squared scale $b_N/a_N$, where $a_N$ and $b_N$ are the posterior hyperparameters above.
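A short numerical sketch of the variance update and the resulting predictive; SciPy's `invgamma` with `scale=b` matches the $\mathrm{IG}(a, b)$ parameterization used here, and the prior values and variable names are illustrative:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
mu = 0.0                                    # known mean
a0, b0 = 2.0, 2.0                           # IG(a0, b0) prior on the variance
x = rng.normal(mu, 1.5, size=40)            # simulated data with true sigma = 1.5

aN = a0 + len(x) / 2.0                      # shape update
bN = b0 + 0.5 * np.sum((x - mu) ** 2)       # scale update
posterior = stats.invgamma(aN, scale=bN)    # IG(aN, bN) posterior over sigma^2

# Predictive: Student's t with 2*aN degrees of freedom, location mu, scale sqrt(bN / aN)
predictive = stats.t(df=2 * aN, loc=mu, scale=np.sqrt(bN / aN))

print("posterior mean of sigma^2:", posterior.mean())   # bN / (aN - 1)
print("predictive variance      :", predictive.var())
```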
Conjugacy for the mean and variance
We make the following specifications: with both the mean $\mu$ and the precision $\lambda$ unknown, we place a Normal-Gamma prior on the pair,
$$p(\mu, \lambda) = \mathcal{N}\!\left(\mu\mid\mu_0, (\kappa_0\lambda)^{-1}\right)\,\mathrm{Gam}(\lambda\mid a_0, b_0),$$
where the hyperparameters are $\mu_0$ (prior mean), $\kappa_0$ (prior strength, acting like a number of pseudo-observations), and $a_0, b_0$ (shape and rate of the gamma factor).
To compute the posterior distribution we multiply the prior by the likelihood and collect terms; the result is again a Normal-Gamma distribution,
$$p(\mu, \lambda\mid\mathcal{D}) = \mathcal{N}\!\left(\mu\mid\mu_N, (\kappa_N\lambda)^{-1}\right)\,\mathrm{Gam}(\lambda\mid a_N, b_N),$$
with
$$\kappa_N = \kappa_0 + N, \qquad \mu_N = \frac{\kappa_0\mu_0 + N\bar{x}}{\kappa_0 + N}, \qquad a_N = a_0 + \frac{N}{2},$$
and
$$b_N = b_0 + \frac{1}{2}\sum_{n=1}^{N}(x_n - \bar{x})^2 + \frac{\kappa_0 N(\bar{x} - \mu_0)^2}{2(\kappa_0 + N)}.$$
We proceed to work out the marginal posterior of the mean $\mu$. We want to integrate out the precision $\lambda$ from the joint posterior:
$$p(\mu\mid\mathcal{D}) = \int_0^{\infty} \mathcal{N}\!\left(\mu\mid\mu_N, (\kappa_N\lambda)^{-1}\right)\mathrm{Gam}(\lambda\mid a_N, b_N)\, d\lambda \propto \int_0^{\infty} \lambda^{a_N - \frac{1}{2}} \exp\!\left(-\lambda\Big(b_N + \tfrac{\kappa_N}{2}(\mu - \mu_N)^2\Big)\right) d\lambda.$$
The integrand is an unnormalized gamma density in $\lambda$. Thus integrating out $\lambda$ gives
$$p(\mu\mid\mathcal{D}) \propto \left(b_N + \tfrac{\kappa_N}{2}(\mu - \mu_N)^2\right)^{-(a_N + \frac{1}{2})},$$
which is a Student's t-distribution with $2a_N$ degrees of freedom, location $\mu_N$, and squared scale $b_N/(a_N\kappa_N)$. The probability density function is
$$p(\mu\mid\mathcal{D}) = \mathrm{St}\!\left(\mu\ \middle|\ \mu_N,\ \tfrac{b_N}{a_N\kappa_N},\ 2a_N\right) \propto \left(1 + \frac{\kappa_N(\mu - \mu_N)^2}{2 b_N}\right)^{-\frac{2a_N + 1}{2}}.$$
In the end, we have the marginal posterior of the precision as well,
$$p(\lambda\mid\mathcal{D}) = \int \mathcal{N}\!\left(\mu\mid\mu_N, (\kappa_N\lambda)^{-1}\right)\mathrm{Gam}(\lambda\mid a_N, b_N)\, d\mu = \mathrm{Gam}(\lambda\mid a_N, b_N),$$
which is a gamma distribution: the Gaussian factor integrates to one over $\mu$, leaving only the gamma factor of the joint posterior.
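To make the update concrete, here is a brief numerical sketch of the Normal-Gamma posterior computation; the data-generating values, prior settings, and variable names are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(5)
mu0, kappa0, a0, b0 = 0.0, 1.0, 1.0, 1.0     # Normal-Gamma prior hyperparameters
x = rng.normal(3.0, 2.0, size=50)            # simulated data, unknown mean and variance
N, xbar = len(x), x.mean()

# Posterior Normal-Gamma hyperparameters (update equations above)
kappaN = kappa0 + N
muN = (kappa0 * mu0 + N * xbar) / kappaN
aN = a0 + N / 2.0
bN = b0 + 0.5 * np.sum((x - xbar) ** 2) + kappa0 * N * (xbar - mu0) ** 2 / (2.0 * kappaN)

print("E[mu | D]      :", muN)               # location of the Student-t marginal of mu
print("E[lambda | D]  :", aN / bN)           # mean of the Gam(aN, bN) marginal
print("E[sigma^2 | D] :", bN / (aN - 1.0))   # valid for aN > 1
```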