Chapter 16 What’s Hard About Sampling?
16.1 Bayesian vs Frequentist Inference
Frequentist Modeling: In non-Bayesian probabilistic modeling we typically perform maximum likelihood inference – that is, we solve an optimization problem.
Relatively speaking, optimization problems are much easier to solve:
- (Differentiation) we have autograd
- (Solving for Stationary Points) we have gradient descent
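To make the frequentist picture concrete, here is maximum likelihood estimation of a Gaussian's parameters by gradient ascent on the log-likelihood – a minimal NumPy sketch with hand-derived gradients standing in for autograd (the data, step size, and iteration count are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.5, size=1000)  # synthetic dataset

mu, sigma = 0.0, 1.0  # initial parameter guesses
lr = 0.01
for _ in range(2000):
    # gradients of the average log-likelihood of N(mu, sigma^2)
    d_mu = np.mean(data - mu) / sigma**2
    d_sigma = np.mean((data - mu) ** 2) / sigma**3 - 1.0 / sigma
    mu += lr * d_mu        # gradient *ascent* on the log-likelihood
    sigma += lr * d_sigma
```

At convergence `mu` and `sigma` match the closed-form MLE (the sample mean and the uncorrected sample standard deviation), which is what makes this an "easy" optimization problem.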
Bayesian Modeling: Bayesian modeling is, in comparison, orders of magnitude more mathematically and computationally challenging:
- (Inference) sampling (from priors, posteriors, prior predictives and posterior predictives) is hard, especially for non-conjugate likelihood–prior pairs!
- (Evaluation) computing integrals (for the log-likelihood of train and test data under the posterior) is intractable when likelihoods and priors are not conjugate!
16.2 What is Sampling and Why do We Care?
- We first need to generate truly random i.i.d. sequences of numbers.
- (Uniform Distribution) linear congruential generators
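A linear congruential generator produces a deterministic but uniform-looking stream from the recurrence x_{n+1} = (a·x_n + c) mod m. A minimal sketch (the constants below are the classic Numerical Recipes choices):

```python
def lcg(seed, a=1664525, c=1013904223, m=2**32):
    """Yield pseudo-random floats in [0, 1) via x_{n+1} = (a*x_n + c) mod m."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x / m

gen = lcg(seed=42)
samples = [next(gen) for _ in range(5)]  # five "uniform" draws
```

The output is fully determined by the seed – these are pseudo-random numbers, which is exactly why the quality of the constants a, c, m matters.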
- We use simple distributions (that we know how to sample from) to help us sample from complex target distributions.
- (Inverse CDF) Requires us to compute integrals (i.e. know the CDF)
- (Rejection Sampling) Uses simple distributions to “propose samples” and uses the complex target distribution to accept or reject the samples
- (MCMC Samplers) Uses “local rejection sampling”.
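Two of the ideas above in miniature: inverse-CDF sampling of an Exponential(λ) (whose CDF inverts in closed form) and rejection sampling of a Beta(2, 2) target using uniform proposals. The envelope constant `M` is chosen here to dominate this particular target density:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_exponential(lam, n):
    """Inverse CDF: if U ~ Uniform(0,1), then -ln(1-U)/lam ~ Exponential(lam)."""
    u = rng.uniform(size=n)
    return -np.log(1.0 - u) / lam

def sample_beta22(n):
    """Rejection sampling: propose x ~ Uniform(0,1), accept w.p. p(x)/(M*q(x))."""
    target = lambda x: 6.0 * x * (1.0 - x)  # Beta(2,2) density
    M = 1.5  # envelope constant: the target's maximum (at x = 0.5)
    out = []
    while len(out) < n:
        x = rng.uniform()
        if rng.uniform() < target(x) / M:
            out.append(x)
    return np.array(out)
```

Note the trade-off the bullet points describe: inverse-CDF sampling needs the CDF in closed form, while rejection sampling only needs pointwise density evaluations but throws away a fraction of its proposals.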
- We need to evaluate samplers.
- (Correctness) There is no way to verify that a sampling algorithm is correct except through proof
- (Efficiency) Correctness is useless if it takes too many iterations to produce one valid sample
- Everything requires Sampling!
- (Integration Involves Sampling) In ML, intractable integrals are approximated via sampling; this is called Monte Carlo integration.
- (Sampling Can Help Optimization) In ML, we can sometimes find better optima for non-convex optimization problems by incorporating sampling into our gradient descent. This is called simulated annealing.
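Monte Carlo integration in one line: an intractable expectation E_p[f(x)] becomes an average of f over samples drawn from p. Here we estimate E[x²] under a standard normal, whose true value is 1:

```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.standard_normal(100_000)
estimate = np.mean(samples**2)  # Monte Carlo estimate of E[x^2] = Var(x) = 1
```

The error shrinks like 1/√n regardless of dimension, which is why sampling is the workhorse for the intractable integrals of Bayesian evaluation.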
- Sometimes sampling requires optimization!
- (Variational Inference) In ML, we often choose to approximate a complex distribution with a simple one (e.g. a Gaussian) rather than sample directly from the target. Finding the best simple approximation of a complex target distribution is called variational inference, and it’s an optimization problem!
- (Hamiltonian Monte Carlo Sampling) Sometimes incorporating gradient information during sampling can increase the efficiency of our samplers – i.e. reduce the time we need to wait for a valid sample. These methods are called Hamiltonian Monte Carlo samplers.
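A toy version of the variational-inference optimization: fit q = N(μ, σ) to an unnormalized Gaussian target by gradient ascent on a reparameterized Monte Carlo estimate of the ELBO. The target, learning rate, and sample counts below are illustrative choices, not a general recipe:

```python
import numpy as np

rng = np.random.default_rng(0)

def log_p(x):
    """Unnormalized log-density of the target, here N(3, 2^2)."""
    return -((x - 3.0) ** 2) / (2.0 * 2.0**2)

mu, log_sigma = 0.0, 0.0  # variational parameters of q = N(mu, exp(log_sigma)^2)
lr, n_mc = 0.05, 200
for _ in range(2000):
    sigma = np.exp(log_sigma)
    eps = rng.standard_normal(n_mc)
    x = mu + sigma * eps               # reparameterization trick
    dlogp = -(x - 3.0) / 4.0           # d/dx log p(x)
    # ELBO = E_q[log p(x)] + entropy(q); the entropy term contributes
    # +1 to the gradient w.r.t. log_sigma (entropy = log_sigma + const)
    mu += lr * np.mean(dlogp)
    log_sigma += lr * (np.mean(dlogp * sigma * eps) + 1.0)
```

Because the target is itself Gaussian, the optimum is exact: μ → 3, σ → 2. For genuinely complex targets the same machinery yields the best Gaussian approximation rather than the target itself.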