Notation: Let \(\mathbf{x}=(x_1, \ldots, x_n)'\) be the observed data and \(\mathbf{Z}=(Z_1, \ldots, Z_m)'\) be the unobserved variables (e.g., unobserved cluster membership indicators or model parameters). If there are other hyperparameters \(\alpha\), we assume they are fixed for now.
Problem: Calculate the posterior distribution \[p(\mathbf{Z}\mid \mathbf{x},\alpha) = \frac{p(\mathbf{Z}, \mathbf{x}\mid \alpha)}{\int_{\mathbf{Z}}p(\mathbf{Z}, \mathbf{x}\mid \alpha)\mathrm{d}\mathbf{Z}},\] which is hard for complex likelihoods and priors, because the integral in the denominator typically has no closed form.
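To make the formula concrete, here is a minimal sketch that evaluates the posterior by brute-force numerical integration when \(\mathbf{Z}\) is a single scalar. The model (a standard normal prior on \(z\) and \(x_i \sim N(z, 1)\)), the data values, and the function name `grid_posterior` are all assumptions chosen for illustration. They are picked so that the result can be checked against the known conjugate answer.

```python
import numpy as np

def grid_posterior(x, grid):
    """Approximate p(z | x) on an evenly spaced 1-D grid by normalizing p(z, x)."""
    # Log joint up to an additive constant (assumed toy model):
    # prior z ~ N(0, 1), likelihood x_i ~ N(z, 1).
    log_prior = -0.5 * grid**2
    log_lik = -0.5 * ((x[:, None] - grid[None, :]) ** 2).sum(axis=0)
    log_joint = log_prior + log_lik
    # Subtract the maximum before exponentiating for numerical stability.
    unnorm = np.exp(log_joint - log_joint.max())
    dz = grid[1] - grid[0]
    # Riemann sum standing in for the denominator integral
    # (equal to the evidence up to the constants dropped above).
    normalizer = unnorm.sum() * dz
    return unnorm / normalizer

x = np.array([0.8, 1.3, 0.4, 1.1])        # made-up data
grid = np.linspace(-5.0, 5.0, 2001)
dz = grid[1] - grid[0]

post = grid_posterior(x, grid)
post_mean = (grid * post).sum() * dz

# For this conjugate model the exact posterior is N(sum(x) / (n + 1), 1 / (n + 1)).
exact_mean = x.sum() / (len(x) + 1)
print(post_mean, exact_mean)
```

A grid with \(K\) points per dimension needs \(K^m\) evaluations of the joint density for \(m\) unobserved variables, so this direct approach is only practical for very small \(m\).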
One Solution: Approximate the posterior by a simpler distribution, namely the member of a computationally feasible family of distributions that is closest to the actual posterior. (How do we pick such a family? Given the family, how do we obtain the "closest" member?)
- Pick a family of distributions over the unobserved (latent) variables \(\mathbf{Z}\), indexed by variational parameters \(\mathbf{\nu}\): \[q(Z_1, Z_2, \ldots, Z_m\mid \mathbf{\nu})\]
- Find the value(s) of \(\mathbf{\nu}\) for which \(q\) best approximates the posterior of interest.
- Remark: we are approximating a single distribution, the posterior given the data \(\mathbf{x}\) at hand, \(p(\mathbf{Z}\mid \mathbf{X}=\mathbf{x})\), not the whole collection of conditional distributions \(\{p(\mathbf{Z}\mid \mathbf{X}=\mathbf{x})\}_{\mathbf{x}\in \mathcal{X}}\).
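The two steps above can be sketched on a toy problem. Everything specific here is an assumption made for illustration: the model (\(x_i \sim \text{Poisson}(e^{z})\) with a standard normal prior on a scalar \(z\)), the Gaussian family \(q(z\mid \mathbf{\nu}) = N(m, s^2)\) with \(\mathbf{\nu} = (m, s)\), the use of the Kullback-Leibler divergence \(\mathrm{KL}(q\,\|\,p)\) as the measure of closeness (a common choice, but not yet fixed by the text), and the crude grid search over \(\mathbf{\nu}\). In one dimension the target posterior can be tabulated directly, which is what makes the comparison possible.

```python
import numpy as np

# Assumed toy model: x_i ~ Poisson(exp(z)), z ~ N(0, 1); the data are made up.
x = np.array([2, 4, 3, 5, 1])
grid = np.linspace(-4.0, 4.0, 4001)
dz = grid[1] - grid[0]

# Log joint p(z, x) up to an additive constant, then normalized on the grid
# so that exp(log_post) integrates to 1 (Riemann sum).
log_joint = -0.5 * grid**2 + x.sum() * grid - len(x) * np.exp(grid)
log_post = log_joint - log_joint.max()
log_post -= np.log(np.exp(log_post).sum() * dz)

def kl_q_to_p(m, s):
    """Numerical KL(q || p) for q = N(m, s^2), evaluated on the grid."""
    log_q = -0.5 * ((grid - m) / s) ** 2 - np.log(s * np.sqrt(2.0 * np.pi))
    q = np.exp(log_q)
    return (q * (log_q - log_post)).sum() * dz

# Step 1: the family is {N(m, s^2)}. Step 2: search over nu = (m, s).
candidates = [(m, s)
              for m in np.linspace(0.0, 2.0, 81)
              for s in np.linspace(0.05, 1.0, 39)]
kls = np.array([kl_q_to_p(m, s) for m, s in candidates])
best_m, best_s = candidates[int(kls.argmin())]
best_kl = kls.min()

print(best_m, best_s, best_kl)
```

The search is run for one fixed data set `x`, in line with the remark: changing the data changes the target posterior, and the variational parameters would have to be found again.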