Weight Uncertainty in Neural Networks - 郑之杰的个人网站

神经网络中的权重不确定性.

paper：Weight Uncertainty in Neural Networks

本文作者提出了bayes-by-backprop方法用于衡量网络权重的不确定性（即认知不确定性）。

方法的核心思想是把网络权重$w$看作概率分布，既然真实的权重后验分布$p(w|D)$是不可解的，用一个变分分布$q(w|\theta)$来进行近似建模。

目标函数为最小化$q(w|\theta)$和真实后验$p(w|D)$之间的KL散度：

\[\begin{aligned} \mathcal{L}(\theta) &= KL[q(w|\theta)||p(w|D)] = \int q(w|\theta) \log \frac{q(w|\theta)}{p(w|D)} dw \\ &≈\int q(w|\theta) \log \frac{q(w|\theta)}{p(w)p(D|w)} dw \\ &≈ \log q(w|\theta) - \log p(w)p(D|w) \end{aligned}\]

变分分布$q(w|\theta)$通常设置为对角高斯分布，其中的每个对角元素从$\mathcal{N}(\mu_i,\sigma_i^2)$中采样。为避开参数$\sigma_i$的非负性，选择用$\rho_i$通过softplus参数化$\sigma_i=\log (1+\exp(\rho_i))$。则变分分布$q(w|\theta)$的权重为$\theta = \{ \mu_i,\rho_i \}_{i=1}^d$。

bayes-by-backprop方法的过程总结为：

从标准正态分布中随机采样：$\epsilon$~$\mathcal{N}(0,I)$
利用重参数化构造权重：$w=\mu + \log (1+\exp(\rho)) \circ \epsilon$
构造损失函数：$f(w,\theta) = \log q(w|\theta) - \log p(w)p(D|w)$
计算损失函数的梯度并更新参数$\theta$
不确定性可以通过推断时采样不同的模型参数来计算。