基于得分匹配的随机微分方程生成式建模.
扩散模型 (Diffusion Model)是一类深度生成模型。这类模型首先定义前向扩散过程的马尔科夫链 (Markov Chain),向数据中逐渐地添加随机噪声;然后学习反向扩散过程,从噪声中构造所需的数据样本。扩散模型也是一类隐变量模型,其隐变量通常具有较高的维度(与原始数据相同的维度)。
DDPM等扩散模型是离散形式的,即前向扩散和反向扩散过程都被事先划分为$T$步。本文作者指出,可以将它们理解为一个在时间上连续的变换过程,并用随机微分方程(Stochastic Differential Equation,SDE)或者概率流常微分方程(Probability flow ODE)来描述。
1. 用SDE描述扩散模型
(1)前向扩散SDE
前向扩散过程可以用SDE描述为:
\[d \mathbf{x} = \mathbf{f}_t(\mathbf{x})dt+g_t d\mathbf{w}\]其中\(\mathbf{w}\)是标准维纳过程;\(\mathbf{f}_t(\cdot )\)是一个向量函数,被称为\(\mathbf{x}(t)\)的漂移系数(drift coefficient)。\(g(\cdot )\)是一个标量函数,被称为\(\mathbf{x}(t)\)的扩散系数(diffusion coefficient)。
前向扩散SDE也可以等价地写成以下差分方程的形式:
\[\mathbf{x}_{t+\Delta t}-\mathbf{x}_t = \mathbf{f}_t(\mathbf{x}_t)\Delta t + g_t \sqrt{\Delta t} \boldsymbol{\epsilon},\boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})\]或写作概率模型:
\[\begin{aligned} p(\mathbf{x}_{t+\Delta t} \mid \mathbf{x}_t) &= \mathcal{N}\left(\mathbf{x}_{t+\Delta t};\mathbf{x}_t+\mathbf{f}_t(\mathbf{x}_t)\Delta t, g_t^2 \Delta t\mathbf{I}\right) \\ & \propto \exp\left( -\frac{||\mathbf{x}_{t+\Delta t}-\mathbf{x}_t-\mathbf{f}_t(\mathbf{x}_t)\Delta t||^2}{2g_t^2 \Delta t} \right) \end{aligned}\](2)反向扩散SDE
反向扩散SDE旨在求解\(p(\mathbf{x}_{t} \mid \mathbf{x}_{t+\Delta t})\)。根据贝叶斯定理:
\[\begin{aligned} p(\mathbf{x}_{t} \mid \mathbf{x}_{t+\Delta t}) &= \frac{p(\mathbf{x}_{t+\Delta t} \mid \mathbf{x}_{t})p(\mathbf{x}_{t})}{p(\mathbf{x}_{t+\Delta t})} = p(\mathbf{x}_{t+\Delta t} \mid \mathbf{x}_{t}) \exp \left( \log p(\mathbf{x}_{t}) - \log p(\mathbf{x}_{t+\Delta t}) \right) \\ &\propto \exp\left( -\frac{||\mathbf{x}_{t+\Delta t}-\mathbf{x}_t-\mathbf{f}_t(\mathbf{x}_t)\Delta t||^2}{2g_t^2 \Delta t} + \log p(\mathbf{x}_{t}) - \log p(\mathbf{x}_{t+\Delta t})\right) \end{aligned}\]通常$\Delta t$比较小,因此有泰勒展开:
\[\log p(\mathbf{x}_{t+\Delta t}) = \log p(\mathbf{x}_{t}) + (\mathbf{x}_{t+\Delta t}-\mathbf{x}_t) \cdot \nabla_{\mathbf{x}_t} \log p(\mathbf{x}_t) + \mathcal{O}(\Delta t)\]代入上式并配方得:
\[\begin{aligned} p(\mathbf{x}_{t} \mid \mathbf{x}_{t+\Delta t}) &\propto \exp\left( -\frac{||\mathbf{x}_{t+\Delta t}-\mathbf{x}_t-[\mathbf{f}_t(\mathbf{x}_t)-g_t^2\nabla_{\mathbf{x}_t} \log p(\mathbf{x}_t)]\Delta t||^2}{2g_t^2 \Delta t} + \mathcal{O}(\Delta t)\right) \\ & \approx \exp\left( -\frac{||\mathbf{x}_t-\mathbf{x}_{t+\Delta t}+[\mathbf{f}_t(\mathbf{x}_t)-g_t^2\nabla_{\mathbf{x}_t} \log p(\mathbf{x}_t)]\Delta t||^2}{2g_t^2 \Delta t}\right) \\ & \sim \mathcal{N}\left(\mathbf{x}_t;\mathbf{x}_{t+\Delta t}-[\mathbf{f}_t(\mathbf{x}_t)-g_t^2\nabla_{\mathbf{x}_t} \log p(\mathbf{x}_t)]\Delta t, g_t^2 \Delta t\mathbf{I}\right) \end{aligned}\]上式也可写作差分方程:
\[\mathbf{x}_t-\mathbf{x}_{t+\Delta t} = -[\mathbf{f}_{t+\Delta t}(\mathbf{x}_{t+\Delta t})-g_{t+\Delta t}^2\nabla_{\mathbf{x}_{t+\Delta t}} \log p(\mathbf{x}_{t+\Delta t})]\Delta t + g_{t+\Delta t} \sqrt{\Delta t} \boldsymbol{\epsilon},\boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})\]取$\Delta t \to 0$,得到反向过程对应的SDE:
\[d \mathbf{x} = [\mathbf{f}_t(\mathbf{x})-g_t^2\nabla_{\mathbf{x}} \log p_t(\mathbf{x})]dt+g_t d\mathbf{w}\](3)得分匹配
前向和反向扩散过程的SDE:
\[\begin{aligned} d \mathbf{x} &= \mathbf{f}_t(\mathbf{x})dt+g_t d\mathbf{w} \\ d \mathbf{x} &= [\mathbf{f}_t(\mathbf{x})-g_t^2\nabla_{\mathbf{x}} \log p_t(\mathbf{x})]dt+g_t d\mathbf{w} \end{aligned}\]也可以等价地写成差分形式:
\[\begin{aligned} \mathbf{x}_{t+\Delta t}-\mathbf{x}_t &= \mathbf{f}_t(\mathbf{x}_t)\Delta t + g_t \sqrt{\Delta t} \boldsymbol{\epsilon},\boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}) \\ \mathbf{x}_t-\mathbf{x}_{t+\Delta t} &= -[\mathbf{f}_{t+\Delta t}(\mathbf{x}_{t+\Delta t})-g_{t+\Delta t}^2\nabla_{\mathbf{x}_{t+\Delta t}} \log p(\mathbf{x}_{t+\Delta t})]\Delta t + g_{t+\Delta t} \sqrt{\Delta t} \boldsymbol{\epsilon},\boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}) \end{aligned}\]如果进一步知道\(\nabla_{\mathbf{x}_t} \log p(\mathbf{x}_t)\),就可以通过反向SDE完成生成过程。
考虑到在离散型的扩散模型中,通常会为\(p(\mathbf{x}_t \mid \mathbf{x}_0)\)设计具有解析解的形式。此时\(\nabla_{\mathbf{x}_t} \log p(\mathbf{x}_t)\)表示为:
\[\begin{aligned} p(\mathbf{x}_t) &= \mathbb{E}_{\mathbf{x}_0} \left[ p(\mathbf{x}_t \mid \mathbf{x}_0)\right] \\ \nabla_{\mathbf{x}_t} \log p(\mathbf{x}_t) &= \frac{\mathbb{E}_{\mathbf{x}_0} \left[ \nabla_{\mathbf{x}_t} p(\mathbf{x}_t \mid \mathbf{x}_0)\right]}{ \mathbb{E}_{\mathbf{x}_0} \left[ p(\mathbf{x}_t \mid \mathbf{x}_0)\right]} \\ &= \frac{\mathbb{E}_{\mathbf{x}_0} \left[p(\mathbf{x}_t \mid \mathbf{x}_0) \nabla_{\mathbf{x}_t} \log p(\mathbf{x}_t \mid \mathbf{x}_0)\right]}{ \mathbb{E}_{\mathbf{x}_0} \left[ p(\mathbf{x}_t \mid \mathbf{x}_0)\right]} \end{aligned}\]上式表示\(\nabla_{\mathbf{x}_t} \log p(\mathbf{x}_t)\)计算为\(\nabla_{\mathbf{x}_t} \log p(\mathbf{x}_t \mid \mathbf{x}_0)\)在分布\(p(\mathbf{x}_t \mid \mathbf{x}_0)\)上的加权平均。
我们希望用神经网络学一个函数\(s_θ(\mathbf{x}_t,t)\),使得它能够直接计算\(\nabla_{\mathbf{x}_t} \log p(\mathbf{x}_t)\)。则\(s_θ(\mathbf{x}_t,t)\)应当也能表示为\(\nabla_{\mathbf{x}_t} \log p(\mathbf{x}_t \mid \mathbf{x}_0)\)在分布\(p(\mathbf{x}_t \mid \mathbf{x}_0)\)上的加权平均,或者等价地写成如下损失:
\[\begin{aligned} & \frac{\mathbb{E}_{\mathbf{x}_0} \left[p(\mathbf{x}_t \mid \mathbf{x}_0)|| s_θ(\mathbf{x}_t,t)- \nabla_{\mathbf{x}_t} \log p(\mathbf{x}_t \mid \mathbf{x}_0)||^2\right]}{ \mathbb{E}_{\mathbf{x}_0} \left[ p(\mathbf{x}_t \mid \mathbf{x}_0)\right]} \\ \propto & \int \mathbb{E}_{\mathbf{x}_0} \left[p(\mathbf{x}_t \mid \mathbf{x}_0)|| s_θ(\mathbf{x}_t,t)- \nabla_{\mathbf{x}_t} \log p(\mathbf{x}_t \mid \mathbf{x}_0)||^2\right] d\mathbf{x}_t \\ = & \mathbb{E}_{\mathbf{x}_0,\mathbf{x}_t \sim p(\mathbf{x}_t \mid \mathbf{x}_0)p(\mathbf{x}_0)} \left[|| s_θ(\mathbf{x}_t,t)- \nabla_{\mathbf{x}_t} \log p(\mathbf{x}_t \mid \mathbf{x}_0)||^2\right] \end{aligned}\]上式被称为(条件)得分匹配 (score matching)损失。
(4)连续型扩散模型的一般流程
构造连续型扩散模型的一般流程:
① 通过随机微分方程定义前向扩散过程:
\[\begin{aligned} d \mathbf{x} = \mathbf{f}_t(\mathbf{x})dt+g_t d\mathbf{w} \end{aligned}\]② 求\(p(\mathbf{x}_t \mid \mathbf{x}_0)\)的表达式;
③ 通过得分匹配损失训练\(s_θ(\mathbf{x}_t,t)\):
\[\begin{aligned} \mathbb{E}_{\mathbf{x}_0,\mathbf{x}_t \sim p(\mathbf{x}_t \mid \mathbf{x}_0)p(\mathbf{x}_0)} \left[|| s_θ(\mathbf{x}_t,t)- \nabla_{\mathbf{x}_t} \log p(\mathbf{x}_t \mid \mathbf{x}_0)||^2\right] \end{aligned}\]④ 通过随机微分方程实现反向扩散过程:
\[\begin{aligned} d \mathbf{x} = [\mathbf{f}_t(\mathbf{x})-g_t^2s_θ(\mathbf{x}_t,t)]dt+g_t d\mathbf{w} \end{aligned}\](5)一些例子
在实践中,可以先定义\(p(\mathbf{x}_t \mid \mathbf{x}_0)\)的表达式,在反推对应的SDE。不妨假设:
\[\begin{array}{rlr} p\left(\mathbf{x}_t \mid \mathbf{x}_0\right) =\mathcal{N}\left(\mathbf{x}_t ; \sqrt{\bar{\alpha}_t} \mathbf{x}_0,\bar{\beta}_t \mathbf{I}\right) \end{array}\]即:
\[\begin{array}{l} \mathbf{x}_t = \sqrt{\bar{\alpha}_t} \mathbf{x}_0+\sqrt{\bar{\beta}_t }\boldsymbol{\epsilon},\boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}) \\ \mathbf{x}_{t+\Delta t} = \sqrt{\bar{\alpha}_{t+\Delta t}} \mathbf{x}_0+\sqrt{\bar{\beta}_{t+\Delta t} }\boldsymbol{\epsilon},\boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}) \end{array}\]另一方面,在前向SDE中考虑线性解($f(x)=x$):
\[\begin{aligned} d \mathbf{x} = \mathbf{f}_t\mathbf{x}dt+g_t d\mathbf{w} \end{aligned}\]或者写作:
\[\begin{aligned} \mathbf{x}_{t+\Delta t} &= (1+\mathbf{f}_t\Delta t)\mathbf{x}_t + g_t \sqrt{\Delta t} \boldsymbol{\epsilon}_1 \\ &= (1+\mathbf{f}_t\Delta t)\left(\sqrt{\bar{\alpha}_t} \mathbf{x}_0+\sqrt{\bar{\beta}_t }\boldsymbol{\epsilon}_2\right) + g_t \sqrt{\Delta t} \boldsymbol{\epsilon}_1 \\ &= (1+\mathbf{f}_t\Delta t)\sqrt{\bar{\alpha}_t} \mathbf{x}_0+\left(1+\mathbf{f}_t\Delta t\right)\sqrt{\bar{\beta}_t }\boldsymbol{\epsilon}_2 + g_t \sqrt{\Delta t} \boldsymbol{\epsilon}_1 \end{aligned}\]联立得:
\[\begin{aligned} \sqrt{\bar{\alpha}_{t+\Delta t}} &= (1+\mathbf{f}_t\Delta t)\sqrt{\bar{\alpha}_t}\\ \bar{\beta}_{t+\Delta t} &= \left(1+\mathbf{f}_t\Delta t\right)^2\bar{\beta}_t +g_t^2 \Delta t \end{aligned}\]令$\Delta t \to 0$,可解得:
\[\begin{aligned} \mathbf{f}_t &= \frac{d}{dt}\left(\ln \sqrt{\bar{\alpha}_t}\right) =\frac{1}{\sqrt{\bar{\alpha}_t}} \frac{d\sqrt{\bar{\alpha}_t}}{dt}\\ g_t^2 &= \bar{\alpha}_t\frac{d}{dt}\left(\frac{\bar{\beta}_t}{\bar{\alpha}_t}\right) \end{aligned}\]则前向SDE及其差分形式写作:
\[\begin{aligned} d \mathbf{x} &= \frac{1}{\sqrt{\bar{\alpha}_t}} \frac{d\sqrt{\bar{\alpha}_t}}{dt}\mathbf{x}dt+\sqrt{\bar{\alpha}_t\frac{d}{dt}\left(\frac{\bar{\beta}_t}{\bar{\alpha}_t}\right)} d\mathbf{w} \\ \mathbf{x}_{t+1} &= \left(1+\frac{1}{\sqrt{\bar{\alpha}_t}} \frac{d\sqrt{\bar{\alpha}_t}}{dt}\right)\mathbf{x}_t + \sqrt{\bar{\alpha}_t\frac{d}{dt}\left(\frac{\bar{\beta}_t}{\bar{\alpha}_t}\right)} \boldsymbol{\epsilon} \end{aligned}\]取\(\bar{\alpha}_t≡1\)时,结果就是论文中的VE-SDE(Variance Exploding SDE);对应的前向SDE及其差分形式为:
\[\begin{aligned} d \mathbf{x} &= \sqrt{\frac{d\bar{\beta}_t}{dt}} d\mathbf{w} \\ \mathbf{x}_{t+1} &= \mathbf{x}_t + \sqrt{\bar{\beta}_{t+1}-\bar{\beta}_t} \boldsymbol{\epsilon} \end{aligned}\]而如果取\(\bar{\alpha}_t+\bar{\beta}_t=1\)时,结果就是原论文中的VP-SDE(Variance Preserving SDE);令\(\bar{\gamma}_t=\bar{\alpha}_t\frac{d}{dt}\left(\frac{\bar{\beta}_t}{\bar{\alpha}_t}\right)\),对应的前向SDE及其差分形式为:
\[\begin{aligned} d \mathbf{x} &= -\frac{\bar{\gamma}_t}{2} \mathbf{x}dt+\sqrt{\bar{\gamma}_t} d\mathbf{w} \\ \mathbf{x}_{t+1} &= \sqrt{1-\bar{\gamma}_t}\mathbf{x}_t + \sqrt{\bar{\gamma}_t} \boldsymbol{\epsilon} \end{aligned}\]对于损失函数,首先计算:
\[\begin{aligned} \nabla_{\mathbf{x}_t} \log p(\mathbf{x}_t \mid \mathbf{x}_0) &= \nabla_{\mathbf{x}_t} \log \frac{1}{\sqrt{2\pi \bar{\beta}_t} }\exp\left( -\frac{||\mathbf{x}_t - \sqrt{\bar{\alpha}_t} \mathbf{x}_0||^2}{2\bar{\beta}_t} \right) \\ & \propto -\frac{\mathbf{x}_t - \sqrt{\bar{\alpha}_t} \mathbf{x}_0}{\bar{\beta}_t} = -\frac{\boldsymbol{\epsilon}}{\sqrt{\bar{\beta}_t}} \end{aligned}\]不妨设$s_θ(\mathbf{x}_t,t)=-\frac{\boldsymbol{\epsilon}_θ(\mathbf{x}_t,t)}{\sqrt{\bar{\beta}_t}}$,则损失函数表示为:
\[\begin{aligned} & \mathbb{E}_{\mathbf{x}_0,\mathbf{x}_t \sim p(\mathbf{x}_t \mid \mathbf{x}_0)p(\mathbf{x}_0)} \left[|| s_θ(\mathbf{x}_t,t)- \nabla_{\mathbf{x}_t} \log p(\mathbf{x}_t \mid \mathbf{x}_0)||^2\right] \\ & = \mathbb{E}_{\mathbf{x}_t \sim p(\mathbf{x}_t \mid \mathbf{x}_0),\boldsymbol{\epsilon}\sim \mathcal{N}(\mathbf{0}, \mathbf{I})} \left[|| -\frac{\boldsymbol{\epsilon}_θ(\mathbf{x}_t,t)}{\sqrt{\bar{\beta}_t}}+\frac{\boldsymbol{\epsilon}}{\sqrt{\bar{\beta}_t}}||^2\right] \\ & = \frac{1}{\bar{\beta}_t}\mathbb{E}_{\mathbf{x}_0 \sim p(\mathbf{x}_0),\boldsymbol{\epsilon}\sim \mathcal{N}(\mathbf{0}, \mathbf{I})} \left[|| \boldsymbol{\epsilon}_θ(\sqrt{\bar{\alpha}_t} \mathbf{x}_0+\sqrt{\bar{\beta}_t }\boldsymbol{\epsilon},t)-\boldsymbol{\epsilon}||^2\right] \end{aligned}\]上式等价于为DDPM的损失函数。
2. 用概率流ODE描述扩散模型
观察前向扩散SDE的差分形式:
\[\mathbf{x}_{t+\Delta t}=\mathbf{x}_t + \mathbf{f}_t(\mathbf{x}_t)\Delta t + g_t \sqrt{\Delta t} \boldsymbol{\epsilon},\boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})\]构造Dirac函数并进行泰勒展开:
\[\begin{aligned} & \delta(\mathbf{x} - \mathbf{x}_{t+\Delta t}) \\ = & \delta\left(\mathbf{x} - \mathbf{x}_t - \mathbf{f}_t(\mathbf{x}_t)\Delta t - g_t \sqrt{\Delta t} \boldsymbol{\epsilon}\right) \\ \approx & \delta\left(\mathbf{x} - \mathbf{x}_t\right) -\left( \mathbf{f}_t(\mathbf{x}_t)\Delta t + g_t \sqrt{\Delta t} \boldsymbol{\epsilon}\right) \cdot \nabla_\mathbf{x} \delta\left(\mathbf{x} - \mathbf{x}_t\right) \\ & +\frac{1}{2} \left( g_t \sqrt{\Delta t} \boldsymbol{\epsilon} \cdot \nabla_\mathbf{x}\right)^2 \delta\left(\mathbf{x} - \mathbf{x}_t\right) + \mathcal{O}(\Delta t^{3/2}) \end{aligned}\]对上式两边求期望,左端表示为:
\[\begin{aligned} \mathbb{E}_{\mathbf{x}_{t+\Delta t}} [\delta(\mathbf{x} - \mathbf{x}_{t+\Delta t})] = & \int p(\mathbf{x}_{t+\Delta t}) \delta(\mathbf{x} - \mathbf{x}_{t+\Delta t})d\mathbf{x}_{t+\Delta t} \\ = & p_{\mathbf{x}_{t+\Delta t}}(\mathbf{x}) \end{aligned}\]右端表示为:
\[\begin{aligned} \mathbb{E}_{\mathbf{x}_{t+\Delta t}} [& \delta\left(\mathbf{x} - \mathbf{x}_t\right) -\left( \mathbf{f}_t(\mathbf{x}_t)\Delta t + g_t \sqrt{\Delta t} \boldsymbol{\epsilon}\right) \cdot \nabla_\mathbf{x} \delta\left(\mathbf{x} - \mathbf{x}_t\right) \\ & +\frac{1}{2} \left( g_t \sqrt{\Delta t} \boldsymbol{\epsilon} \cdot \nabla_\mathbf{x}\right)^2 \delta\left(\mathbf{x} - \mathbf{x}_t\right) + \mathcal{O}(\Delta t^{3/2}) ] \\ \approx \mathbb{E}_{\mathbf{x}_{t},\boldsymbol{\epsilon}} [& \delta\left(\mathbf{x} - \mathbf{x}_t\right) -\left( \mathbf{f}_t(\mathbf{x}_t)\Delta t + g_t \sqrt{\Delta t} \boldsymbol{\epsilon}\right) \cdot \nabla_\mathbf{x} \delta\left(\mathbf{x} - \mathbf{x}_t\right) \\ & +\frac{1}{2} \left( g_t \sqrt{\Delta t} \boldsymbol{\epsilon} \cdot \nabla_\mathbf{x}\right)^2 \delta\left(\mathbf{x} - \mathbf{x}_t\right) ] \\ = \mathbb{E}_{\mathbf{x}_{t}} [& \delta\left(\mathbf{x} - \mathbf{x}_t\right) -\mathbf{f}_t(\mathbf{x}_t)\Delta t \cdot \nabla_\mathbf{x} \delta\left(\mathbf{x} - \mathbf{x}_t\right) +\frac{1}{2} g_t^2 \Delta t \nabla_\mathbf{x} \cdot \nabla_\mathbf{x} \delta\left(\mathbf{x} - \mathbf{x}_t\right) ] \\ = p_t(\mathbf{x}&) - \nabla_\mathbf{x} \cdot [\mathbf{f}_t(\mathbf{x})\Delta t p_t(\mathbf{x})] + \frac{1}{2} g_t^2 \Delta t \nabla_\mathbf{x} \cdot \nabla_\mathbf{x}p_t(\mathbf{x}) \end{aligned}\]令$\Delta t \to 0$,可解得:
\[\frac{\partial}{\partial t} p_{\mathbf{x}_t}(\mathbf{x}) =- \nabla_\mathbf{x} \cdot [\mathbf{f}_t(\mathbf{x}) p_t(\mathbf{x})] + \frac{1}{2} g_t^2 \nabla_\mathbf{x} \cdot \nabla_\mathbf{x}p_t(\mathbf{x})\]上式称为Fokker-Planck方程,是描述边际分布的偏微分方程。做如下变量替换:
\[\begin{aligned} \mathbf{f}_t(\mathbf{x}) &\leftarrow \mathbf{f}_t(\mathbf{x})-\frac{1}{2}(g_t^2-\sigma_t^2)\nabla_\mathbf{x}\log p_t(\mathbf{x}) \\ g_t &\leftarrow \sigma_t \end{aligned}\]则FP方程等价于:
\[\begin{aligned} \frac{\partial}{\partial t} p_{\mathbf{x}_t}(\mathbf{x}) =&- \nabla_\mathbf{x} \cdot \left[ \left(\mathbf{f}_t(\mathbf{x}) -\frac{1}{2}(g_t^2-\sigma_t^2)\nabla_\mathbf{x} \log p_t(\mathbf{x})\right) p_t(\mathbf{x})\right] \\ & + \frac{1}{2} \sigma_t^2 \nabla_\mathbf{x} \cdot \nabla_\mathbf{x}p_t(\mathbf{x}) \end{aligned}\]又注意到原始的FP方程对应SDE方程:
\[d \mathbf{x} = \mathbf{f}_t(\mathbf{x})dt+g_t d\mathbf{w}\]则上述SDE方程具有等价形式:
\[d \mathbf{x} = \left(\mathbf{f}_t(\mathbf{x}) -\frac{1}{2}(g_t^2-\sigma_t^2)\nabla_\mathbf{x} \log p_t(\mathbf{x})\right)dt+\sigma_t d\mathbf{w}\]上式表示在前向扩散过程中引入了方差$\sigma_t$,特别地,若考虑$\sigma_0$,则SDE退化为ODE:
\[d \mathbf{x} = \left(\mathbf{f}_t(\mathbf{x}) -\frac{1}{2}g_t^2\nabla_\mathbf{x} \log p_t(\mathbf{x})\right)dt\]该ODE称为概率流ODE(Probability flow ODE),其中\(\nabla_\mathbf{x} \log p_t(\mathbf{x})\)需要用神经网络近似。
前向扩散过程用一个ODE描述,此时传播过程不带噪声,从$x_0$到$x_T$是一个确定性变换,所以直接反向求解ODE就能得到由$x_T$变换为$x_0$的逆变换。
⚪ 一个例子
采用与SDE中相同的假设,即:
\[\begin{aligned} \mathbf{f}_t &= \frac{1}{\sqrt{\bar{\alpha}_t}} \frac{d\sqrt{\bar{\alpha}_t}}{dt}\\ g_t^2 &= \bar{\alpha}_t\frac{d}{dt}\left(\frac{\bar{\beta}_t}{\bar{\alpha}_t}\right) \\ \nabla_{\mathbf{x}} \log p_t (\mathbf{x}) &= -\frac{\boldsymbol{\epsilon}(\mathbf{x},t)}{\sqrt{\bar{\beta}_t}} \end{aligned}\]代入ODE方程得:
\[d \mathbf{x} = \left(\frac{1}{\sqrt{\bar{\alpha}_t}} \frac{d\sqrt{\bar{\alpha}_t}}{dt}\mathbf{x} +\frac{1}{2}\bar{\alpha}_t\frac{d}{dt}\left(\frac{\bar{\beta}_t}{\bar{\alpha}_t}\right)\frac{\boldsymbol{\epsilon}(\mathbf{x},t)}{\sqrt{\bar{\beta}_t}}\right)dt\]或写作差分形式:
\[\begin{aligned} \mathbf{x}_{t}-\mathbf{x}_{t-1} &= \frac{1}{\sqrt{\bar{\alpha}_t}} (\sqrt{\bar{\alpha}_t}-\sqrt{\bar{\alpha}_{t-1}}) \mathbf{x}_{t} +\frac{1}{2}\bar{\alpha}_t\left(\frac{\bar{\beta}_t}{\bar{\alpha}_t}-\frac{\bar{\beta}_{t-1}}{\bar{\alpha}_{t-1}}\right)\frac{\boldsymbol{\epsilon}(\mathbf{x}_t,t)}{\sqrt{\bar{\beta}_t}} \\ \mathbf{x}_{t-1} &= \frac{\sqrt{\bar{\alpha}_{t-1}}}{\sqrt{\bar{\alpha}_t}}\mathbf{x}_{t} +\frac{\bar{\alpha}_t}{2\sqrt{\bar{\beta}_t}}\left(\frac{\bar{\beta}_{t-1}}{\bar{\alpha}_{t-1}}-\frac{\bar{\beta}_t}{\bar{\alpha}_t}\right) \boldsymbol{\epsilon}(\mathbf{x}_t,t) \end{aligned}\]适当参数化后,上式等价于DDIM的采样过程:
\[\begin{aligned} \mathbf{x}_{t-1} &= \frac{1}{\sqrt{\alpha_t}}\mathbf{x}_{t}+\left( \sqrt{1-\bar{\alpha}_{t-1}}-\frac{\sqrt{1-\bar{\alpha}_t}}{\sqrt{\alpha_t}} \right) \boldsymbol{\epsilon}_\theta\left(\mathbf{x}_t, t\right) \end{aligned}\]