Cold Diffusion: Inverting Arbitrary Image Transforms Without Noise

Cold Diffusion：反转任意无噪声的图像变换.

paper：Cold Diffusion: Inverting Arbitrary Image Transforms Without Noise

构建一个扩散模型包含“前向过程”、“反向过程”、“训练目标”三个部分。本文作者设计了Cold Diffusion，着重于使用任意（无噪声的）变换来构建一般的前向过程。

⚪ 前向过程

前向过程是指$q\left(\mathbf{x}_t \mid \mathbf{x}_{0}\right)$，即通过输入图像$\mathbf{x}_{0}$构造$t$时刻的噪声图像$\mathbf{x}_t$。$q\left(\mathbf{x}_t \mid \mathbf{x}_{0}\right)$的主要作用是用来构建扩散模型的训练数据，因此最基本要求是便于采样。

Cold Diffusion通过确定性的变换$\mathbf{x}_t = \mathcal{F}_t (\mathbf{x}_{0})$构建前向过程。此处$\mathcal{F}_t$是关于$t,\mathbf{x}_{0}$的确定性函数，可以是对原始数据的任意破坏方式，对于图像来说有模糊、遮掩、池化等。

为了方便后面的分析，引入更一般的前向过程:

\[\mathbf{x}_t = \mathcal{F}_t (\mathbf{x}_{0}) + \sigma \mathbf{\epsilon}, \quad \mathbf{\epsilon} \sim p(\mathbf{\epsilon})\]

其中$ε$是采样自某个标准分布$p(ε)$的随机变量，常见选择是标准正态分布。如果需要确定性的变换，只需取$σ→0$即可。

一般情况下，上式唯一的限制是$t$越小，$\mathcal{F}_t (\mathbf{x}_{0})$所包含的$\mathbf{x}_{0}$的信息越完整，用$\mathcal{F}_t (\mathbf{x}_{0})$重构$\mathbf{x}_{0}$越容易；反之$t$越大重构就越困难，直到某个上界$T$时，$\mathcal{F}_t (\mathbf{x}_{0})$所包含的$\mathbf{x}_{0}$的信息几乎消失，重构几乎不能完成。

⚪ 反向过程

扩散模型的反向过程是通过多步迭代来逐渐生成逼真的数据，其关键是设计概率分布$q\left(\mathbf{x}_{t-1} \mid \mathbf{x}_t\right)$。一般地，有：

\[q\left(\mathbf{x}_{t-1} \mid \mathbf{x}_t\right) = \int q\left(\mathbf{x}_{t-1} \mid \mathbf{x}_0,\mathbf{x}_t\right)q\left(\mathbf{x}_0 \mid \mathbf{x}_t\right) d\mathbf{x}_0\]

对$q\left(\mathbf{x}_{t-1} \mid \mathbf{x}_t\right)$的基本要求也是便于采样，落脚到$q\left(\mathbf{x}_{t-1} \mid \mathbf{x}_0,\mathbf{x}_t\right)$和$q\left(\mathbf{x}_0 \mid \mathbf{x}_t\right)$便于采样。这样一来，就可以通过下述流程完成$q\left(\mathbf{x}_{t-1} \mid \mathbf{x}_t\right)$的采样：

\[\hat{\mathbf{x}}_0 \sim q\left(\mathbf{x}_0 \mid \mathbf{x}_t\right), \quad \mathbf{x}_{t-1} \sim q\left(\mathbf{x}_{t-1} \mid \mathbf{x}_0=\hat{\mathbf{x}}_0,\mathbf{x}_t\right)\]

根据该分解过程，扩散模型每一步的采样$\mathbf{x}_t \to \mathbf{x}_{t-1}$实际上包含了两个子步骤：

预估：由$q\left(\mathbf{x}_0 \mid \mathbf{x}_t\right)$对$\mathbf{x}_0$做一个简单的估计；
修正：由$q\left(\mathbf{x}_{t-1} \mid \mathbf{x}_0,\mathbf{x}_t\right)$根据估计结果，将估值进行一定程度的修正。

扩散模型的反向过程就是一个反复的“预估-修正”过程，将原本难以一步到位的生成$\mathbf{x}_t \to \mathbf{x}_0$分解成逐步推进的过程，并且每一步都进行了数值修正。

⚪ 训练目标

为实现反向的生成过程，需要设计$q\left(\mathbf{x}_{t-1} \mid \mathbf{x}_0,\mathbf{x}_t\right)$和$q\left(\mathbf{x}_0 \mid \mathbf{x}_t\right)$。其中$q\left(\mathbf{x}_0 \mid \mathbf{x}_t\right)$是用$\mathbf{x}_t$来预测$\mathbf{x}_0$的概率模型，需要方便采样且容易计算。本文选择为$l_1$范数为度量的正态分布（拉普拉斯分布），即：

\[q\left(\mathbf{x}_0 \mid \mathbf{x}_t\right) = \frac{1}{Z(\tau)} \int e^{-\frac{\left\|\mathbf{x}_0 - \mathcal{G}_t(\mathbf{x}_t) \right\|_1}{\tau}}d\mathbf{x}_0\]

根据近似分布$q\left(\mathbf{x}_0 \mid \mathbf{x}_t\right)$是用$\mathbf{x}_t$，训练目标通常选择为交叉熵（负对数似然）：

\[\begin{aligned} &\mathbb{E}_{\mathbf{x}_0 \sim q(\mathbf{x}_0),\mathbf{x}_t \sim q(\mathbf{x}_t\mid \mathbf{x}_0)}[- \log q\left(\mathbf{x}_0 \mid \mathbf{x}_t\right)] \\ = & \mathbb{E}_{\mathbf{x}_0 \sim q(\mathbf{x}_0),\mathbf{x}_t \sim q(\mathbf{x}_t\mid \mathbf{x}_0)}[- \log \frac{1}{Z(\tau)} \int e^{-\frac{\left\|\mathbf{x}_0 - \mathcal{G}_t(\mathbf{x}_t) \right\|_1}{\tau}}d\mathbf{x}_0] \\ \propto & \mathbb{E}_{\mathbf{x}_0 \sim q(\mathbf{x}_0),\mathbf{x}_t \sim q(\mathbf{x}_t\mid \mathbf{x}_0)}[\left\|\mathbf{x}_0 - \mathcal{G}_t(\mathbf{x}_t) \right\|_1] \\ = & \mathbb{E}_{\mathbf{x}_0 \sim q(\mathbf{x}_0),\mathbf{x}_t \sim q(\mathbf{x}_t\mid \mathbf{x}_0)}[\left\|\mathbf{x}_0 - \mathcal{G}_t(\mathcal{F}_t (\mathbf{x}_{0})) \right\|_1] \end{aligned}\]

⚪ 条件概率

最后需要设计$q\left(\mathbf{x}_{t-1} \mid \mathbf{x}_0,\mathbf{x}_t\right)$，即给定$\mathbf{x}_0,\mathbf{x}_t$来预测$\mathbf{x}_{t-1}$的概率。该概率需要满足边缘分布的恒等式：

\[\int q\left(\mathbf{x}_{t-1} \mid \mathbf{x}_0,\mathbf{x}_t\right)q\left(\mathbf{x}_t \mid \mathbf{x}_0\right) d\mathbf{x}_t = q\left(\mathbf{x}_{t-1} \mid \mathbf{x}_0\right)\]

满足上式的最简单选择为：

\[q\left(\mathbf{x}_{t-1} \mid \mathbf{x}_0,\mathbf{x}_t\right) = q\left(\mathbf{x}_{t-1} \mid \mathbf{x}_0\right)\]

对应反向过程：

\[\mathbf{x}_{t-1} = \mathcal{F}_{t-1} (\mathbf{x}_{0}) + \sigma \mathbf{\epsilon}\]

由$\mathbf{x}_t = \mathcal{F}_t (\mathbf{x}_{0}) + \sigma \mathbf{\epsilon}$解得$\mathbf{\epsilon} = (\mathbf{x}_t - \mathcal{F}_t (\mathbf{x}_{0}))/\sigma$，代回得：

\[\begin{aligned} \mathbf{x}_{t-1} &= \mathbf{x}_t+\mathcal{F}_{t-1} (\mathbf{x}_{0}) - \mathcal{F}_t (\mathbf{x}_{0})\\ &= \mathbf{x}_t+\mathcal{F}_{t-1} (\mathcal{G}_t(\mathbf{x}_t)) - \mathcal{F}_t (\mathcal{G}_t(\mathbf{x}_t)) \end{aligned}\]

上式对应原论文的Improved Sampling。若直接令$\sigma=0$，则对应Naive Sampling：

\[\mathbf{x}_{t-1} = \mathcal{F}_{t-1} (\mathbf{x}_{0}) =\mathcal{F}_{t-1} (\mathcal{G}_t(\mathbf{x}_t))\]