McGAN:均值和协方差特征匹配GAN.
本文作者通过积分概率度量(integral probability metrics, IPM)构造了真实数据分布$P_{data}(x)$和生成数据分布$P_G(x)$之间的距离度量,并进一步设计了均值和协方差特征匹配生成对抗网络(Mean and Covariance Feature Matching GAN, McGAN)。
积分概率度量寻找满足某种限制条件的函数集合\(\mathcal{F}\)中的连续函数$f(\cdot)$,使得该函数能够提供足够多的关于矩的信息;然后寻找一个最优的\(f(x)\in \mathcal{F}\)使得两个概率分布$p(x)$和$q(x)$之间的差异最大,该最大差异即为两个分布之间的距离:
\[d_{\mathcal{F}}(p(x),q(x)) = \mathop{\sup}_{f(x)\in \mathcal{F}} \Bbb{E}_{x \text{~} p(x)}[f(x)]-\Bbb{E}_{x \text{~} q(x)}[f(x)]\]1. Mean Feature Matching GAN
定义函数空间\(\mathcal{F}\)为如下形式:
\[\begin{aligned} \mathcal{F}_{v,w,p} = \{ & f(x) = <v,\Phi_w(x)>| \\ &v \in \Bbb{R}^m,||v||_p \leq 1,\\ &\Phi_w(x):\mathcal{X}\to \Bbb{R}^m,w \in \Omega \} \end{aligned}\]其中$v$是$p$范数不超过$1$的$m$维向量,$\Phi_w(\cdot)$是通过$w$参数化的神经网络。则对应的IPM距离为:
\[\begin{aligned} d_{\mathcal{F}}(p(x),q(x)) &= \mathop{\sup}_{f \in \mathcal{F}_{v,w,p}} \Bbb{E}_{x \text{~} p(x)}[f(x)]-\Bbb{E}_{x \text{~} q(x)}[f(x)] \\ &= \mathop{\max}_{w \in \Omega,v,||v||_p \leq 1} <v,\Bbb{E}_{x \text{~} p(x)}[\Phi_w(x)]-\Bbb{E}_{x \text{~} q(x)}[\Phi_w(x)]> \\ &= \mathop{\max}_{w \in \Omega} [\mathop{\max}_{v,||v||_p \leq 1} <v,\Bbb{E}_{x \text{~} p(x)}[\Phi_w(x)]-\Bbb{E}_{x \text{~} q(x)}[\Phi_w(x)]>] \\ &= \mathop{\max}_{w \in \Omega} ||\Bbb{E}_{x \text{~} p(x)}[\Phi_w(x)]-\Bbb{E}_{x \text{~} q(x)}[\Phi_w(x)]||_p \end{aligned}\]上述IPM距离旨在寻找一个最优映射$\Phi_w(\cdot)$使得两个分布映射到$\Phi_w(\cdot)$的特征空间后,其均值的差异最大,对应的最大均值差异即为两个分布之间的距离。
使用判别器$D(x)$作为特征映射函数$\Phi_w(\cdot)$,旨在学习真实数据分布$P_{data}(x)$和生成数据分布$P_G(x)$之间的距离;而生成器$G$的目标是最小化该距离;对应的GAN目标函数为:
\[\begin{aligned} \mathop{ \min}_{G} \mathop{ \max}_{D} ||\Bbb{E}_{x \text{~} P_{data}(x)}[D(x)]-\Bbb{E}_{x \text{~} P_{G}(x)}[D(x)] ||_p \end{aligned}\]2. Covariance Feature Matching GAN
定义函数空间\(\mathcal{F}\)为如下形式:
\[\begin{aligned} \mathcal{F}_{U,V,w} = \{ &f(x) = <U^T\Phi_w(x),V^T\Phi_w(x)>| \\ &U,V \in \Bbb{R}^{m\times k},U^TU=I_k,V^TV=I_k, \\ &\Phi_w(x):\mathcal{X}\to \Bbb{R}^m,w \in \Omega \} \end{aligned}\]其中$\Phi_w(\cdot)$是通过$w$参数化的神经网络。则对应的IPM距离为:
\[\begin{aligned} d_{\mathcal{F}}(p(x),q(x)) &= \mathop{\sup}_{f \in \mathcal{F}_{U,V,w}} \Bbb{E}_{x \text{~} p(x)}[f(x)]-\Bbb{E}_{x \text{~} q(x)}[f(x)] \\ &= \mathop{\max}_{w \in \Omega,U^TU=I_k,V^TV=I_k} U^T<\Bbb{E}_{x \text{~} p(x)}[\Phi_w(x)]-\Bbb{E}_{x \text{~} q(x)}[\Phi_w(x)]),\Bbb{E}_{x \text{~} p(x)}[\Phi_w(x)]-\Bbb{E}_{x \text{~} q(x)}[\Phi_w(x)]>V \\ &= \mathop{\max}_{w \in \Omega,U^TU=I_k,V^TV=I_k} \text{Tr}[U^T(\Bbb{E}_{x \text{~} p(x)}[\Phi_w(x)\Phi^T_w(x)]-\Bbb{E}_{x \text{~} q(x)}[\Phi_w(x)\Phi^T_w(x)])V] \\ &= \mathop{\max}_{w \in \Omega} ||\Bbb{E}_{x \text{~} p(x)}[\Phi_w(x)\Phi^T_w(x)]-\Bbb{E}_{x \text{~} q(x)}[\Phi_w(x)\Phi^T_w(x)]||_{*} \end{aligned}\]上述IPM距离旨在寻找一个最优映射$\Phi_w(\cdot)$使得两个分布映射到$\Phi_w(\cdot)$的特征空间后,其协方差的差异最大。协方差的差异通过核范数(奇异值的和)衡量。
使用判别器$D(x)$作为特征映射函数$\Phi_w(\cdot)$,旨在学习真实数据分布$P_{data}(x)$和生成数据分布$P_G(x)$之间的距离;而生成器$G$的目标是最小化该距离;对应的GAN目标函数为:
\[\begin{aligned} \mathop{ \min}_{G} \mathop{ \max}_{D} ||\Bbb{E}_{x \text{~} P_{data}(x)}[D(x)D^T(x)]-\Bbb{E}_{x \text{~} P_{G}(x)}[D(x)D^T(x)] ||_* \end{aligned}\]3. Mean and Covariance Matching GAN
综合考虑约束两个分布的特征均值和协方差,则可构造McGAN的目标函数:
\[\begin{aligned} \mathop{ \min}_{G} \mathop{ \max}_{D} &||\Bbb{E}_{x \text{~} P_{data}(x)}[D(x)]-\Bbb{E}_{x \text{~} P_{G}(x)}[D(x)] ||_p \\ &+ ||\Bbb{E}_{x \text{~} P_{data}(x)}[D(x)D^T(x)]-\Bbb{E}_{x \text{~} P_{G}(x)}[D(x)D^T(x)] ||_* \end{aligned}\]