Change of Variable Theorem.
1. 一维随机变量的变量替换定理
若随机变量$X \in \Bbb{R}$的概率密度函数为$p_X(x)$,对于变量替换$Y=f(X) \in \Bbb{R}$,其中$f(\cdot)$是严格单调函数,且导数$f’(\cdot)$存在,则随机变量$Y$的概率密度函数为:
\[p_Y(y) = p_X(f^{-1}(y))\cdot |\nabla_y f^{-1}(y)|\]其中$x=f^{-1}(y)$是$y=f(x)$的反函数。
⚪ 定理的证明
由于$y=f(x)$是严格单调函数且可导,则$x=f^{-1}(y)$存在且可导。若记$F_X(x)=P(X\leq x)$为随机变量$X$的分布函数,当$f(X)$是单调递增函数时,随机变量$Y$的分布函数和概率密度函数为:
\[\begin{aligned} F_Y(y)&=P(Y\leq y)=P(f(X)\leq y) \\ &= P(X \leq f^{-1}(y)) = F_X(f^{-1}(y)) \\ p_Y(y) &= \nabla_y F_X(f^{-1}(y)) = p_X(f^{-1}(y))\cdot \nabla_y f^{-1}(y) \end{aligned}\]当$f(X)$是单调递减函数时,随机变量$Y$的分布函数和概率密度函数为:
\[\begin{aligned} F_Y(y)&=P(Y\leq y)=P(f(X)\leq y) \\ &= P(X \geq f^{-1}(y)) = 1- F_X(f^{-1}(y)) \\ p_Y(y) &= - p_X(f^{-1}(y))\cdot \nabla_y f^{-1}(y) \end{aligned}\]注意到$f(\cdot)$是单调递增函数时$\nabla_y f^{-1}(y) \gt 0$,$f(\cdot)$是单调递减函数时$\nabla_y f^{-1}(y) \lt 0$,综上可得原结论。
⚪ 讨论:该定理的几何解释
随机变量$X$和$Y$的概率密度函数$p_X(x)$和$p_Y(y)$应满足归一化:
\[\int p_X(x)dx = \int p_Y(y) dy\]其中自变量的变化$dx$,$dy$可以看作随机变量分布空间中的一小块区域。由于随机变量$X$和$Y$具有严格单调的对应关系$Y=f(X)$,因此从区域$dx$到区域$dy$的映射是一一对应的,两个区域的概率应保持不变(如果改变将不满足归一化):
\[|p_X(x)dx| = |p_Y(y)dy|\]注意到$x=f^{-1}(y)$,上式也可以写作:
\[p_Y(y) = p_X(x) |\frac{dx}{dy}| = p_X(f^{-1}(y)) |\frac{df^{-1}(y)}{dy}|\]2. 多维随机向量的变量替换定理
若$X = (X_1,X_2,\cdots X_n) \in \Bbb{R}^n$和$Y = (Y_1,Y_2,\cdots Y_n) \in \Bbb{R}^n$为多维随机向量,对于变量替换$Y=f(X)$,有如下关系:
\[p_Y(y) = p_X(f^{-1}(y))\cdot |\det J_{f^{-1}}(y)|\]其中$x=f^{-1}(y)$是$y=f(x)$的反函数,$J_{f^{-1}}(y)$是函数$f^{-1}(y)$关于变量$y$的Jacobian矩阵,$\det$是行列式运算。
⚪ 引理:Jacobian矩阵和Jacobian行列式
变量替换$Y=f(X)$是多元函数,展开有:
\[Y_1= f_1(X_1,X_2,\cdots X_n) \\ Y_2= f_2(X_1,X_2,\cdots X_n) \\ \cdots \\Y_n= f_n(X_1,X_2,\cdots X_n)\]$f(\cdot)$的Jacobian矩阵定义为:
\[J_f(x) = \begin{bmatrix} \frac{\partial f}{\partial x_1} & \frac{\partial f}{\partial x_2} & \cdots & \frac{\partial f}{\partial x_n} \end{bmatrix} = \begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \frac{\partial f_1}{\partial x_2} & \cdots & \frac{\partial f_1}{\partial x_n} \\ \frac{\partial f_2}{\partial x_1} & \frac{\partial f_2}{\partial x_2} & \cdots & \frac{\partial f_2}{\partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial f_n}{\partial x_1} & \frac{\partial f_n}{\partial x_2} & \cdots & \frac{\partial f_n}{\partial x_n} \end{bmatrix}\]Jacobian矩阵提供了从$x$坐标系到$y$坐标系的局部线性变换(微分对应局部,矩阵代表线性变换)。
考虑$x$坐标系中一个顶点位于坐标原点$O$的体积元$(dx_1,dx_2,\cdots dx_n)$,其临边顶点分别为$(dx_1,0,\cdots 0),(0,dx_2,\cdots 0),\cdots,(0,0,\cdots dx_n)$,该体积元的体积为$dx_1dx_2\cdots dx_n$。
若体积元$(dx_1,dx_2,\cdots dx_n)$经过变换$f(\cdot)$映射到$y$坐标系中的体积元$(dy_1,dy_2,\cdots dy_n)$,则这些临边顶点映射为:
\[\begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \frac{\partial f_1}{\partial x_2} & \cdots & \frac{\partial f_1}{\partial x_n} \\ \frac{\partial f_2}{\partial x_1} & \frac{\partial f_2}{\partial x_2} & \cdots & \frac{\partial f_2}{\partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial f_n}{\partial x_1} & \frac{\partial f_n}{\partial x_2} & \cdots & \frac{\partial f_n}{\partial x_n} \end{bmatrix} \begin{bmatrix} dx_1 \\0 \\ \vdots \\0 \end{bmatrix} = \begin{bmatrix} \frac{\partial f_1}{\partial x_1} \\\frac{\partial f_2}{\partial x_1} \\ \vdots \\\frac{\partial f_n}{\partial x_1} \end{bmatrix}dx_1 \\ \cdots \\ \begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \frac{\partial f_1}{\partial x_2} & \cdots & \frac{\partial f_1}{\partial x_n} \\ \frac{\partial f_2}{\partial x_1} & \frac{\partial f_2}{\partial x_2} & \cdots & \frac{\partial f_2}{\partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial f_n}{\partial x_1} & \frac{\partial f_n}{\partial x_2} & \cdots & \frac{\partial f_n}{\partial x_n} \end{bmatrix} \begin{bmatrix} 0\\0 \\ \vdots \\dx_n \end{bmatrix} = \begin{bmatrix} \frac{\partial f_1}{\partial x_n} \\\frac{\partial f_2}{\partial x_n} \\ \vdots \\\frac{\partial f_n}{\partial x_n} \end{bmatrix}dx_n\]体积元$(dy_1,dy_2,\cdots dy_n)$的体积为这些临边矢量的混合积:
\[\begin{aligned} dy_1dy_2\cdots dy_n &= \det \begin{bmatrix} \begin{bmatrix} \frac{\partial f_1}{\partial x_1} \\\frac{\partial f_2}{\partial x_1} \\ \vdots \\\frac{\partial f_n}{\partial x_1} \end{bmatrix}dx_1 & \begin{bmatrix} \frac{\partial f_1}{\partial x_2} \\\frac{\partial f_2}{\partial x_2} \\ \vdots \\\frac{\partial f_n}{\partial x_2} \end{bmatrix}dx_2 & \cdots &\begin{bmatrix} \frac{\partial f_1}{\partial x_n} \\\frac{\partial f_2}{\partial x_n} \\ \vdots \\\frac{\partial f_n}{\partial x_n} \end{bmatrix} dx_n \end{bmatrix} \\ &= \det \begin{bmatrix} \begin{bmatrix} \frac{\partial f_1}{\partial x_1} \\\frac{\partial f_2}{\partial x_1} \\ \vdots \\\frac{\partial f_n}{\partial x_1} \end{bmatrix} & \begin{bmatrix} \frac{\partial f_1}{\partial x_2} \\\frac{\partial f_2}{\partial x_2} \\ \vdots \\\frac{\partial f_n}{\partial x_2} \end{bmatrix} & \cdots &\begin{bmatrix} \frac{\partial f_1}{\partial x_n} \\\frac{\partial f_2}{\partial x_n} \\ \vdots \\\frac{\partial f_n}{\partial x_n} \end{bmatrix} \end{bmatrix} dx_1 dx_2 \cdots dx_n \\ &= \det J_f(x) \cdot dx_1 dx_2 \cdots dx_n \end{aligned}\]其中$\det J_f(x)$为Jacobian行列式。Jacobian行列式代表经过Jacobian矩阵变换后得到的$y$坐标系中体积元与原$x$坐标系中体积元之间的相对体积变化率。
⚪ 定理的证明
若记$F_X(x)=P(X_1\leq x_1,X_2\leq x_2,\cdots X_n\leq x_n)$为随机向量$X$的联合分布函数,$F_Y(y)=P(Y_1\leq y_1,Y_2\leq y_2,\cdots Y_n\leq y_n)$为随机向量$Y$的联合分布函数,则有定义:
\[\begin{aligned} F_Y(y)&=P(Y_1\leq y_1,Y_2\leq y_2,\cdots Y_n\leq y_n)\\ &= \mathop{\int \cdots \int}_{D} p_X(x) dx_1dx_2\cdots dx_n \\ D&:=\{(x_1,x_2,\cdots x_n)|f_1(x)\leq y_1,f_2(x)\leq y_2,\cdots f_n(x)\leq y_n \} \end{aligned}\]根据上一节讨论的变换关系(原式中体积可正可负,在此引入绝对值):
\[dy_1dy_2\cdots dy_n = |\det J_f(x)| \cdot dx_1 dx_2 \cdots dx_n\]则有:
\[\begin{aligned} F_Y(y)&= \mathop{\int \cdots \int}_{D} p_X(x) dx_1dx_2\cdots dx_n \\ &= \mathop{\int \cdots \int}_{D'} p_X(x)\frac{1}{|\det J_f(x)|} dy_1dy_2\cdots dy_n \\ D'&:= \{Y_1\leq y_1,Y_2\leq y_2,\cdots Y_n\leq y_n\} \end{aligned}\]若$F_Y(y)$为联合分布函数,则随机向量$Y$的概率密度函数为:
\[p_Y(y) = p_X(x)\frac{1}{|\det J_f(x)|}\]若$y=f(x)$具有逆变换$x=f^{-1}(y)$,其二者对应的Jacobian矩阵也互为逆矩阵$J_{f}(y)$和$J_{f^{-1}}(x)$。由行列式的性质可得互为逆矩阵的Jacobian行列式互为倒数:
\[\frac{1}{\det J_{f}(x)} =\det J_{f^{-1}}(y)\]因此得证:
\[p_Y(y) = p_X(f^{-1}(y)) \cdot |\det J_{f^{-1}}(y)|\]⚪ 讨论:该定理的几何解释
变量替换定理又写做如下形式:
\[p_Y(y) = \frac{p_X(f^{-1}(y))}{|\det J_{f}(x)|}\]变换$y=f(x)$通过改变概率分布的空间使得$p_Y(y)$和$p_X(x)$建立联系。Jacobian矩阵$J_{f}(x)=dy/dx$给出了$y$附近的一小块区域$dy$与$x$附近的一小块区域$dx$之间经过$f$变换后的对应关系,而Jacobian行列式$\det J_{f}(x)$量化了区域$dy$之于区域$dx$的相对变化大小。由于区域$dy$包含的概率应等于区域$dx$包含的概率,因此区域$dy$的变化率应与概率密度的变化成反比。
注:矩阵$A$的行列式的几何意义为由矩阵$A$定义的局部线性变换导致的相对面积变化率。