PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation

A point cloud is an irregular geometric data structure. The paper designs PointNet, a neural network that consumes point clouds directly. PointNet takes a point cloud as input and outputs either a class label for the entire input or a segmentation / part label for each input point. To respect the permutation invariance of the input points, a simple symmetric function (max-pooling) is used to aggregate a global feature of the unordered point cloud.

A point cloud is a set of 3D points. At minimum, each point carries coordinates $(x,y,z)$, and it may additionally carry features such as color or a normal vector. A point cloud has the following three main properties:

- Unordered: the points form a set with no canonical ordering, so the output must be invariant to any permutation of the input points.
- Interaction among points: neighboring points form meaningful local structures, so local and global information should both be usable.
- Invariance under transformations: rigid transformations (rotation, translation) of the whole point cloud should not change the classification or segmentation result.

PointNet addresses these three properties with the following designs (a small check of the max-pooling idea follows this list):

- A symmetric function, max-pooling, aggregates per-point features into a single global feature, so the result does not depend on the input order.
- Per-point features and the global feature can be concatenated, so that both local and global information are available for per-point predictions.
- A small alignment network (T-Net) predicts a transformation matrix that is applied to the input coordinates (and again to intermediate features), making the learned representation more robust to geometric transformations.
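
As a quick illustration of why max-pooling gives permutation invariance, the following small sketch (not from the original implementation; the tensor sizes are illustrative) shows that the pooled global feature is identical for two different orderings of the same points:

import torch

# 1 batch, 5 points, 4-dimensional per-point features
feats = torch.randn(1, 4, 5)                      # [batch, channel, num_points]
perm = torch.randperm(5)
feats_shuffled = feats[:, :, perm]                # same points, different order

# Max over the point dimension is a symmetric (order-independent) aggregation
global_a = torch.max(feats, dim=2)[0]             # [1, 4]
global_b = torch.max(feats_shuffled, dim=2)[0]    # [1, 4]
print(torch.allclose(global_a, global_b))         # True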

The PointNet network architecture is as follows:

⚪ Classification Network

The classification network takes $n$ points with 3D coordinates as input. A T-Net first predicts a transformation matrix that is applied to the input by matrix multiplication; a shared MLP then embeds each point into feature space; a second T-Net predicts a transformation matrix that is applied to these features; another shared MLP further enriches the per-point features; finally, max-pooling produces a global feature, and an MLP outputs the classification prediction.
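
As a rough shape trace of the classification branch (B is the batch size, N the number of points, k the number of classes; the channel sizes follow the implementation below):

# input points                  [B, 3, N]
# input T-Net (3x3, bmm)     -> [B, 3, N]
# shared MLP                 -> [B, 64, N]
# feature T-Net (64x64, bmm) -> [B, 64, N]
# shared MLP                 -> [B, 1024, N]
# max-pool over N            -> [B, 1024]   (global feature)
# MLP 512 -> 256 -> k        -> [B, k]      (class scores)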

⚪ Segmentation Network

Compared with classification, segmentation requires every point to see the global context before it can decide its own label. Therefore each point's local feature is concatenated with the global feature, and an MLP performs further feature processing and per-point classification.
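
The key operation is tiling the global feature along the point dimension and concatenating it with the per-point features. A minimal sketch (the shapes follow the implementation below; the variable names are illustrative):

import torch

B, N = 2, 1024
point_feat = torch.randn(B, 64, N)                 # per-point features
global_feat = torch.randn(B, 1024)                 # pooled global feature

tiled = global_feat.unsqueeze(2).repeat(1, 1, N)   # [B, 1024, N]
seg_input = torch.cat([point_feat, tiled], dim=1)  # [B, 1088, N]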

⚪ PyTorch Implementation of PointNet

The T-Net used to construct the transformation matrix is implemented as follows:

import numpy as np
import torch
import torch.nn as nn

class STN(nn.Module):
    """T-Net: predicts a k x k transformation matrix from the input point features."""
    def __init__(self, k):
        super(STN, self).__init__()
        # shared per-point MLP implemented with 1x1 convolutions
        self.conv1 = torch.nn.Conv1d(k, 64, 1)
        self.conv2 = torch.nn.Conv1d(64, 128, 1)
        self.conv3 = torch.nn.Conv1d(128, 1024, 1)
        # fully connected layers that regress the k*k matrix
        self.fc1 = nn.Linear(1024, 512)
        self.fc2 = nn.Linear(512, 256)
        self.fc3 = nn.Linear(256, k * k)
        self.relu = nn.ReLU()

        self.bn1 = nn.BatchNorm1d(64)
        self.bn2 = nn.BatchNorm1d(128)
        self.bn3 = nn.BatchNorm1d(1024)
        self.bn4 = nn.BatchNorm1d(512)
        self.bn5 = nn.BatchNorm1d(256)

        self.k = k

    def forward(self, x):
        """
        :param x: size [batch, channel (feature_dim), N (number of points)]
        :return: transformation matrices, size [batch, k, k]
        """
        batchsize = x.size()[0]
        x = self.relu(self.bn1(self.conv1(x)))
        x = self.relu(self.bn2(self.conv2(x)))
        x = self.relu(self.bn3(self.conv3(x)))
        # symmetric max-pooling over the point dimension -> global feature
        x = torch.max(x, 2, keepdim=True)[0]
        x = x.view(-1, 1024)

        x = self.relu(self.bn4(self.fc1(x)))
        x = self.relu(self.bn5(self.fc2(x)))
        x = self.fc3(x)

        # initialize the predicted transform around the identity matrix
        iden = torch.from_numpy(np.eye(self.k).flatten().astype(np.float32)).view(1, self.k ** 2).repeat(
            batchsize, 1)
        if x.is_cuda:
            iden = iden.cuda()
        x = x + iden
        x = x.view(-1, self.k, self.k)
        return x
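
A quick usage check (the batch size and point count here are arbitrary): the predicted $k \times k$ matrix is applied to each point by a batched matrix multiplication, which is exactly how the encoder below uses it.

stn = STN(k=3)
pts = torch.randn(8, 3, 1024)                                      # [batch, channel, num_points]
trans = stn(pts)                                                   # [8, 3, 3]
aligned = torch.bmm(pts.transpose(2, 1), trans).transpose(2, 1)    # [8, 3, 1024]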

If a T-Net is introduced into the network, the predicted transformation matrix should stay close to a rotation matrix (i.e., an orthogonal matrix), which makes optimization more stable. An orthogonality regularization loss is therefore added:

\[L_{reg} = ||I-AA^T||_F^2\]

This regularization term is implemented as follows:

def feature_transform_reguliarzer(trans):
    """
    :param trans: predicted transformation matrices, size [batch, d, d]
    """
    d = trans.size()[1]
    I = torch.eye(d)[None, :, :]
    if trans.is_cuda:
        I = I.cuda()
    # The predicted transform should be close to orthogonal (A A^T ≈ I);
    # penalize the deviation from the identity, averaged over the batch.
    loss = torch.mean(torch.norm(torch.bmm(trans, trans.transpose(2, 1)) - I, dim=(1, 2)))
    return loss
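
During training, this term is typically added to the classification loss with a small weight. A minimal sketch is shown below; the default weight of 0.001 is a common choice in public PointNet implementations and is an assumption here, not a value prescribed by this post:

import torch.nn.functional as F

def pointnet_loss(pred, target, trans_feat, reg_weight=0.001):
    """pred: log-probabilities [B, k]; target: class indices [B];
    trans_feat: the 64x64 feature transform returned by the encoder."""
    return F.nll_loss(pred, target) + reg_weight * feature_transform_reguliarzer(trans_feat)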

The feature-extraction part of PointNet is implemented as follows:

class PointNetEncoder(nn.Module):
    def __init__(self, channel, transform=False):
        """
        :param channel: number of input channels per point (3 for xyz, 6 with normals)
        :param transform: whether to use the STN alignment networks
        """
        super(PointNetEncoder, self).__init__()
        if transform:
            self.STN = STN(k=channel)       # input transform
            self.STN_feature = STN(k=64)    # feature transform
        # shared per-point MLPs implemented with 1x1 convolutions
        self.conv1 = nn.Conv1d(channel, 64, 1)
        self.conv2 = nn.Conv1d(64, 64, 1)
        self.conv3 = nn.Conv1d(64, 64, 1)
        self.conv4 = nn.Conv1d(64, 128, 1)
        self.conv5 = nn.Conv1d(128, 1024, 1)

        self.bn1 = nn.BatchNorm1d(64)
        self.bn2 = nn.BatchNorm1d(64)
        self.bn3 = nn.BatchNorm1d(64)
        self.bn4 = nn.BatchNorm1d(128)
        self.bn5 = nn.BatchNorm1d(1024)

        self.relu = nn.ReLU()
        self.transform = transform

    def forward(self, x, global_feature=True):
        """
        :param x: input points, size [batch, channel, N]
        :param global_feature: if True, return the global feature only;
                               otherwise return cat(per-point features, global feature)
        """
        N = x.shape[2]
        if self.transform:
            # align the input coordinates with the predicted transform
            tran1 = self.STN(x)
            x = x.transpose(2, 1)
            x = torch.bmm(x, tran1)
            x = x.transpose(2, 1)
        x = self.relu(self.bn1(self.conv1(x)))
        x = self.relu(self.bn2(self.conv2(x)))
        if self.transform:
            # align the 64-dimensional per-point features
            tran2 = self.STN_feature(x)
            x = x.transpose(2, 1)
            x = torch.bmm(x, tran2)
            x = x.transpose(2, 1)
        if self.transform:
            tran_mat = (tran1, tran2)
        else:
            tran_mat = None
        point_wise_feature = x
        x = self.relu(self.bn3(self.conv3(x)))
        x = self.relu(self.bn4(self.conv4(x)))
        x = self.relu(self.bn5(self.conv5(x)))

        # symmetric max-pooling over the point dimension -> global feature
        global_feature_vec = torch.max(x, 2, keepdim=True)[0]
        global_feature_vec = global_feature_vec.view(-1, 1024)
        if global_feature:
            return global_feature_vec, tran_mat
        else:
            # tile the global feature and concatenate it with the per-point features
            temp = global_feature_vec.unsqueeze(2).repeat(1, 1, N)
            return torch.cat([point_wise_feature, temp], dim=1), tran_mat
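
A quick shape check of the two output modes (the batch size and point count are illustrative):

enc = PointNetEncoder(channel=3, transform=True)
pts = torch.randn(4, 3, 1024)               # [batch, xyz, num_points]

g, trans = enc(pts, global_feature=True)    # g: [4, 1024]
f, trans = enc(pts, global_feature=False)   # f: [4, 1088, 1024] = cat(64, 1024)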

The PointNet classification network is implemented as follows:

class PointNet_cls(nn.Module):
    def __init__(self, cls=40, normal_channel=False, transform=False):
        """
        :param cls: number of output classes
        :param normal_channel: if True, each point also carries a normal vector (6 channels)
        :param transform: whether to use the STN alignment networks in the encoder
        """
        super(PointNet_cls, self).__init__()
        if normal_channel:
            channel = 6
        else:
            channel = 3
        self.feat = PointNetEncoder(channel=channel, transform=transform)
        self.fc1 = nn.Linear(1024, 512)
        self.fc2 = nn.Linear(512, 256)
        self.fc3 = nn.Linear(256, cls)
        self.dropout = nn.Dropout(p=0.3)
        self.bn1 = nn.BatchNorm1d(512)
        self.bn2 = nn.BatchNorm1d(256)
        self.relu = nn.ReLU()
        self.log_softmax = nn.LogSoftmax(dim=1)

    def forward(self, x):
        # x: [batch, channel, N] -> global feature [batch, 1024] -> class log-probabilities
        x, trans_mat = self.feat(x)
        x = self.relu(self.bn1(self.fc1(x)))
        x = self.relu(self.bn2(self.dropout(self.fc2(x))))
        x = self.fc3(x)
        x = self.log_softmax(x)
        return x, trans_mat
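
The post stops at the classification head. Corresponding to the segmentation network described above, a per-point segmentation head could look like the following sketch; the class name and layer sizes here follow the common PointNet segmentation design and are an assumption, not code from the post:

class PointNet_seg(nn.Module):
    def __init__(self, num_part=50, channel=3, transform=False):
        super(PointNet_seg, self).__init__()
        self.feat = PointNetEncoder(channel=channel, transform=transform)
        # shared per-point MLP on cat(per-point feature 64, global feature 1024) = 1088 channels
        self.conv1 = nn.Conv1d(1088, 512, 1)
        self.conv2 = nn.Conv1d(512, 256, 1)
        self.conv3 = nn.Conv1d(256, 128, 1)
        self.conv4 = nn.Conv1d(128, num_part, 1)
        self.bn1 = nn.BatchNorm1d(512)
        self.bn2 = nn.BatchNorm1d(256)
        self.bn3 = nn.BatchNorm1d(128)
        self.relu = nn.ReLU()
        self.log_softmax = nn.LogSoftmax(dim=1)

    def forward(self, x):
        # x: [batch, channel, N] -> per-point log-probabilities [batch, num_part, N]
        x, trans_mat = self.feat(x, global_feature=False)   # [batch, 1088, N]
        x = self.relu(self.bn1(self.conv1(x)))
        x = self.relu(self.bn2(self.conv2(x)))
        x = self.relu(self.bn3(self.conv3(x)))
        x = self.conv4(x)
        x = self.log_softmax(x)
        return x, trans_mat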