[Paper Reading] Monocular 3D Object Reconstruction with GAN inversion (ECCV2022)

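Overview
- Project page: https://www.mmlab-ntu.com/project/meshinversion/
- Method name: MeshInversion
- Input: a single monocular image (in the wild, with background, not pre-segmented)
- Output: textured 3D mesh
- Key challenge: no 3D or multi-view supervision
- Core idea: first pre-train a 3D GAN (ConvMesh, where the mesh is represented as deformation and texture maps) that generates a textured mesh from a latent code z. Then, at inference time, search for the z that best matches the input image. (This is an inference-time optimization method!) The generated mesh is rendered with the predicted camera parameters and supervised against the input image with a Chamfer texture loss and a Chamfer mask loss.
- Networks used or referenced: ConvMesh and PatchGAN; masks come from an off-the-shelf segmentation tool (PointRend).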
Related Work

Single View 3D Reconstruction

image-3D object pairs [46,35,32,39]
multi-view images [33,28,51,47,34]
SMPL for humans and 3DMM for faces [8,40,18]

CMR [19] reconstructs category-specific textured mesh

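There are generally two ways to handle texture. One is direct regression of pixel values in the UV texture map, which is often blurry, but it is what the authors use.
The mainstream approach is learning the texture flow, which generalizes poorly to novel views.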

GAN inversion

GAN inversion means first training a GAN, then finding a latent code z such that the GAN output for z matches the desired target as closely as possible.

It is usually done with:
gradient descent (details omitted);

an encoder trained to predict z:
Bau, D., Strobelt, H., Peebles, W., Zhou, B., Zhu, J.Y., Torralba, A., et al.: Semantic photo manipulation with a generative image prior. In: SIGGRAPH (2019)

or a combination of the two:
Zhu, J., Shen, Y., Zhao, D., Zhou, B.: In-domain GAN inversion for real image editing. In: ECCV (2020)

Recent work in 3D includes point cloud completion via GAN inversion:
Zhang, J., Chen, X., Cai, Z., Pan, L., Zhao, H., Yi, S., Yeo, C.K., Dai, B., Loy, C.C.: Unsupervised 3D shape completion through GAN inversion. In: CVPR (2021)
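
As a rough illustration of the gradient-descent flavor of GAN inversion described above, here is a minimal sketch. The "generator" is a hypothetical frozen linear map rather than a real GAN, and the names `generator` and `invert` are made up for this example; only the idea (keep the generator fixed, optimize z to reproduce a target) comes from the notes.

```python
def generator(z, weights):
    """Toy 'generator': a fixed linear map from latent z to an output vector."""
    return [sum(w * zi for w, zi in zip(row, z)) for row in weights]


def invert(target, weights, steps=500, lr=0.05):
    """Find z minimizing ||G(z) - target||^2 by plain gradient descent."""
    z = [0.0] * len(weights[0])
    for _ in range(steps):
        out = generator(z, weights)
        residual = [o - t for o, t in zip(out, target)]
        # d/dz_j of sum_i residual_i^2 = 2 * sum_i residual_i * W[i][j]
        grad = [2.0 * sum(r * row[j] for r, row in zip(residual, weights))
                for j in range(len(z))]
        z = [zj - lr * g for zj, g in zip(z, grad)]
    return z


# Frozen generator parameters (hypothetical values).
weights = [[1.0, 0.5], [0.0, 1.0], [0.5, -1.0]]
z_true = [0.8, -0.3]
target = generator(z_true, weights)   # the observation to be explained
z_hat = invert(target, weights)       # recovered latent code
```

With a real GAN the generator is non-linear, so the gradient would come from automatic differentiation and the result depends on the initialization of z, which is why encoder-based initializations are popular.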

textured mesh generation

[6] Learning to predict 3D objects with an interpolation-based differentiable renderer. In: NeurIPS (2019)
The reconstructed mesh is rendered differentiably, and the rendered multi-view images are supervised with a discriminator.

[13] Leveraging 2D data to learn textured 3D mesh generation. In: CVPR (2020)
A VAE method that uses face colors instead of texture maps.

[38] Convolutional generation of textured 3D meshes
Topology-aligned texture maps and deformation maps in the UV space. (MeshInversion uses its pre-trained model.)

Method

At a high level, a generator produces geometry and texture from a latent code, and the result is supervised with a Chamfer mask loss and a Chamfer texture loss.

Preliminaries

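A mesh is represented as O = (V, F, T): vertices, faces, and a texture map.
Because an individual mesh is isomorphic to a 2-pole sphere, the vertex positions can be written as a deformation $\Delta \mathbf{V}$ of a sphere template:
$\mathbf{V} = \mathbf{V}_{sphere} + \Delta \mathbf{V}$
Earlier methods mostly regress $\Delta \mathbf{V}$ with an MLP; this paper uses a CNN.

Rendering uses weak-perspective projection (a scaled orthographic approximation that sits between full perspective and pure orthographic projection), with parameters π consisting of scale s, translation t, and rotation r.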
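
To make the two preliminaries concrete, here is a minimal sketch of deforming a sphere template and projecting the result with a weak-perspective camera. It is only an illustration under simplifying assumptions: the rotation r is reduced to a single yaw angle about the y-axis, and the function names `deform_sphere` and `weak_perspective_project` are hypothetical, not taken from the paper or its code.

```python
import math


def deform_sphere(sphere_vertices, delta_v):
    """V = V_sphere + delta_V, applied per vertex."""
    return [tuple(s + d for s, d in zip(v, dv))
            for v, dv in zip(sphere_vertices, delta_v)]


def weak_perspective_project(vertices, scale, translation, yaw):
    """Project 3D vertices with a weak-perspective (scaled orthographic) camera.

    Assumption: rotation is a single yaw angle about the y-axis, for brevity.
    """
    c, s = math.cos(yaw), math.sin(yaw)
    tx, ty = translation
    projected = []
    for x, y, z in vertices:
        # Rotate about the y-axis (only the rotated x is needed afterwards).
        xr = c * x + s * z
        yr = y
        # Drop depth, then apply one global scale and a 2D translation.
        projected.append((scale * xr + tx, scale * yr + ty))
    return projected


sphere = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
delta = [(0.1, 0.0, 0.0), (0.0, -0.2, 0.0), (0.0, 0.0, 0.5)]
verts = deform_sphere(sphere, delta)
front = weak_perspective_project(verts, scale=2.0, translation=(0.5, -0.5), yaw=0.0)
side = weak_perspective_project(verts, scale=2.0, translation=(0.5, -0.5),
                                yaw=math.pi / 2)
```

Unlike full perspective, the scale here does not depend on each vertex's depth, so the camera is described by only a handful of parameters.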

3.1 Reconstruction with Generative Prior

Pre-training Stage
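- This stage trains a 3D GAN.
- The generator mainly follows ConvMesh
  - it operates in UV space
  - it outputs a deformation map and a texture map.
- The discriminator mainly follows PatchGAN.
- Losses include
  - generator loss
  - discriminator loss in UV space
  - discriminator loss in image space (following PatchGAN)
Inversion Stage
- Goal: find the z that best recovers the 3D object from the input image $\mathbf{I}_{in}$.
- Required inputs: the original image, its mask, and the camera parameters used to render the 3D shape.
  - The mask comes from an off-the-shelf segmentation tool (PointRend).
    The rationale is given in https://github.com/junzhezhang/mesh-inversion/issues/5 : fair comparison, and to emphasize that this is test-time optimization.
  - The latent code z of the mesh (shape) is predicted with ConvMesh, and the camera parameters π are predicted with CMR.
    How the camera parameters π are obtained: regressing the camera pose from scratch suffers from camera-shape ambiguity [24], so CMR is used to initialize the camera.
  - With the predicted camera parameters, the predicted mesh is rendered into a 2D image on which the losses are computed (see below).
Because the camera pose keeps being optimized, the rendered image can never align perfectly with the input, so a robust texture loss is needed (see below).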

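3.2 Chamfer texture loss (key reference)
- Treat an image as a 2D point cloud in which every point has a 2D coordinate and a 3-channel RGB color.
- The dissimilarity between two images is then expressed as a Chamfer distance.
  - The distance D is decomposed into an appearance term and a spatial term, both using the l2 distance.
  - Important: the loss should tolerate only local misalignment, so an exponent is applied to the spatial term to penalize points that are spatially far apart.
  - Explanation: first, Da and Ds are multiplied.
    An epsilon is added so that two points at the same location (Ds = 0) but with very different colors (large Da) still count as different points, instead of the product collapsing to zero;
    then an exponent is applied on the Ds side to penalize points that are too far away, since only small misalignments should be tolerated;
    finally, take a max.
  - Note: the Ds term is not differentiable; it only acts as a weight when training Da (texture).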
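
The combination of the two terms can be sketched in plain Python as follows. This mirrors the arithmetic of the `mesh_inversion.py` excerpt quoted in this post (a spatial weight clamped from below at 1, multiplied by the color distance plus an offset), with the default values of `xy_threshold`, `xy_k`, `xy_alpha` and `rgb_eps` from `arguments.py`. Treating both distances as squared Euclidean distances is an assumption, and the helper names `pair_distance` and `chamfer_texture_one_way` are hypothetical.

```python
def sq_l2(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def pair_distance(color_a, pos_a, color_b, pos_b,
                  xy_threshold=0.16, k=1.0, alpha=1.0, rgb_eps=1.0):
    """Combined distance between two pixels (color, position)."""
    d_a = sq_l2(color_a, color_b)   # appearance term
    d_s = sq_l2(pos_a, pos_b)       # spatial term (assumed squared distance)
    eps = 1.0 - (2.0 * k * xy_threshold) ** 2
    # Spatial weight: exactly 1 for small offsets, growing for distant pixels.
    xy_term = max((eps + k * d_s) ** alpha, 1.0)
    # rgb_eps keeps the product from vanishing when the colors are identical.
    return xy_term * (d_a + rgb_eps)


def chamfer_texture_one_way(pred, target, **kwargs):
    """Mean over predicted pixels of the minimum distance to any target pixel."""
    total = 0.0
    for color_a, pos_a in pred:
        total += min(pair_distance(color_a, pos_a, color_b, pos_b, **kwargs)
                     for color_b, pos_b in target)
    return total / len(pred)


pred = [((1.0, 0.0, 0.0), (0.0, 0.0))]
target = [((1.0, 0.0, 0.0), (0.0, 0.0)), ((0.0, 0.0, 1.0), (0.5, 0.5))]
loss = chamfer_texture_one_way(pred, target)
near = pair_distance((1.0, 0.0, 0.0), (0.1, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0))
far = pair_distance((1.0, 0.0, 0.0), (1.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0))
```

With the defaults, the spatial weight stays at 1 while the squared offset is below (2 · 0.16)², so a pixel shifted by a small amount is matched purely on color, while a distant pixel with the same color is penalized.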
This turns out to be quite useful, as the ablation study shows.

Besides the pixel-level CD loss, there is also a feature-level CD loss:
Specifically, we apply the Chamfer texture loss between the (foreground) feature maps extracted with a pre-trained VGG-19 network [42] from the rendered image and the input image.
This resembles the contextual loss (The contextual loss for image transformation with non-aligned data), with one difference:

Feature-level Chamfer texture loss: takes location into account but does not require exact alignment.
Contextual loss: ignores location entirely.

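Loss ablation study

CT denotes the Chamfer texture loss;
LpCT is the pixel-level variant and LfCT is the feature-level variant.

Looking at the middle rows of the ablation table:
the contextual loss is the worst,
followed by L1 alone;
L1 + perceptual is somewhat better;
the CT loss is still the best.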

3.3 Chamfer Mask Loss
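- A conventional mask loss usually rasterizes the 3D shape into a grid of pixels (a mask) and then computes an L1 or IoU loss against the GT mask.
  - Getting a mask from the 3D shape requires rasterization that discretizes the mesh into a grid of pixels. This step loses information and introduces error, which hurts the pre-trained ConvMesh in particular.
- To address this, the authors propose the Chamfer mask loss Lcm (a CD instead of an L1, so there is no quantization error).
  - Instead of rendering the mesh into a binary mask, the mesh vertices are projected directly onto the image plane, giving Sv.
  - The coordinates of the foreground pixels obtained from the off-the-shelf segmentation tool are normalized to [-1, 1], giving Sf.
  - The Chamfer distance between Sv and Sf is then computed.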
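
The three steps above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the symmetric form of the Chamfer distance, the squared distance, and the exact normalization convention (dividing by width − 1 and height − 1) are assumptions, and all function names are hypothetical.

```python
def foreground_pixels(mask):
    """Return (col, row) coordinates of all non-zero entries of a 2D mask."""
    return [(c, r) for r, row in enumerate(mask)
            for c, value in enumerate(row) if value]


def normalize_pixels(pixel_coords, width, height):
    """Map pixel (col, row) coordinates to [-1, 1] along both axes."""
    return [(2.0 * c / (width - 1) - 1.0, 2.0 * r / (height - 1) - 1.0)
            for c, r in pixel_coords]


def sq_dist(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def chamfer_distance(set_a, set_b):
    """Symmetric Chamfer distance: mean nearest-neighbor squared distance, both ways."""
    a_to_b = sum(min(sq_dist(p, q) for q in set_b) for p in set_a) / len(set_a)
    b_to_a = sum(min(sq_dist(q, p) for p in set_a) for q in set_b) / len(set_b)
    return a_to_b + b_to_a


# Sf: foreground pixels of a (toy) segmentation mask, normalized to [-1, 1].
mask = [[1, 0, 0],
        [0, 1, 0],
        [0, 0, 1]]
s_f = normalize_pixels(foreground_pixels(mask), width=3, height=3)

# Sv: mesh vertices already projected onto the image plane (toy values).
s_v_aligned = [(-1.0, -1.0), (0.0, 0.0), (1.0, 1.0)]
s_v_shifted = [(-0.9, -1.0), (0.1, 0.0), (1.1, 1.0)]

loss_aligned = chamfer_distance(s_v_aligned, s_f)
loss_shifted = chamfer_distance(s_v_shifted, s_f)
```

Because Sv holds continuous vertex coordinates, a small shift of the mesh changes the loss smoothly, whereas a rasterized mask would only change once a vertex crosses a pixel boundary.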
Total loss
- pixel-level chamfer texture loss (appearance)
- feature-level chamfer texture loss (appearance)
- chamfer mask loss (geometry)
- smooth loss (neighboring faces should have similar normals, i.e. low cosine distance)
- latent space loss (L2 norm of z, to keep z consistent with the Gaussian prior)
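
A weighted sum of these terms might look like the sketch below. The weights are the defaults listed in the `arguments.py` excerpt quoted in this post; mapping `nll_loss_wt` to the latent space loss and `mesh_regularization_loss_wt` to the smooth loss is a guess, and the concrete forms of `latent_loss` and `smooth_loss` (squared L2 norm, mean of 1 − cosine similarity) are assumptions made for illustration.

```python
import math


def latent_loss(z):
    """Squared L2 norm of z; small values keep z near the mode of a standard Gaussian."""
    return sum(v * v for v in z)


def smooth_loss(normal_pairs):
    """Mean (1 - cosine similarity) over pairs of neighboring face normals."""
    total = 0.0
    for n1, n2 in normal_pairs:
        dot = sum(a * b for a, b in zip(n1, n2))
        norm = math.sqrt(sum(a * a for a in n1)) * math.sqrt(sum(b * b for b in n2))
        total += 1.0 - dot / norm
    return total / len(normal_pairs)


def total_loss(terms, weights):
    """Weighted sum of the named loss terms."""
    return sum(weights[name] * value for name, value in terms.items())


# Default weights from arguments.py; the term names are hypothetical.
weights = {
    "chamfer_mask": 10.0,
    "chamfer_texture_pixel": 1.0,
    "chamfer_texture_feat": 0.04,
    "smooth": 0.00005,
    "latent": 0.05,
}
terms = {
    "chamfer_mask": 0.5,
    "chamfer_texture_pixel": 2.0,
    "chamfer_texture_feat": 10.0,
    "latent": latent_loss([2.0, 0.0]),
}
loss = total_loss(terms, weights)
```

In the quoted defaults the mesh regularization flag is off, which is why the example leaves the smooth term out of `terms`.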
TODO: go through the code carefully later, especially the latent space loss,
and how the feature-level loss is actually implemented.

Experiments
- Datasets:
  - CUB-200-2011 (birds)
  - PASCAL3D: cars
- Pre-training ConvMesh: pseudo ground truths (?). This probably refers to the outputs of the segmentation and camera pose prediction networks mentioned above.
- GAN inversion at inference: apparently also pseudo ground truths.
- Evaluation: uses GT
  - geometry accuracy: 2D mask IoU between rendered masks and GT masks
  - appearance quality: the image synthesis metric FID (single view and multi view), which reflects how similar the distributions of GT images and generated images are.
  - user study: 40 users were recruited to give ratings.
  - (PASCAL3D only: approximated 3D CAD shapes are available, so 3D IoU can be used)
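
For reference, the 2D mask IoU used for geometry accuracy can be computed as in this minimal sketch. The masks are toy binary grids, and the function name `mask_iou` and the convention for two empty masks are assumptions of this example.

```python
def mask_iou(mask_a, mask_b):
    """Intersection over union of two binary masks of the same size."""
    inter = 0
    union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            inter += 1 if (a and b) else 0
            union += 1 if (a or b) else 0
    # Two empty masks are treated as a perfect match (a convention, not a rule).
    return inter / union if union else 1.0


rendered = [[0, 1, 1],
            [0, 1, 1],
            [0, 0, 0]]
gt = [[0, 0, 1],
      [0, 1, 1],
      [0, 0, 1]]
iou = mask_iou(rendered, gt)   # 3 shared pixels out of 5 covered pixels
```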
Texture Flow vs. Texture Regression

Texture flow is more common, but it tends to fail in invisible regions, because it easily copies foreground pixels, including obstacles.

Implementation (mainly from the supplementary material)

Time, memory, and GPU hardware

Pre-training:
600 epochs, with a batch size of 128,
15 hours on four Nvidia V100 GPUs.

Network architecture: same as ConvMesh.
- convolutional generator G with 2 branches.
  - Input: latent code z (64-dimensional)
  - Output: deformation map S, 32×32; texture map T, 512×512
- UV space discriminator
  - deformation map
  - texture map
- image space discriminator (PatchGAN)
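Chamfer texture loss implementation notes
- Explanation: first, Da and Ds are multiplied.
  - An epsilon is added so that two points at the same location (Ds = 0) but with very different colors (large Da) still count as different points, instead of the product collapsing to zero;
  - then an exponent is applied on the Ds side to penalize points that are too far away, since only small misalignments should be tolerated;
  - finally, take a max.
- Note: the Ds term is not differentiable; it only acts as a weight when training Da (texture).
Texture CD loss code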

mesh_inversion.py
https://github.com/junzhezhang/mesh-inversion/blob/d6614726344f5a56c068df2750fefc593c4ca43d/lib/mesh_inversion.py#L265
```
if self.args.chamfer_texture_pixel_loss:
    # NOTE: batch size should be one
    # Pixel positions derived from the predicted mask, and image_pred sampled at them.
    pix_pos_pred = mask2proj(mask_pred)
    pix_pred = grid_sample_from_vtx(pix_pos_pred, image_pred)
    # Pairwise color distances; idx_a / idx_b are reused so that the
    # position distances below are computed on the same subsample.
    dist_map_c, idx_a, idx_b = distChamfer_downsample(pix_pred,color_target,resolution=self.args.chamfer_resolution)
    dist_map_p, _, _ = distChamfer_downsample(pix_pos_pred,vtx_target,resolution=self.args.chamfer_resolution, idx_a=idx_a, idx_b=idx_b)

    xy_threshold = self.args.xy_threshold
    k = self.args.xy_k
    alpha = self.args.xy_alpha
    eps = 1 - (2*k*xy_threshold)**2
    rgb_eps = self.args.rgb_eps
    if eps == 1:
        xy_term = torch.pow(1+k*dist_map_p, alpha)
    else:
        # relu(x - 1) + 1 == max(x, 1): the spatial weight never drops below 1
        xy_term = F.relu(torch.pow(eps+k*dist_map_p, alpha)-1) + 1
    # spatial weight times (color distance + offset)
    dist_map = xy_term * (dist_map_c + rgb_eps)

    # minimum over the last dimension, then mean over the remaining points
    dist_min_ab = dist_map.min(-1)[0]
    dist_mean_ab = dist_min_ab.mean(-1)

    loss += dist_mean_ab * self.args.chamfer_texture_pixel_loss_wt
    
    ### collect the matched points in the target for visualization
    indices = dist_map.argmin(dim=-1)
    self.matched_pos = torch.stack([vtx_target[i,indices[i]] for i in range(indices.shape[0])],0)
    self.matched_clr = torch.stack([color_target[i,indices[i]] for i in range(indices.shape[0])],0)
    # v2 from: grid sample
    self.matched_clr_v2 = grid_sample_from_vtx(self.matched_pos, target) # NOTE that back vertices color shown as well
```
The parameters used:
https://github.com/junzhezhang/mesh-inversion/blob/d6614726344f5a56c068df2750fefc593c4ca43d/lib/arguments.py#L135
```
        # loss related
        self._parser.add_argument('--chamfer_mask_loss', action='store_true', default=True, help='if use Chamfer mask loss')
        self._parser.add_argument('--chamfer_mask_loss_wt', type=float, default=10.0)
        self._parser.add_argument('--chamfer_texture_pixel_loss', action='store_true', default=True, help='Chamfer texture loss - pixel level')
        self._parser.add_argument('--chamfer_texture_pixel_loss_wt', type=float, default=1.0)
        self._parser.add_argument('--chamfer_texture_feat_loss', action='store_true', default=True, help='Chamfer texture loss - feature level')
        self._parser.add_argument('--chamfer_texture_feat_loss_wt', type=float, default=0.04)
        self._parser.add_argument('--xy_threshold', type=float, default=0.16)
        self._parser.add_argument('--xy_k', type=float, default=1.0)
        self._parser.add_argument('--xy_alpha', type=float, default=1)
        self._parser.add_argument('--rgb_eps', type=float, default=1)
        self._parser.add_argument('--subpool_threshold', type=float, default=0.5)
        self._parser.add_argument('--chamfer_resolution', type=int, default=8192, help='resolution for computing chamfer texture losses')         
        # other losses
        self._parser.add_argument('--mesh_regularization_loss', action='store_true', default=False, help='')
        self._parser.add_argument('--mesh_regularization_loss_wt', type=float, default=0.00005)
        self._parser.add_argument('--nll_loss', action='store_true', default=True, help='')
        self._parser.add_argument('--nll_loss_wt', type=float, default=0.05)
```
Original post: https://blog.csdn.net/qq_34342853/article/details/128187071
