
Prior single-view reconstruction work typically relies on stronger supervision: image-3D object pairs [46,35,32,39], multi-view images [33,28,51,47,34], or parametric templates such as SMPL for humans and 3DMM for faces [8,40,18].
CMR [19] reconstructs a category-specific textured mesh.
For texture, there are generally two approaches. One is direct regression of pixel values in the UV texture map, which is often blurry, but this is what the authors use.
The more common approach is learning a texture flow, which generalizes poorly to novel views.
GAN inversion means first training a GAN, then finding a latent code z such that feeding z into the GAN produces an output that matches the target as closely as possible.
This is usually done by:
gradient descent on z (a minimal sketch follows below),
by learning an encoder:
Bau, D., Strobelt, H., Peebles, W., Zhou, B., Zhu, J.Y., Torralba, A., et al.: Semantic photo manipulation with a generative image prior. In: SIGGRAPH (2019)
or by a combination of both:
Zhu, J., Shen, Y., Zhao, D., Zhou, B.: In-domain GAN inversion for real image editing. In: ECCV (2020)
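A minimal sketch of the gradient-descent option, assuming a pretrained generator G with a hypothetical z_dim attribute and using a placeholder L1 reconstruction loss (the paper itself uses the Chamfer mask/texture losses described below):

```python
import torch
import torch.nn.functional as F

def invert(G, target, steps=500, lr=0.05):
    """Find z such that G(z) matches the target as closely as possible."""
    z = torch.zeros(1, G.z_dim, requires_grad=True)   # hypothetical z_dim attribute
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = F.l1_loss(G(z), target)   # placeholder reconstruction loss
        loss.backward()                  # gradients flow through the frozen generator into z
        opt.step()
    return z.detach()
```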
GAN inversion has also appeared in recent 3D work, e.g. point cloud completion:
Zhang, J., Chen, X., Cai, Z., Pan, L., Zhao, H., Yi, S., Yeo, C.K., Dai, B., Loy, C.C.: Unsupervised 3D shape completion through GAN inversion. In: CVPR (2021)
6. Learning to predict 3D objects with an interpolation-based differentiable renderer. In: NeurIPS (2019)
The reconstructed mesh is differentiably rendered, and the rendered multi-view images are used for discriminative supervision.
13. Leveraging 2D data to learn textured 3D mesh generation. In: CVPR (2020)
A VAE-based approach; it uses face colors instead of texture maps.
38. Convolutional generation of textured 3D meshes
Generates topology-aligned texture maps and deformation maps in the UV space (this paper uses its pretrained model).
The overall method, then, appears to be: use the generator to produce geometry and texture from a latent code, and supervise the inversion with a Chamfer mask loss and a Chamfer texture loss (a rough sketch of the mask loss follows below).
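As a rough illustration of the Chamfer mask loss (not the repo's exact distChamfer_downsample-based implementation, which additionally downsamples the point sets), one can treat each binary silhouette as a 2D point set and compare the sets with a symmetric Chamfer distance:

```python
import torch

def mask_to_points(mask):
    """mask: (H, W) binary silhouette -> (N, 2) normalized xy coords of foreground pixels."""
    ys, xs = torch.nonzero(mask > 0.5, as_tuple=True)
    h, w = mask.shape
    return torch.stack([xs / (w - 1), ys / (h - 1)], dim=-1).float()

def chamfer_mask_loss(mask_pred, mask_target):
    a, b = mask_to_points(mask_pred), mask_to_points(mask_target)
    d = torch.cdist(a, b)                                             # (Na, Nb) pairwise distances
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()    # symmetric Chamfer
```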
The mesh is represented as O = (V, F, T), i.e. vertices, faces, and texture map.
Since an individual mesh is isomorphic to a 2-pole sphere, the vertex positions can be expressed as a deformation $\Delta \mathbf{V}$ of a sphere template:

$$\mathbf{V} = \mathbf{V}_{sphere} + \Delta \mathbf{V}$$
Previous methods mostly regress ΔV with an MLP; this paper uses a CNN instead (see the sketch below).
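A minimal sketch of how a CNN-predicted deformation map in UV space could be turned into per-vertex offsets, assuming the sphere template comes with per-vertex UV coordinates (the actual ConvMesh/MeshInversion pipeline may differ in details):

```python
import torch
import torch.nn.functional as F

def vertices_from_deformation_map(V_sphere, uv, deform_map):
    """V_sphere: (N, 3) template vertices; uv: (N, 2) per-vertex UV coords in [-1, 1];
    deform_map: (1, 3, H, W) CNN output holding one 3D offset per texel."""
    grid = uv.view(1, 1, -1, 2)                                    # sampling grid for grid_sample
    delta = F.grid_sample(deform_map, grid, align_corners=False)   # (1, 3, 1, N)
    return V_sphere + delta[0, :, 0].t()                           # V = V_sphere + delta V
```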
Rendering uses weak-perspective projection (a projection model distinct from both perspective and orthographic projection), with camera parameters π comprising scale s, translation t, and rotation r.
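A minimal sketch of weak-perspective projection with π = (s, t, r), assuming r has already been converted to a rotation matrix R: rotate the vertices, drop the depth instead of dividing by it, then apply a global scale and a 2D translation.

```python
import torch

def weak_perspective_project(V, R, s, t):
    """V: (N, 3) vertices; R: (3, 3) rotation; s: scalar scale; t: (2,) translation."""
    V_cam = V @ R.T               # rotate into the camera frame
    return s * V_cam[:, :2] + t   # orthographic drop of z, then scale and translate
```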
Since this camera is continuously optimized, the rendered image can never be perfectly aligned with the input, so a robust texture loss is needed; see below.

This turns out to be quite useful; see the ablation study discussed below.

Besides the pixel-level Chamfer texture loss, there is also a feature-level one:
"Specifically, we apply the Chamfer texture loss between the (foreground) feature maps extracted with a pre-trained VGG-19 network [42] from the rendered image and the input image."
This resembles the contextual loss (The contextual loss for image transformation with non-aligned data), with one difference:
the feature-level Chamfer texture loss takes location into account without requiring exact alignment, whereas
the contextual loss ignores location entirely.
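A minimal sketch of the feature-level idea, treating each spatial location of a VGG-19 feature map as a "point" and matching rendered-image features to input-image features by nearest neighbour in feature space; the actual loss also weights matches by their spatial distance, which this sketch omits, and the layer choice and masking here are assumptions:

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg19, VGG19_Weights

vgg_feats = vgg19(weights=VGG19_Weights.IMAGENET1K_V1).features[:12].eval()  # up to an intermediate relu
for p in vgg_feats.parameters():
    p.requires_grad_(False)

def _fg_points(feat, mask):
    """feat: (1, C, h, w); mask: (1, 1, H, W) -> (N, C) feature vectors at foreground locations."""
    m = F.interpolate(mask, size=feat.shape[-2:], mode='nearest')[0, 0] > 0.5
    return feat[0].permute(1, 2, 0)[m]

def chamfer_feature_loss(rendered, target, mask_rendered, mask_target):
    pa = _fg_points(vgg_feats(rendered), mask_rendered)   # gradients flow back to the rendered image
    pb = _fg_points(vgg_feats(target), mask_target)
    d = torch.cdist(pa, pb)                               # (Na, Nb) pairwise feature distances
    return d.min(dim=1).values.mean()                     # one-sided: each rendered feature to its nearest input feature
```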

In the ablation, CT denotes the Chamfer texture loss; LpCT is the pixel-level version and LfCT the feature-level one.
Looking at the middle three rows: the contextual loss performs worst, followed by L1 alone; L1 + perceptual does a bit better; the CT loss remains the best.
I'll go through the code carefully later, especially this latent-space loss and how the feature-level version is implemented.
Texture flow is the more common choice, but it is error-prone in invisible regions, since it tends to copy foreground pixels, including occluders (see the sketch below).
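For contrast, a minimal sketch of texture flow: a network predicts, for every texel of the UV map, a 2D sampling coordinate in the input image, and the texture is obtained with grid_sample. Every texel must copy some visible pixel, so texels of surfaces invisible in the input end up copying foreground pixels, occluders included (names below are placeholders):

```python
import torch.nn.functional as F

def texture_from_flow(input_image, flow):
    """input_image: (1, 3, H, W); flow: (1, Ht, Wt, 2) coords in [-1, 1], one per UV texel."""
    return F.grid_sample(input_image, flow, align_corners=False)   # (1, 3, Ht, Wt) texture map
```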
Pre-training: 600 epochs with a batch size of 128, 15 hours on four Nvidia V100 GPUs.

mesh_inversion.py
https://github.com/junzhezhang/mesh-inversion/blob/d6614726344f5a56c068df2750fefc593c4ca43d/lib/mesh_inversion.py#L265
if self.args.chamfer_texture_pixel_loss:
    # NOTE: batch size should be one
    # 2D coordinates of the rendered foreground pixels (from the rendered mask)
    pix_pos_pred = mask2proj(mask_pred)
    # RGB values of the rendered image sampled at those coordinates
    pix_pred = grid_sample_from_vtx(pix_pos_pred, image_pred)
    # pairwise color distances between (downsampled) rendered and target foreground pixels;
    # idx_a / idx_b are the downsampling indices, reused below so both maps use the same points
    dist_map_c, idx_a, idx_b = distChamfer_downsample(pix_pred, color_target, resolution=self.args.chamfer_resolution)
    # pairwise image-plane (xy) distances between the same two point sets
    dist_map_p, _, _ = distChamfer_downsample(pix_pos_pred, vtx_target, resolution=self.args.chamfer_resolution, idx_a=idx_a, idx_b=idx_b)
    xy_threshold = self.args.xy_threshold
    k = self.args.xy_k
    alpha = self.args.xy_alpha
    eps = 1 - (2*k*xy_threshold)**2
    rgb_eps = self.args.rgb_eps
    # spatial weighting term: pixel pairs within the xy threshold get weight 1 (no extra penalty),
    # farther pairs are penalized increasingly
    if eps == 1:
        xy_term = torch.pow(1+k*dist_map_p, alpha)
    else:
        xy_term = F.relu(torch.pow(eps+k*dist_map_p, alpha)-1) + 1
    # combined appearance-location distance matrix
    dist_map = xy_term * (dist_map_c + rgb_eps)
    # one-sided Chamfer: for each rendered pixel, the distance to its best-matching target pixel
    dist_min_ab = dist_map.min(-1)[0]
    dist_mean_ab = dist_min_ab.mean(-1)
    loss += dist_mean_ab * self.args.chamfer_texture_pixel_loss_wt
    ### collect the matched points in the target for visualization
    indices = dist_map.argmin(dim=-1)
    self.matched_pos = torch.stack([vtx_target[i, indices[i]] for i in range(indices.shape[0])], 0)
    self.matched_clr = torch.stack([color_target[i, indices[i]] for i in range(indices.shape[0])], 0)
    # v2 from: grid sample
    self.matched_clr_v2 = grid_sample_from_vtx(self.matched_pos, target)  # NOTE that back vertices color shown as well
The relevant arguments:
https://github.com/junzhezhang/mesh-inversion/blob/d6614726344f5a56c068df2750fefc593c4ca43d/lib/arguments.py#L135
# loss related
self._parser.add_argument('--chamfer_mask_loss', action='store_true', default=True, help='if use Chamfer mask loss')
self._parser.add_argument('--chamfer_mask_loss_wt', type=float, default=10.0)
self._parser.add_argument('--chamfer_texture_pixel_loss', action='store_true', default=True, help='Chamfer texture loss - pixel level')
self._parser.add_argument('--chamfer_texture_pixel_loss_wt', type=float, default=1.0)
self._parser.add_argument('--chamfer_texture_feat_loss', action='store_true', default=True, help='Chamfer texture loss - feature level')
self._parser.add_argument('--chamfer_texture_feat_loss_wt', type=float, default=0.04)
self._parser.add_argument('--xy_threshold', type=float, default=0.16)
self._parser.add_argument('--xy_k', type=float, default=1.0)
self._parser.add_argument('--xy_alpha', type=float, default=1)
self._parser.add_argument('--rgb_eps', type=float, default=1)
self._parser.add_argument('--subpool_threshold', type=float, default=0.5)
self._parser.add_argument('--chamfer_resolution', type=int, default=8192, help='resolution for computing chamfer texture losses')
# other losses
self._parser.add_argument('--mesh_regularization_loss', action='store_true', default=False, help='')
self._parser.add_argument('--mesh_regularization_loss_wt', type=float, default=0.00005)
self._parser.add_argument('--nll_loss', action='store_true', default=True, help='')
self._parser.add_argument('--nll_loss_wt', type=float, default=0.05)
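A quick sanity check of the defaults above: with xy_threshold=0.16, xy_k=1.0 and xy_alpha=1, eps = 1 - (2·1·0.16)² = 0.8976, so the pixel-level loss takes the else branch and xy_term = relu(dist_map_p - 0.1024) + 1. Assuming distChamfer returns squared coordinate distances, any pair of pixels closer than about 0.32 in [-1, 1] coordinates (roughly 16% of the image extent) gets xy_term = 1, i.e. no extra spatial penalty; beyond that the color distance is scaled up. This is what makes the Chamfer texture loss tolerant to the imperfect alignment caused by the optimized camera.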