Diffusion probabilistic models have been successfully adapted to generate 3D point clouds with impressive fidelity. However, existing methods can only generate a single point cloud from noise, leaving joint generation and conditional generation unaddressed. Both demands arise in many real-world problems; we take dressed avatar generation as an example. Directly generating a dressed avatar as a single point cloud does not allow the garment to be changed, while generating the garment and the avatar separately as two point clouds inevitably leads to a mismatch between them. Meanwhile, current diffusion techniques cannot generate a matched garment for an undressed avatar. To this end, we present Hoodie, the first method that successfully resolves these issues. Technically, Hoodie first trains two separate point cloud diffusion models with global latent codes, and then trains a latent code diffusion model on the concatenation of the human and garment latents. This hierarchical architecture supports not only the joint generation of a human and a matched garment, but also conditional inference that generates a matched garment given a 3D human input. In addition, we integrate a point cloud upsampling GAN to improve the uniformity of the generated point clouds. Large-scale quantitative and qualitative evaluations show that Hoodie achieves strong performance on the new tasks it enables. Code, data, and models will be made publicly available.
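
The latent layout described above can be illustrated with a toy sketch. This is not the authors' implementation: `LATENT_DIM`, the function names, the linear noise schedule, and the replacement-style conditioning in `clamp_known_human` are all assumptions made for illustration. The abstract does not say how conditional inference is carried out. Only the closed-form forward noising step is the standard DDPM formula.

```python
import math
import random

LATENT_DIM = 4  # hypothetical size of each global latent code


def make_alpha_bar(num_steps=100, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear noise schedule (assumed)."""
    alpha_bar, running = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bar.append(running)
    return alpha_bar


def join_latents(z_human, z_garment):
    """Concatenate the human and garment latent codes into one joint latent."""
    return list(z_human) + list(z_garment)


def split_latents(z_joint):
    """Undo join_latents: return (human half, garment half)."""
    return z_joint[:LATENT_DIM], z_joint[LATENT_DIM:]


def forward_noise(z0, t, alpha_bar, rng):
    """Sample z_t ~ N(sqrt(a_t) * z_0, (1 - a_t) * I), the DDPM forward step."""
    a = alpha_bar[t]
    return [math.sqrt(a) * z + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for z in z0]


def clamp_known_human(z_joint_t, z_human_known, t, alpha_bar, rng):
    """Assumed conditioning scheme: overwrite the human half of a noisy joint
    latent with a freshly noised copy of the known human latent, leaving the
    garment half untouched."""
    z_human_t = forward_noise(z_human_known, t, alpha_bar, rng)
    return z_human_t + list(z_joint_t[LATENT_DIM:])


# Usage: build a joint latent, noise it, then re-impose the known human half.
rng = random.Random(0)
alpha_bar = make_alpha_bar()
z_human = [rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]
z_garment = [rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]
z_joint_t = forward_noise(join_latents(z_human, z_garment), 50, alpha_bar, rng)
z_conditioned = clamp_known_human(z_joint_t, z_human, 50, alpha_bar, rng)
```

In a real system the latents would come from the two point cloud models and a learned denoiser would run the reverse process; both are omitted here.
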
Method | Cityscapes mIoU ↑ | Cityscapes Acc ↑ | Cityscapes FID ↓ | ADE20K mIoU ↑ | ADE20K Acc ↑ | ADE20K FID ↓ |
---|---|---|---|---|---|---|
Normal Prior | 65.14 (+0.00) | 94.14 (+0.00) | 23.35 (+0.00) | 20.73 (+0.00) | 61.14 (+0.00) | 20.58 (+0.00) |
Spatial Prior | 66.77 (+1.63) | 94.29 (+0.15) | 12.83 (-10.52) | 20.86 (+0.13) | 64.46 (+3.32) | 16.03 (-4.55) |
Categorical Prior | 66.86 (+1.72) | 94.54 (+0.40) | 11.63 (-11.72) | 21.86 (+1.13) | 66.63 (+5.49) | 16.56 (-4.02) |
Joint Prior | 67.92 (+2.78) | 94.65 (+0.51) | 10.53 (-12.82) | 25.61 (+4.88) | 71.79 (+10.65) | 12.66 (-7.92) |
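
The parenthesized values in the table appear to be differences from the Normal Prior row. A minimal sketch that recomputes them from the reported scores is given below; the names `RESULTS` and `deltas_vs_baseline` are hypothetical, and the numbers are copied from the table.

```python
# Scores copied from the table, in column order:
# (Cityscapes mIoU, Cityscapes Acc, Cityscapes FID, ADE20K mIoU, ADE20K Acc, ADE20K FID)
RESULTS = {
    "Normal Prior": (65.14, 94.14, 23.35, 20.73, 61.14, 20.58),
    "Spatial Prior": (66.77, 94.29, 12.83, 20.86, 64.46, 16.03),
    "Categorical Prior": (66.86, 94.54, 11.63, 21.86, 66.63, 16.56),
    "Joint Prior": (67.92, 94.65, 10.53, 25.61, 71.79, 12.66),
}


def deltas_vs_baseline(results, baseline="Normal Prior"):
    """Return each method's per-column difference from the baseline row,
    rounded to two decimals to match the table's precision."""
    base = results[baseline]
    return {
        name: tuple(round(value - ref, 2) for value, ref in zip(values, base))
        for name, values in results.items()
    }


DELTAS = deltas_vs_baseline(RESULTS)
```

Because lower FID is better, a negative FID delta is an improvement, while positive mIoU and Acc deltas are improvements.
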
@article{gao2024scp,
title={SCP-Diff: Photo-Realistic Semantic Image Synthesis with Spatial-Categorical Joint Prior},
author={Gao, Huan-ang and Gao, Mingju and Li, Jiaju and Li, Wenyi and Zhi, Rong and Tang, Hao and Zhao, Hao},
journal={arXiv preprint arXiv:2403.09638},
year={2024}
}