Ultraman: Single Image 3D Human Reconstruction
with Ultra Speed and Detail

Mingjin Chen*¹,², Junhao Chen*¹,³, Xiaojun Ye¹,⁴, Huan-ang Gao¹, Xiaoxue Chen¹, Zhaoxin Fan⁵, Hao Zhao†¹

¹Institute for AI Industry Research (AIR), Tsinghua University
²Beijing Normal University - Hong Kong Baptist University United International College
³Tsinghua Shenzhen International Graduate School, Tsinghua University
⁴College of Computer Science, Zhejiang University
⁵State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences
*Indicates Equal Contribution †Indicates Corresponding Author

Abstract

3D human body reconstruction is a long-standing challenge in computer vision. Previous methods are often time-consuming and struggle to capture the detailed appearance of the human body. In this paper, we propose Ultraman, a new method for fast reconstruction of textured 3D human models from a single image. Compared to existing techniques, Ultraman greatly improves reconstruction speed and accuracy while preserving high-quality texture details. We present a new framework for human reconstruction consisting of three parts: geometric reconstruction, texture generation, and texture mapping. First, a mesh reconstruction framework accurately extracts the 3D human shape from a single image. In parallel, we propose a method for generating multi-view consistent images of the human body from a single image. These are finally combined with a novel texture mapping method that optimizes texture details and ensures color consistency during reconstruction. Through extensive experiments and evaluations on various standard datasets, we demonstrate that Ultraman outperforms state-of-the-art methods in both human rendering quality and speed. Upon acceptance of the article, we will make the code and data publicly available.

Method

Overview of the Ultraman framework. Ultraman takes an image $I_0$ of a human as input. The blue rounded dashed rectangle denotes the prompt generation module, in which the responses to the questions are generated by GPT4V. The red rounded dashed rectangles denote the mesh reconstruction modules, which generate the mesh and UV map. The yellow rounded dashed rectangle denotes the multi-view texture generation module. It includes a control model $\mathcal{M}_c$, which controls texture generation for the current viewpoint using the prompt for that viewpoint, the depth map rendered from the mesh, and the input image. The texturing module pastes the corresponding texture back onto the mesh according to the generation mask.
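
The final texturing step can be illustrated with a small sketch. The code below is not the authors' implementation; it only shows the general idea of mask-guided pasting, assuming the texture generated for the current viewpoint has already been projected into UV space. The function name `paste_texture`, the list-of-rows texture layout, and the boolean mask format are all hypothetical choices made for this illustration.

```python
from typing import List, Tuple

Color = Tuple[float, float, float]
Texture = List[List[Color]]   # rows of RGB texels in UV space (assumed layout)
Mask = List[List[bool]]       # True where the current view contributes texels


def paste_texture(uv_texture: Texture, view_texture: Texture,
                  generation_mask: Mask) -> Texture:
    """Return a new texture that takes texels from view_texture where the
    mask is True and keeps the existing uv_texture texels elsewhere."""
    if not (len(uv_texture) == len(view_texture) == len(generation_mask)):
        raise ValueError("texture and mask heights must match")
    result: Texture = []
    for old_row, new_row, mask_row in zip(uv_texture, view_texture,
                                          generation_mask):
        if not (len(old_row) == len(new_row) == len(mask_row)):
            raise ValueError("texture and mask widths must match")
        # Per-texel selection: generated color inside the mask, old color outside.
        result.append([new if keep else old
                       for old, new, keep in zip(old_row, new_row, mask_row)])
    return result


# Toy 2x2 example: only the top-left texel is covered by the current view.
black: Color = (0.0, 0.0, 0.0)
red: Color = (1.0, 0.0, 0.0)
texture = [[black, black], [black, black]]
front_view = [[red, red], [red, red]]
mask = [[True, False], [False, False]]
updated = paste_texture(texture, front_view, mask)
```

Returning a new texture instead of modifying the input in place keeps each viewpoint's update easy to inspect; a real pipeline would more likely operate on image arrays and might blend at mask boundaries rather than overwrite.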

DIY clothes

You can make DIY designs for your clothes according to your needs.

Figure panels: original input and its output result; pattern DIY and its output result; text DIY and its output result.

Without any 3D or 2D pre-training, our proposed Ultraman is able to quickly synthesize complete, realistic and highly detailed 3D avatars based on a single input RGB image.
