Generating Texture for 3D Human Avatar from a Single Image using Sampling and Refinement Networks

Visual Media Lab, KAIST
Eurographics 2023

Given a single image, our method generates a texture map for the target 3D human mesh using sampling and refinement strategies.

Abstract

There has been significant progress in generating an animatable 3D human avatar from a single image. However, recovering the texture of a 3D human avatar from a single image has received relatively little attention. Because the generated 3D human avatar reveals texture that was occluded in the given image as it moves, it is critical to synthesize the occluded texture patterns that are unseen in the source image. To generate a plausible texture map for a 3D human avatar, the occluded texture pattern needs to be synthesized consistently with the visible texture from the given image. Moreover, the generated texture should align with the surface of the target 3D mesh.

In this paper, we propose a texture synthesis method for a 3D human avatar that incorporates geometry information. The proposed method consists of two convolutional networks for the sampling and refining processes. The sampler network fills in the occluded regions of the source image and aligns the texture with the surface of the target 3D mesh using the geometry information. The sampled texture is further refined and adjusted by the refiner network. To maintain the clear details of the given image, the sampled and refined textures are blended to produce the final texture map. To effectively guide the sampler network toward its goal, we designed a curriculum learning scheme that starts from a simple sampling task and gradually progresses to a task where alignment must be considered. We conducted experiments showing that our method outperforms previous methods both qualitatively and quantitatively.


Overview

Overview of the proposed method. The source image is processed to create a partial texture, a visibility mask, and a normal map, which are given as input to SamplerNet. SamplerNet predicts a sampling grid that is used to produce the sampled texture. RefinerNet receives the sampled texture and an occlusion mask as input, and generates a refined texture and a blending mask. The final output is produced by alpha-blending the sampled texture with the refined texture using the blending mask.

Methods

SamplerNet

SamplerNet fills in the occluded regions of the source image and aligns the texture with the surface of the target 3D mesh using the geometry information.
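The core of this sampling step is warping the partial texture with a predicted grid of source coordinates, analogous to PyTorch's `grid_sample`. Below is a minimal, hypothetical NumPy sketch of that warp (the paper's SamplerNet predicts the grid with a CNN; here we simply apply a given grid with bilinear interpolation):

```python
import numpy as np

def grid_sample(image, grid):
    """Bilinearly sample `image` (H, W, C) at normalized grid
    coordinates `grid` (H', W', 2), with x and y in [-1, 1].
    Hypothetical sketch of the warp a predicted sampling grid applies."""
    H, W, _ = image.shape
    # Map normalized coords [-1, 1] to pixel coords [0, W-1] / [0, H-1].
    x = (grid[..., 0] + 1.0) * 0.5 * (W - 1)
    y = (grid[..., 1] + 1.0) * 0.5 * (H - 1)
    # Integer corners of each sample, clipped so x0+1 / y0+1 stay in range.
    x0 = np.clip(np.floor(x).astype(int), 0, W - 2)
    y0 = np.clip(np.floor(y).astype(int), 0, H - 2)
    wx = (x - x0)[..., None]  # fractional weights, broadcast over channels
    wy = (y - y0)[..., None]
    top = image[y0, x0] * (1 - wx) + image[y0, x0 + 1] * wx
    bot = image[y0 + 1, x0] * (1 - wx) + image[y0 + 1, x0 + 1] * wx
    return top * (1 - wy) + bot * wy

# Sanity check: an identity grid reproduces the input image.
H, W = 4, 4
ys, xs = np.meshgrid(np.linspace(-1, 1, H), np.linspace(-1, 1, W), indexing="ij")
grid = np.stack([xs, ys], axis=-1)
img = np.random.rand(H, W, 3)
out = grid_sample(img, grid)
```

Because the warp only rearranges pixels that already exist in the partial texture, texture patterns copied into occluded regions keep the resolution and detail of the source image.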

Projecting the reference image onto the 3D mesh produces the mapped (partial) texture.

Preprocessing

Sampling and Aligning


RefinerNet

After SamplerNet completes the partial texture, the resulting texture map is refined by RefinerNet, followed by a texture blending process.

Final texture map = (Sampled texture × Blend mask) + (Refined texture × (1 − Blend mask))

Rendered result
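The per-pixel alpha blending above can be sketched as follows (a hypothetical NumPy illustration; the array contents are placeholders, not the paper's data):

```python
import numpy as np

def blend(sampled, refined, mask):
    """Alpha-blend the sampled and refined textures:
    final = sampled * mask + refined * (1 - mask).
    `mask` holds per-pixel blending weights in [0, 1]; values near 1
    keep the sharp sampled texture, values near 0 use the refined one."""
    return sampled * mask + refined * (1.0 - mask)

sampled = np.ones((2, 2, 3))      # stand-in sampled texture (all white)
refined = np.zeros((2, 2, 3))     # stand-in refined texture (all black)
mask = np.full((2, 2, 1), 0.25)   # 25% sampled, 75% refined everywhere
final = blend(sampled, refined, mask)  # every pixel becomes 0.25
```

Blending this way lets the refined texture correct artifacts in occluded regions while the sampled texture preserves the clear details visible in the source image.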


Example Results

BibTeX


    @article {10.1111:cgf.14769,
    journal = {Computer Graphics Forum},
    title = {{Generating Texture for 3D Human Avatar from a Single Image using Sampling and Refinement Networks}},
    author = {Cha, Sihun and Seo, Kwanggyoon and Ashtari, Amirsaman and Noh, Junyong},
    year = {2023},
    publisher = {The Eurographics Association and John Wiley \& Sons Ltd.},
    ISSN = {1467-8659},
    DOI = {10.1111/cgf.14769}
    }