
MegaPortraits: One-shot Megapixel Neural Head Avatars

source link: https://samsunglabs.github.io/MegaPortraits/

Abstract

In this work, we advance the neural head avatar technology to the megapixel resolution while focusing on the particularly challenging task of cross-driving synthesis, i.e., when the appearance of the driving image is substantially different from the animated source image. We propose a set of new neural architectures and training methods that can leverage both medium-resolution video data and high-resolution image data to achieve the desired levels of rendered image quality and generalization to novel views and motion. We show that the suggested architectures and methods produce convincing high-resolution neural avatars, outperforming the competitors in the cross-driving scenario. Lastly, we show how a trained high-resolution neural avatar model can be distilled into a lightweight student model which runs in real-time and locks the identities of neural avatars to several dozen pre-defined source images. Real-time operation and identity lock are essential for many practical applications of head avatar systems.

Main scheme

Figure: main scheme of the proposed method (method.png).

We propose a system for the one-shot creation of high-resolution human avatars, called megapixel portraits or MegaPortraits for short. Our model is trained in two stages. Optionally, we propose an additional distillation stage for faster inference.
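As a rough illustration of the optional distillation stage, the PyTorch-style sketch below distills a frozen high-resolution teacher into a lightweight student conditioned on an index into a fixed bank of pre-defined source images (the identity lock described in the abstract). All names here (`teacher`, `student`, `source_bank`, `distill_step`) and the bare L1 objective are illustrative assumptions, not the paper's actual interfaces or losses.

```python
import torch
import torch.nn.functional as F

def distill_step(student, teacher, source_bank, driver_batch, optimizer):
    """One hypothetical distillation step (interfaces are assumptions).

    teacher:      frozen high-resolution model, teacher(source, driver) -> image
    student:      lightweight model, student(driver, identity_idx) -> image
    source_bank:  (N, C, H, W) tensor of the pre-defined source images
    driver_batch: (B, C, H, W) tensor of driving frames
    """
    B = driver_batch.shape[0]

    # Tie each example to one of the pre-defined identities; the student is
    # only ever trained on this fixed set, which locks it to those avatars.
    identity_idx = torch.randint(0, source_bank.shape[0], (B,))
    sources = source_bank[identity_idx]

    # The frozen teacher animates the chosen source image with the driver's motion.
    with torch.no_grad():
        target = teacher(sources, driver_batch)

    # The student receives only the driver frame and the identity index,
    # which is what allows it to run in real time at inference.
    prediction = student(driver_batch, identity_idx)

    loss = F.l1_loss(prediction, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```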

Our training setup is relatively standard. We sample two random frames from our dataset at each step: the source frame and the driver frame. Our model imposes the motion of the driving frame (i.e., the head pose and the facial expression) onto the appearance of the source frame to produce an output image. The main learning signal is obtained from the training episodes where the source and the driver frames come from the same video, and hence our model’s prediction is trained to match the driver frame.
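A minimal sketch of such a training episode is given below. The `model(source, driver)` interface, the clip-shaped batch, and the bare L1 reconstruction loss are simplifying assumptions for illustration; the full training objective of the paper is not reproduced here.

```python
import torch
import torch.nn.functional as F

def training_step(model, clips, optimizer):
    """One hypothetical training episode (interfaces are assumptions).

    model: callable, model(source, driver) -> predicted image
    clips: (B, T, C, H, W) tensor, B video clips of T frames each
    """
    B, T = clips.shape[:2]
    batch_idx = torch.arange(B)

    # Sample two random frames from each clip: a source and a driver.
    source = clips[batch_idx, torch.randint(0, T, (B,))]  # provides appearance
    driver = clips[batch_idx, torch.randint(0, T, (B,))]  # provides pose and expression

    # Impose the driver's motion onto the source's appearance.
    prediction = model(source, driver)

    # Source and driver come from the same video, so the prediction is trained
    # to match the driver frame; L1 stands in for the full objective.
    loss = F.l1_loss(prediction, driver)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```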

