8

Pathways Autoregressive Text-to-Image Model

 2 years ago
source link: https://parti.research.google/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client

Research paper   GitHub repository

Introduction

We introduce the Pathways Autoregressive Text-to-Image model (Parti), an autoregressive text-to-image generation model that achieves high-fidelity photorealistic image generation and supports content-rich synthesis involving complex compositions and world knowledge. Recent advances with diffusion models for text-to-image generation, such as Google’s Imagen, have also shown impressive capabilities and state-of-the-art performance on research benchmarks. Parti and Imagen are complementary in exploring two different families of generative models – autoregressive and diffusion, respectively – opening exciting opportunities for combinations of these two powerful models.

Parti treats text-to-image generation as a sequence-to-sequence modeling problem, analogous to machine translation – this allows it to benefit from advances in large language models, especially capabilities that are unlocked by scaling data and model sizes. In this case, the target outputs are sequences of image tokens instead of text tokens in another language. Parti uses the powerful image tokenizer, ViT-VQGAN, to encode images as sequences of discrete tokens, and takes advantage of its ability to reconstruct such image token sequences as high quality, visually diverse images.

We observed the following results:

  • Consistent quality improvements by scaling Parti’s encoder-decoder up to 20 billion parameters.
  • State-of-the-art zero-shot FID score of 7.23 and finetuned FID score of 3.22 on MS-COCO.
  • Effectiveness across a wide variety of categories and difficulty aspects in our analysis on Localized Narratives and PartiPrompts, our new holistic benchmark of 1600+ English prompts that we release as part of this work.
We also explore and highlight limitations of our models, giving key example areas of focus for further improvements.
parti_overview.jpg

About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK