Introduction

We introduce the Pathways Autoregressive Text-to-Image model (Parti), an autoregressive text-to-image generation model that achieves high-fidelity photorealistic image generation and supports content-rich synthesis involving complex compositions and world knowledge. Recent advances with diffusion models for text-to-image generation, such as Google’s Imagen, have also shown impressive capabilities and state-of-the-art performance on research benchmarks. Parti and Imagen are complementary in exploring two different families of generative models – autoregressive and diffusion, respectively – opening exciting opportunities for combinations of these two powerful models.

Parti treats text-to-image generation as a sequence-to-sequence modeling problem, analogous to machine translation, except that the target outputs are sequences of image tokens rather than text tokens in another language. This framing lets it benefit from advances in large language models, especially capabilities unlocked by scaling data and model size. Parti uses the ViT-VQGAN image tokenizer to encode images as sequences of discrete tokens, and relies on its ability to reconstruct such token sequences as high-quality, visually diverse images.
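To make the sequence-to-sequence framing concrete, the following is a minimal sketch of autoregressive image-token generation. It is not Parti's implementation: `toy_decoder_logits` is a hypothetical stand-in for a trained Transformer decoder, and `CODEBOOK_SIZE`, `GRID`, and the prompt token ids are made-up illustrative values. The sketch only shows the control flow, in which each image token is sampled conditioned on the text tokens and on all image tokens generated so far.

```python
import numpy as np

CODEBOOK_SIZE = 16  # hypothetical number of discrete image tokens
GRID = 4            # hypothetical: an image is a GRID x GRID grid of tokens


def toy_decoder_logits(text_tokens, image_tokens):
    """Stand-in for a trained decoder (NOT a real model).

    Returns one score per codebook entry, derived deterministically from
    the text tokens and the image tokens generated so far.
    """
    seed = (sum(text_tokens) * 31 + sum(image_tokens) * 17 + len(image_tokens)) % (2**32)
    return np.random.default_rng(seed).normal(size=CODEBOOK_SIZE)


def generate_image_tokens(text_tokens, rng):
    """Sample image tokens one at a time, left to right."""
    image_tokens = []
    for _ in range(GRID * GRID):
        logits = toy_decoder_logits(text_tokens, image_tokens)
        # Softmax over the codebook, shifted by the max for numerical stability.
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        image_tokens.append(int(rng.choice(CODEBOOK_SIZE, p=probs)))
    return image_tokens


text_tokens = [5, 9, 2]  # hypothetical ids for a tokenized prompt
tokens = generate_image_tokens(text_tokens, np.random.default_rng(0))
token_grid = np.array(tokens).reshape(GRID, GRID)
```

In a real system, the resulting token grid would then be passed to the image tokenizer's decoder to reconstruct pixels; that step is omitted here.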

We observed the following results:

Consistent quality improvements by scaling Parti’s encoder-decoder up to 20 billion parameters.
State-of-the-art zero-shot FID score of 7.23 and finetuned FID score of 3.22 on MS-COCO.
Effectiveness across a wide variety of categories and difficulty aspects in our analysis on Localized Narratives and PartiPrompts, our new holistic benchmark of 1600+ English prompts that we release as part of this work.
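For readers unfamiliar with the FID metric quoted above, here is a minimal sketch of the Fréchet distance between Gaussians fitted to two sets of feature vectors, where lower values mean the two sets are closer. In practice the features are Inception-v3 activations of real and generated images; this sketch assumes the features are already given as arrays, and it does not reproduce the evaluation pipeline behind the reported scores. The function name `frechet_distance` is chosen here for illustration.

```python
import numpy as np


def frechet_distance(feats_real, feats_gen):
    """Frechet distance between Gaussians fitted to two feature sets.

    Each argument has shape (num_samples, feature_dim).
    """
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_g = np.cov(feats_gen, rowvar=False)
    # Tr((cov_r cov_g)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_r @ cov_g, which are real and non-negative for
    # covariance matrices (clipping guards against round-off).
    eigvals = np.linalg.eigvals(cov_r @ cov_g)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    mean_term = np.sum((mu_r - mu_g) ** 2)
    return float(mean_term + np.trace(cov_r) + np.trace(cov_g) - 2.0 * trace_sqrt)


rng = np.random.default_rng(0)
real = rng.normal(size=(500, 4))
shifted = real + np.array([3.0, 0.0, 0.0, 0.0])  # same covariance, mean moved by 3

same = frechet_distance(real, real)
moved = frechet_distance(real, shifted)
```

With identical feature sets the distance is zero up to round-off, and shifting the mean by 3 along one axis while keeping the covariance unchanged gives the squared shift, 9.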

We also explore and highlight limitations of our models, giving key example areas of focus for further improvements.
