
Announcing GPT-NeoX-20B


Announcing GPT-NeoX-20B, a 20 billion parameter model trained in collaboration with CoreWeave.
February 2, 2022 · Connor Leahy

GPT-NeoX-20B will be publicly downloadable from The Eye on the 9th of February. In the meantime, you can already try out the model using CoreWeave’s and Anlatan’s new inference service, GooseAI!


After a year-long odyssey through months of chip-shortage-induced shipping delays, technical trials and tribulations, and aggressively boring debugging, we are happy to finally announce EleutherAI's latest open-source language model: GPT-NeoX-20B, a 20 billion parameter model trained using our GPT-NeoX framework on GPUs generously provided by our friends at CoreWeave.

GPT-NeoX-20B is, to our knowledge, the largest publicly accessible pretrained general-purpose autoregressive language model, and we expect it to perform well on many tasks.

We hope that the increased accessibility of models of this size will aid in research towards the safe use of AI systems, and encourage anyone interested in working in this direction to reach out to us.

As a thank you to our generous compute donors, we are delaying the public downloadable release of the model by 7 days. On February 9, 2022, the full model weights will be downloadable for free under a permissive Apache 2.0 license from The Eye.
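
For those who want a head start once the weights are out, here is a minimal sketch of how one might load the model and sample from it. The announcement does not specify a distribution format, so the hub name and loading path below are assumptions on our part (adjust them to wherever you put the downloaded checkpoint), and keep in mind that the fp16 weights alone are on the order of 40 GB.

```python
# Minimal sketch: load GPT-NeoX-20B and sample a continuation.
# Assumptions (not from the announcement): a transformers-compatible copy of
# the checkpoint is available under the hub name "EleutherAI/gpt-neox-20b";
# swap in a local path if you download and convert the weights yourself.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/gpt-neox-20b"  # assumed name, adjust as needed

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)  # ~40 GB in fp16

prompt = "EleutherAI's mission is"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40, do_sample=True, temperature=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```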

There will be a #20b channel set up in our Discord for discussions of this model. Please note that, much like our other language models and codebases, GPT-NeoX and GPT-NeoX-20B are very much research artifacts, and we do not recommend deploying either in a production setting without careful consideration. In particular, we strongly encourage those looking to use GPT-NeoX-20B to read the paper and the datasheet on our training data. There are still bugs to be ironed out and many inefficiencies that could be addressed, but hey, we do this in our free time, so give us a break lol

| Task | Category | Babbage | Curie | GPT-J-6B | FairSeq-13B | GPT-NeoX-20B | DaVinci |
|---|---|---|---|---|---|---|---|
| LAMBADA | Sentence Completion | 62.49% | 69.51% | 68.29% | 70.95% | 71.98% | 75.16% |
| ANLI R1 | Natural Language Inference | 32.40% | 32.80% | 32.40% | 34.00% | 33.50% | 36.30% |
| ANLI R2 | Natural Language Inference | 30.90% | 33.50% | 34.00% | 33.00% | 34.40% | 37.00% |
| ANLI R3 | Natural Language Inference | 33.75% | 35.50% | 35.50% | 34.75% | 35.75% | 36.83% |
| WSC | Coreference Resolution | 40.38% | 54.81% | 36.53% | 57.69% | 53.61% | 63.46% |
| Winogrande | Coreference Resolution | 59.51% | 64.56% | 64.01% | 67.40% | 65.27% | 69.93% |
| HellaSwag | Sentence Completion | 54.54% | 49.54% | 49.54% | 55.44% | 49.04% | 59.18% |
| Total | | 39.40% | 42.57% | 40.28% | 44.67% | 43.31% | 48.40% |

Accuracy on standard language modeling tasks.

| Subject Group | Babbage | Curie | GPT-J-6B | FairSeq-13B | GPT-NeoX-20B | DaVinci |
|---|---|---|---|---|---|---|
| Humanities | 27.01% | 26.48% | 28.07% | 27.27% | 28.70% | 32.30% |
| Social Science | 27.94% | 29.24% | 28.73% | 27.94% | 31.63% | 35.87% |
| STEM | 25.83% | 24.25% | 25.71% | 24.63% | 26.27% | 28.60% |
| Other | 26.86% | 28.84% | 27.95% | 27.33% | 29.83% | 36.85% |
| Total | 26.78% | 26.90% | 27.38% | 26.53% | 28.77% | 32.86% |

Accuracy of factual knowledge by subject group, as measured by the HendrycksTest evaluation.
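
For readers curious how numbers like these are typically produced, the sketch below illustrates the usual zero-shot recipe for multiple-choice tasks such as HellaSwag: score each candidate continuation by its total log-likelihood under the model and pick the best-scoring one. It is a simplified stand-in, not the actual evaluation code behind the tables above (our results come from a dedicated evaluation harness), and it uses a small placeholder model so it runs on modest hardware.

```python
# Simplified illustration of zero-shot multiple-choice scoring: rank candidate
# continuations by their total log-likelihood under the model. This is a
# sketch of the general recipe, not the evaluation code used for the tables
# above; "gpt2" is a small stand-in model chosen only so the example runs.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def continuation_logprob(context: str, continuation: str) -> float:
    """Sum of log-probabilities the model assigns to `continuation` given `context`."""
    ctx_ids = tokenizer(context, return_tensors="pt").input_ids
    full_ids = tokenizer(context + continuation, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    # The logits at position i predict the token at position i + 1.
    logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
    cont_start = ctx_ids.shape[1]
    cont_tokens = full_ids[0, cont_start:]
    return logprobs[cont_start - 1 :, :].gather(-1, cont_tokens.unsqueeze(-1)).sum().item()

context = "A person drops a glass on the floor, so"
candidates = [" the glass shatters.", " the glass starts to sing."]
scores = [continuation_logprob(context, c) for c in candidates]
print(candidates[int(torch.tensor(scores).argmax())])
```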

