LLM Introduction: Learn Language Models
Source: https://gist.github.com/rain-1/eebd5e5eb2784feecf450324e3341c8d
Purpose
Bootstrap knowledge of LLMs as quickly as possible, with a bias/focus toward GPT.
Avoid being a link dump; try to provide only valuable, well-tuned information.
Prelude
Neural network links to work through before starting on transformers.
Youtube Lessons
- Łukasz Kaiser Attention is all you need; Attentional Neural Network Models: this talk is from 6 years ago.
- Andrej Karpathy The spelled-out intro to language modeling: building makemore: basic; a bigram name-generator model built first by counting, then with a neural network, using PyTorch (a counting-based sketch appears after this list).
- Andrej Karpathy Building makemore Part 2: MLP:
- Andrej Karpathy Building makemore Part 3: Activations & Gradients, BatchNorm:
- Andrej Karpathy Building makemore Part 4: Becoming a Backprop Ninja:
- Hedu AI Visual Guide to Transformer Neural Networks - (Episode 1) Position Embeddings: tokens are embedded into a semantic space; the sine/cosine positional encoding is explained very well (see the sketch after this list).
- Hedu AI Visual Guide to Transformer Neural Networks - (Episode 2) Multi-Head & Self-Attention: clear overview of multi-head attention (a minimal attention sketch follows this list).
- Hedu AI Visual Guide to Transformer Neural Networks - (Episode 3) Decoder’s Masked Attention: Further details on the transformer architecture.
- Andrej Karpathy Let's build GPT: from scratch, in code, spelled out.: builds up a Shakespeare GPT-2-like model from scratch, starting with a bigram model and adding features one by one. PyTorch.
- Chris Olah CS25 I Stanford Seminar - Transformer Circuits, Induction Heads, In-Context Learning: interpretability; a deep look into the mechanics of induction heads. Companion article.
- Jay Alammar The Illustrated Word2vec - A Gentle Intro to Word Embeddings in Machine Learning
- Jay Alammar How GPT3 Works - Easily Explained with Animations: an extremely high-level overview.
- Jay Alammar The Narrated Transformer Language Model: a much deeper, more detailed look at the architecture. Companion article.
- Sebastian Raschka L19: Self-attention and transformer networks: academic-style lecture series on self-attention and transformers.
- Mark Chen Transformers in Language: The development of GPT Models including GPT3: a chunk of this lecture is about applying GPT to images. Same lecture series as the Chris Olah one. Rest of the series. Papers listed in the talk.
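To make the makemore entry above concrete, here is a minimal sketch of the counting-based bigram model the first video builds, assuming PyTorch. The tiny `names` list is a hypothetical stand-in for the video's real dataset of names.

```python
import torch

names = ["emma", "olivia", "ava"]  # hypothetical toy dataset; any list of lowercase words works
chars = sorted(set("".join(names)))
stoi = {c: i + 1 for i, c in enumerate(chars)}
stoi["."] = 0  # '.' marks the start/end of a name
itos = {i: c for c, i in stoi.items()}

# Count every bigram (ch1 -> ch2), including the boundary markers.
N = torch.zeros((len(stoi), len(stoi)), dtype=torch.int32)
for name in names:
    chs = ["."] + list(name) + ["."]
    for ch1, ch2 in zip(chs, chs[1:]):
        N[stoi[ch1], stoi[ch2]] += 1

# Normalize counts into row-wise probabilities, then sample a new name.
P = (N + 1).float()              # add-one smoothing avoids zero probabilities
P /= P.sum(dim=1, keepdim=True)
g = torch.Generator().manual_seed(42)
ix, out = 0, []
while True:
    ix = torch.multinomial(P[ix], num_samples=1, generator=g).item()
    if ix == 0:
        break
    out.append(itos[ix])
print("".join(out))
```

The video then replaces the counting step with a one-layer neural network trained to produce the same probabilities, which is the bridge to real language models.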
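For the position-embeddings episode, a small NumPy sketch of the sine/cosine encoding from Attention Is All You Need; `max_len` and `d_model` are arbitrary illustrative values, and `d_model` is assumed even.

```python
import numpy as np

def positional_encoding(max_len: int, d_model: int) -> np.ndarray:
    """PE[pos, 2i] = sin(pos / 10000^(2i/d_model)); PE[pos, 2i+1] = cos(same angle)."""
    pos = np.arange(max_len)[:, None]              # (max_len, 1)
    i = np.arange(0, d_model, 2)[None, :]          # (1, d_model/2)
    angles = pos / np.power(10000.0, i / d_model)  # broadcasts to (max_len, d_model/2)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)  # even dimensions: sine
    pe[:, 1::2] = np.cos(angles)  # odd dimensions: cosine
    return pe

print(positional_encoding(max_len=50, d_model=16).shape)  # (50, 16)
```

Each position gets a unique pattern of wavelengths, so the model can recover both absolute and relative positions from the embedding alone.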
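For the attention episodes, a minimal single-head scaled dot-product attention sketch in NumPy, with an optional causal mask like the decoder-style masking in episode 3. Shapes and the random inputs are purely illustrative; multi-head attention runs several of these in parallel over lower-dimensional projections and concatenates the results.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V, mask=None):
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d_k)  # (T, T) similarity scores
    if mask is not None:
        scores = np.where(mask, scores, -1e9)       # masked positions get ~zero weight
    return softmax(scores) @ V                      # attention-weighted sum of values

T, d = 4, 8  # sequence length and head dimension (illustrative)
rng = np.random.default_rng(0)
Q, K, V = [rng.normal(size=(T, d)) for _ in range(3)]
causal = np.tril(np.ones((T, T), dtype=bool))  # each token attends only to itself and the past
print(attention(Q, K, V, mask=causal).shape)   # (4, 8)
```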
Articles
Research Paper Lists
- Sebastian Raschka Understanding Large Language Models -- A Transformative Reading List: a really good chronological list of some of the most important papers in the area.
- OpenAI Research Index
Research Papers
- (GPT-1) Radford et al. Improving Language Understanding by Generative Pre-Training (2018). Accompanying OpenAI blog post: Improving language understanding with unsupervised learning. Source code (tidied up by thomwolf): huggingface.co/.../openai-gpt
- (GPT-2) Radford et al. Language Models are Unsupervised Multitask Learners (2019). Accompanying OpenAI blog post: Better language models and their implications. Source code: github.com/openai/gpt-2
- (GPT-3) Brown et al. Language Models are Few-Shot Learners
- Kaplan et al. Scaling Laws for Neural Language Models: a variety of models were trained with varying amounts of compute, dataset size, and parameter count, yielding power laws that predict how well larger future models will perform (a small illustration follows this list). See also Gwern Branwen, The Scaling Hypothesis.
- Mary Phuong et al. Formal Algorithms for Transformers: gives pseudocode for various versions of the transformer (with array indexes starting at 1 for some reason). A very useful reference to have.
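As a small illustration of the Kaplan et al. entry above, the paper's parameter-count law has the form L(N) = (N_c / N)^alpha_N. The constants below are the paper's reported fits as I recall them (roughly alpha_N ≈ 0.076 and N_c ≈ 8.8e13 non-embedding parameters), so treat the exact numbers as approximate.

```python
def loss_from_params(n: float, n_c: float = 8.8e13, alpha_n: float = 0.076) -> float:
    """Predicted test loss (nats/token) as a function of non-embedding parameter count."""
    return (n_c / n) ** alpha_n

for n in (1e8, 1e9, 1e10, 1e11):
    print(f"N = {n:.0e}: predicted loss ~ {loss_from_params(n):.3f}")
```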
Philosophy of GPT
- Isaac Asimov The Last Question (1956)
- Justin Weinberg, Daily Nous Philosophers On GPT-3
- Fernando Borretti And Yet It Understands
- Ted Chiang ChatGPT Is a Blurry JPEG of the Web
- Noam Chomsky The False Promise of ChatGPT
- Janus Simulators: a long post, but its main point is that LLMs act as simulators that can instantiate many different personas to generate text. Related, and easier to read and understand: Janus' Simulators.
- Julian Togelius Is Elden Ring an existential risk to humanity? Satire that leads into a critique of the concept of intelligence.
Usage
- Chip Huyen Building LLM applications for production How to get good results from actually using an LLM.
GPT/LLM Link Collections
Random fun/interesting Applications
- https://github.com/PrefectHQ/marvin - implement entire python functions just by describing them in a comment
- https://github.com/pgosar/ChatGDB - GDB debugger commands using natural language
- https://github.com/TheR1D/shell_gpt - type things like "list files" instead of "ls"
- https://github.com/RomanHotsiy/commitgpt - create git commit messages
- https://github.com/densmirnov/git2gpt/commits/main - create git commits from repo + prompts, mutating a codebase over time
- https://www.chatpdf.com/ - Upload a PDF and discuss it.
- https://ggpt.43z.one/ - prompt injection golfing game
- https://www.debate-devil.com/en - devil's advocate debate game
ConLang stuff
- Dylan Black I Taught ChatGPT to Invent a Language: Gloop splog slopa slurpi
- Ryszard Szopa Teaching ChatGPT to Speak my Son’s Invented Language: hingadaa’ng’khuu’ngkilja’khłattama’khattama
This page is not finished yet. I will continue adding to this.