
Researchers Glimpse How AI Gets So Good at Language Processing | Quanta Magazine

source link: https://www.quantamagazine.org/researchers-glimpse-how-ai-gets-so-good-at-language-processing-20220414/
neural networks

Researchers Gain New Understanding From Simple AI

Language processing programs are notoriously hard to interpret, but smaller versions can provide important insights into how they work.

Illustration: Avalon Nuovo for Quanta Magazine

In the last two years, artificial intelligence programs have reached a surprising level of linguistic fluency. The biggest and best of these are all based on an architecture invented in 2017 called the transformer. It serves as a kind of blueprint for the programs to follow, in the form of a list of equations.

But beyond this bare mathematical outline, we don’t really know what transformers are doing with the words they process. The popular understanding is that they can somehow pay attention to multiple words at once, allowing for an immediate “big picture” analysis, but how exactly this works — or if it’s even an accurate way of understanding transformers — is unclear. We know the ingredients, but not the recipe.

Now, two studies by researchers from the company Anthropic have started to figure out, fundamentally, what transformers are doing when they process and generate text. In their first paper, released in December, the authors look at simplified versions of the architecture and fully explain how they function. “They give a very nice characterization of how they work in the very simple case,” said Yonatan Belinkov of the Technion in Haifa, Israel. “I’m very positive about this work. It’s interesting, promising, kind of unique and novel.”


The authors also show that simple transformers go from learning basic language patterns to picking up a general ability for language processing. “You see that there is this leap in competence,” said Martin Wattenberg of Harvard University. The authors “are starting to decipher the recipe.”

In their second paper, posted March 8, the researchers show that the same components responsible for this ability are also at play in the most complex transformers. While the mathematics of those models remains largely impenetrable, the results offer an inroad to understanding. “The thing they found in toy models translates to the larger models,” said Connor Leahy of the company Conjecture and the research group EleutherAI.

The difficulty in understanding transformers lies in their abstraction. Whereas a conventional program follows an understandable process, like outputting the word “grass” whenever it sees the word “green,” a transformer converts the word “green” into numbers and then multiplies them by certain values. These values (also called parameters) dictate what the next word will be. They get fine-tuned during a process called training, where the model learns how to produce the best outputs, but it’s unclear what the model is learning.
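
In code, that conversion step might look like the minimal sketch below. The vocabulary, dimensions, and parameter values are invented stand-ins; a trained model learns billions of such values rather than drawing them at random.

```python
import numpy as np

# Invented stand-ins: a real model learns these parameter values during
# training; random values make the "prediction" arbitrary, which is the point.
rng = np.random.default_rng(0)
vocab = ["green", "grass", "sky", "blue"]
embeddings = rng.normal(size=(len(vocab), 4))  # each word becomes 4 numbers
parameters = rng.normal(size=(4, len(vocab)))  # the values multiplied against them

vector = embeddings[vocab.index("green")]  # "green" converted into numbers
scores = vector @ parameters               # multiplied by the parameters
print(vocab[int(np.argmax(scores))])       # the highest score picks the next word
```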

Most machine learning programs package their math into modular ingredients called neurons. Transformers incorporate an additional type of ingredient, called an attention head, with sets of heads arranged in layers (as are neurons). But heads perform distinct operations from neurons. The heads are generally understood as allowing a program to remember multiple words of input, but that interpretation is far from certain.
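
The usual attention computation (the standard formulation, not something spelled out in the article) makes the "multiple words at once" idea concrete: every word scores every word in the input, then blends their information according to those scores. A minimal sketch with random, untrained parameters:

```python
import numpy as np

def attention_head(x, Wq, Wk, Wv):
    """One attention head: each word scores all words in the input,
    then takes a weighted blend of their information."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = (q @ k.T) / np.sqrt(k.shape[-1])  # how strongly each word attends to each other word
    weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # softmax
    return weights @ v  # each word's output mixes information from all words

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))  # 5 words, each already converted into 8 numbers
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(attention_head(x, Wq, Wk, Wv).shape)  # (5, 8): one new vector per word
```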

“The attention mechanism works, clearly. It’s getting good results,” said Wattenberg. “The question is: What is it doing? My guess is it’s doing a whole lot of stuff that we don’t know.”

To better understand how transformers work, the Anthropic researchers simplified the architecture, stripping out all the neuron layers and all but one or two layers of attention heads. This let them spot a link between transformers and even simpler models that they fully understood.

“It’s doing something that looks a little more like abstract reasoning.”

Nelson Elhage, Anthropic

Consider the simplest possible kind of language model, called a bigram model, which reproduces basic language patterns. For example, when trained on a large body of text, a bigram model notes which word most often follows the word “green” (such as “grass”) and memorizes that pairing. Then, when generating text, it reproduces the same pattern. By memorizing an associated follow-up word for every input word, it gains a very basic knowledge of language.
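
A bigram model is simple enough to write out in full. The sketch below (corpus and names invented for illustration) memorizes the most frequent follower of each word, then generates text by replaying those pairings:

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """For each word, memorize the word that most often follows it."""
    words = text.split()
    follower_counts = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follower_counts[current][nxt] += 1
    # Keep only the single most frequent follower per word.
    return {w: counts.most_common(1)[0][0] for w, counts in follower_counts.items()}

table = train_bigram("the green grass and the green grass and the blue sky")
print(table["green"])  # -> "grass", the memorized pattern

def generate(table, word, length=5):
    """Reproduce the memorized patterns, one word at a time."""
    out = [word]
    for _ in range(length):
        word = table.get(word)
        if word is None:
            break
        out.append(word)
    return " ".join(out)

print(generate(table, "the"))  # -> "the green grass and the green"
```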

The researchers showed that a transformer model with one layer of attention heads does something similar: It reproduces what it memorizes. Suppose you give it a specific input, like “Doctor Smith went to the store because Doctor …” This input is called the prompt, or context. For us, the next word is obvious — Smith.

An attention head in a trained one-layer model can make this prediction in two steps. First, it looks at the final word in the context (Doctor) and searches for a specific word in the context that it has learned (during training) to associate with the final word. Then, for any word that it finds, it looks up another word that it has learned to associate with the found word, as in the bigram model. (This can be the same word.) It then moves this associated word to the model’s output.

For this example, the researchers show that based on the final word, “Doctor,” the head knows from its training to search for a word that is a common name. On finding the name “Smith” earlier in the sentence, the head looks at what it’s learned to associate with “Smith” and moves that word to the output. (In this case, the model has learned to associate the same word “Smith” with the found word “Smith.”) The net effect of the overall process is that the model copies the word “Smith” from the context to the output.
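
Sketched in code, that two-step lookup might look like the following. The lookup tables here are hand-written stand-ins for associations a trained head actually encodes in its parameters, and the names are invented for illustration:

```python
COMMON_NAMES = {"Smith", "Jones", "Lee"}

# Stand-ins for what training bakes into the head's parameters:
search_for = {"Doctor": lambda w: w in COMMON_NAMES}  # given the final word, what to search for
associate = {"Smith": "Smith"}  # the head pairs "Smith" with itself, i.e. pure copying

def one_layer_head(context, search_for, associate):
    """Schematic two-step prediction by a single attention head."""
    final = context[-1]
    match = search_for.get(final)
    if match is None:
        return None
    # Step 1: scan the earlier context for a word the head has learned
    # to associate with the final word.
    for word in context[:-1]:
        if match(word):
            # Step 2: look up the word associated with the found word
            # (possibly the found word itself) and move it to the output.
            return associate.get(word, word)
    return None

context = "Doctor Smith went to the store because Doctor".split()
print(one_layer_head(context, search_for, associate))  # -> "Smith"
```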

