OpenAI develops LLM that uses a chain of thought like humans

OpenAI has released a new paper outlining progress it has made on reducing hallucinations, the common problem where an AI simply makes things up. The paper compares two training methods, outcome supervision and process supervision, and reports how each performs.

With outcome supervision, OpenAI trains reward models to provide feedback only on the final result the AI gives. With process supervision, the reward model provides feedback on each individual step of the reasoning, encouraging a human-like chain of thought.
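The difference between the two kinds of feedback can be illustrated with a toy sketch. This is not OpenAI's implementation: in the paper the feedback comes from trained reward models, whereas here a hypothetical arithmetic checker (`check_step`) stands in for the reward model, and the function names and the 0.0/1.0 scores are assumptions made for illustration.

```python
from typing import List

# Hypothetical stand-in for a reward model: verifies a claim like "2 + 3 = 5".
# A real reward model is learned; this checker is only an illustrative assumption.
OPS = {
    "+": lambda x, y: x + y,
    "-": lambda x, y: x - y,
    "*": lambda x, y: x * y,
}


def check_step(step: str) -> bool:
    """Return True if a step of the form 'a op b = c' is arithmetically correct."""
    left, right = step.split("=")
    a, op, b = left.split()
    return OPS[op](int(a), int(b)) == int(right)


def outcome_feedback(final_answer: int, expected_answer: int) -> float:
    """Outcome supervision: a single score based only on the final result."""
    return 1.0 if final_answer == expected_answer else 0.0


def process_feedback(steps: List[str]) -> List[float]:
    """Process supervision: one score per reasoning step."""
    return [1.0 if check_step(step) else 0.0 for step in steps]


# A solution to (2 + 3) * 4 that goes wrong in its first step.
steps = ["2 + 3 = 6", "6 * 4 = 24"]

print(outcome_feedback(final_answer=24, expected_answer=20))
print(process_feedback(steps))
```

In this toy case the outcome score only says that the final answer is wrong, while the per-step scores show that the first step is the faulty one and that the second step is valid given its input.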

In its research paper, OpenAI tested both methods on a math dataset and found that process supervision led to “significantly better performance”. So far, process supervision has only been tested on mathematics, and it will take more work to see how it performs more generally.

Figure: A chart showing that process supervision has a higher success rate than outcome supervision.

Explaining the possible outcomes of the process supervision method, OpenAI said:

“If these results generalize, we may find that process supervision gives us the best of both worlds – a method that is both more performant and more aligned than outcome supervision.”

It’s still too early to say how much this step-by-step verification will help to address hallucinations more generally, but hopefully it will, because hallucinations are probably the number one issue with LLMs right now. Just this week, a lawyer who had used ChatGPT for his work submitted false information detailing fake cases that the AI had dreamt up.

OpenAI has not given a timeline for implementing process supervision in the publicly available ChatGPT. The method is still in the research phase and needs to be tested on general information.

While initial results are good, OpenAI does mention that safer methods can come at the cost of reduced performance, a cost known as an alignment tax. The results so far show that process supervision doesn’t incur this tax on math problems, but we don’t know what will happen with more general information.
