source link: https://www.techspot.com/news/99702-chatgpt-gets-more-than-half-programming-questions-wrong.html
ChatGPT gets more than half the programming questions wrong in recent study

But ChatGPT's confidence and politeness convince some people it's right

By Rob Thubron

Facepalm: Generative AIs often get things wrong – even their makers don't hide this fact – which is why relying on them to write code is risky. To test ChatGPT's abilities in this area, researchers posed it a large set of software programming questions, and it answered more than half of them incorrectly. Even so, it still managed to fool a significant number of people.

A study from Purdue University (via The Reg) posed 517 Stack Overflow questions to ChatGPT and had a dozen volunteer participants evaluate the results. The answers were assessed not only on whether they were correct, but also on their consistency, comprehensiveness, and conciseness. The team also analyzed the linguistic style and sentiment of the responses.

It wasn't a good showing for ChatGPT. OpenAI's tool answered just 48% of the questions correctly, while 77% were described as "verbose."

What's especially interesting is that ChatGPT's comprehensiveness and well-articulated language style meant that almost 40% of its answers were still preferred by the participants. Unfortunately for the generative AI, 77% of those preferred answers were wrong.


"During our study, we observed that only when the error in the ChatGPT answer is obvious, users can identify the error," states the paper, written by researchers Samia Kabir, David Udo-Imeh, Bonan Kou, and assistant professor Tianyi Zhang. "However, when the error is not readily verifiable or requires external IDE or documentation, users often fail to identify the incorrectness or underestimate the degree of error in the answer."

Even when ChatGPT's answer was obviously wrong, two of the 12 participants still preferred it because of the AI's pleasant, confident, and positive tone. Its comprehensiveness and textbook style of writing also helped make factually incorrect answers appear correct in some people's eyes.

"Many answers are incorrect due to ChatGPT's incapability to understand the underlying context of the question being asked," the paper explains.

Generative AI makers include warnings on their products' pages that the answers they give may be wrong. Even Google has warned its employees about the dangers of chatbots, including its own Bard, telling them to avoid directly using code generated by these services. When asked why, the company said that Bard can make undesired code suggestions but still helps programmers, and that it aims to be transparent about the limitations of its technology. Apple, Amazon, and Samsung, meanwhile, are among the firms that have banned ChatGPT entirely.
