According to a Purdue University study, the ChatGPT chatbot answers more than half of the software programming questions put to it incorrectly. Nonetheless, the bot was convincing enough to mislead a third of the study's participants.

The Purdue University team inspected ChatGPT's replies to 517 Stack Overflow queries to determine their accuracy, consistency, comprehensiveness, and conciseness. The researchers also conducted language and sentiment analysis on the responses and questioned a dozen volunteer participants about the results obtained from the model.

"Our analysis shows that 52 percent of ChatGPT answers are incorrect and 77 percent are verbose," the team's paper concluded. "Nonetheless, ChatGPT answers are still preferred 39.34 percent of the time due to their comprehensiveness and well-articulated language style." Of the ChatGPT answers that participants preferred, 77 percent were incorrect.


OpenAI states on the ChatGPT website that its software "may produce inaccurate information about people, places, or facts." The lab was asked whether it had any comment on the Purdue study.

The pre-print paper is titled "Who Answers It Better? An In-Depth Analysis of ChatGPT and Stack Overflow Answers to Software Engineering Questions." It was written by researchers David Udo-Imeh, Bonan Kou, and Samia Kabir, and assistant professor Tianyi Zhang.

According to the research, "from semi-structured interviews, it is apparent that polite language, articulated and text-book style answers, comprehensiveness, and affiliation in answers make completely wrong answers seem correct."

The study found that two of the 12 participants still marked an incorrect response as their preferred answer. The paper attributes this to ChatGPT's pleasant, authoritative writing style.

"During our study, we observed that only when the error in the ChatGPT answer is obvious, users can identify the error," their paper stated. "However, when the error is not readily verifiable or requires external IDE or documentation, users often fail to identify the incorrectness or underestimate the degree of error in the answer."

"Participants ignored the incorrectness when they found ChatGPT’s answer to be insightful. The way ChatGPT confidently conveys insightful information (even when the information is incorrect) gains user trust, which causes them to prefer the incorrect answer."

One of the key reasons participants preferred ChatGPT's answers was their level of detail. In many cases, participants did not mind the length as long as the lengthy, thorough responses gave them useful information. The other two factors were the positive sentiment and the politeness of the responses.

The researchers found, among other things, that ChatGPT is more prone to conceptual errors than factual ones. "Many answers are incorrect due to ChatGPT’s incapability to understand the underlying context of the question being asked," the paper states.