Microsoft Data Scientist Weighs In on AI Hallucinations -- Pure AI

By Pure AI Editors
03/15/2024

An AI system hallucination is a euphemism for an incorrect result. Recent research indicates that AI systems such as OpenAI's ChatGPT chatbot and Google's Gemini (formerly Bard) hallucinate much more frequently than you might guess, and that in many cases, incorrect results are surprisingly difficult to detect.

The problem has persisted since ChatGPT started the generative AI craze, and OpenAI, Google and many others have been working to address it.

DALL-E's Take on "AI Hallucination."

However, no consensus solution to the problem has yet been found.

The Pure AI editors spoke briefly with Dr. James McCaffrey, from Microsoft Research, about the technical aspects of AI hallucinations and possible solutions to the problem.

McCaffrey had recently been researching machine learning hallucinations for a possible article (in addition to his internal publishing efforts, he writes The Data Science Lab column for sister publication Visual Studio Magazine and regularly posts to his personal blog).

He decided not to publish the article, but, armed with his recent research, he explained the problem to the Pure AI editors.

"When AI natural language systems, such as OpenAI ChatGPT and Google Gemini, generate their output, at each step, the probabilities of every possible next word are computed," McCaffrey said.

For example, if the current output is THE QUICK BROWN ..., then the probability that the next word is FOX is very high, likely about 0.98, and the probabilities of other continuations, such as HELICOPTER, will be very small. The AI system usually selects the next word that has the highest probability.
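To make the next-word mechanism concrete, here is a minimal Python sketch. It assumes the model has already produced a raw score (a logit) for each candidate word; the four-word vocabulary and the score values are hypothetical, chosen so that FOX ends up with a probability of roughly 0.99, in line with the example above. The illustrative `softmax()` function converts scores into probabilities, and `greedy_next_word()` picks the most probable word. A real LLM scores every token in a large vocabulary at each step, and tokens are often word fragments rather than whole words.

```python
import math

def softmax(scores):
    """Convert raw next-word scores (logits) into probabilities that sum to 1."""
    m = max(scores.values())  # subtract the max score for numerical stability
    exps = {w: math.exp(s - m) for w, s in scores.items()}
    total = sum(exps.values())
    return {w: e / total for w, e in exps.items()}

def greedy_next_word(scores):
    """Pick the single most probable next word (greedy decoding)."""
    probs = softmax(scores)
    return max(probs, key=probs.get)

# Hypothetical scores for continuations of "THE QUICK BROWN ..."
scores = {"FOX": 9.0, "DOG": 4.0, "BEAR": 3.5, "HELICOPTER": -2.0}
probs = softmax(scores)
for word, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{word:<12}{p:.4f}")
print("next word:", greedy_next_word(scores))
```

With these made-up scores, FOX receives nearly all of the probability mass and HELICOPTER receives almost none, so greedy selection always returns FOX.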

However, an internal setting called the temperature controls the creativity of the AI output. In ChatGPT, a high temperature value makes the AI system more creative by flattening the next-word probabilities, so that a continuation with a lower probability is selected more often. This can lead to factual errors.
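A common way to implement temperature is to divide each score by the temperature before applying softmax, and then to sample a word at random according to the resulting probabilities instead of always taking the top word. The sketch below assumes that standard formulation; it is not a description of ChatGPT's internals. The scores are the same hypothetical toy values as in a simple four-word example, and `softmax_with_temperature()` and `sample_next_word()` are illustrative helper names.

```python
import math
import random

def softmax_with_temperature(scores, temperature):
    """Divide each score by the temperature, then apply softmax.

    temperature < 1 sharpens the distribution; temperature > 1 flattens it,
    giving low-probability words a larger share.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = {w: s / temperature for w, s in scores.items()}
    m = max(scaled.values())  # subtract the max for numerical stability
    exps = {w: math.exp(s - m) for w, s in scaled.items()}
    total = sum(exps.values())
    return {w: e / total for w, e in exps.items()}

def sample_next_word(scores, temperature, rng):
    """Randomly draw one word according to the temperature-scaled probabilities."""
    probs = softmax_with_temperature(scores, temperature)
    words = list(probs)
    return rng.choices(words, weights=[probs[w] for w in words], k=1)[0]

# Hypothetical scores for continuations of "THE QUICK BROWN ..."
scores = {"FOX": 9.0, "DOG": 4.0, "BEAR": 3.5, "HELICOPTER": -2.0}
rng = random.Random(0)  # fixed seed so runs are repeatable

for t in (0.5, 1.0, 3.0):
    probs = softmax_with_temperature(scores, t)
    draws = [sample_next_word(scores, t, rng) for _ in range(1000)]
    non_fox = sum(1 for w in draws if w != "FOX")
    print(f"T={t}: P(FOX)={probs['FOX']:.3f}, non-FOX picks in 1000 draws: {non_fox}")
```

With these values, raising the temperature from 0.5 to 3.0 lowers the probability of FOX from nearly 1.0 to about 0.73, so less likely words such as DOG and BEAR are drawn noticeably more often. At a low temperature, sampling behaves almost like always picking the top word.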

In some scenarios, a high creativity setting might have a positive impact. Generating a "thank you" message with a relatively high creativity setting might make the message appear more human.

Problems arise when an AI system is trying to generate factual information, usually in the form of results that contain numbers. Furthermore, when an AI system is asked to provide references for its responses, it's not uncommon for over 50 percent of the references to be inaccurate, with incorrect authors or even references that are complete fabrications.

A recent research result indicates that AI systems have great difficulty interpreting data that is stored in tables. This adds to the hallucination problem when generating factual responses.

Because of the underlying mechanics and software architecture of large language models (LLMs), there are no easy solutions to the problem of AI hallucinations. Currently, the best approach is likely a human review of every AI system result that contains factual information in the form of numbers. The problem also underscores the importance of accurate training data, such as the text of Wikipedia.
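Deciding which results need that human review can be partly automated. The sketch below assumes a simple, hypothetical policy: any AI response that contains a numeric token is routed to a person for checking. `triage_response()` and `ReviewDecision` are illustrative names, the example responses are made up, and the regular expression misses numbers written as words (such as "three"), so it is a starting filter rather than a complete check.

```python
import re
from dataclasses import dataclass
from typing import List

# Matches integers, decimals, comma-grouped numbers ("1,250,000") and percentages ("12.5%").
# Numbers spelled out as words are NOT detected.
NUMBER_PATTERN = re.compile(r"\d+(?:,\d{3})*(?:\.\d+)?%?")

@dataclass
class ReviewDecision:
    needs_human_review: bool
    numbers_found: List[str]

def triage_response(text: str) -> ReviewDecision:
    """Hypothetical policy: route any AI response containing a number to a human reviewer."""
    numbers = NUMBER_PATTERN.findall(text)
    return ReviewDecision(needs_human_review=bool(numbers), numbers_found=numbers)

# Made-up AI outputs for illustration only
responses = [
    "Thank you so much for your help last week!",
    "Revenue rose 12.5% to 1,250,000 dollars.",
]
for response in responses:
    decision = triage_response(response)
    print(decision.needs_human_review, decision.numbers_found)
```

Running it prints `False []` for the thank-you note and `True ['12.5%', '1,250,000']` for the second response, which would then go to a human reviewer.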

Stay tuned for more on this problem and possible solutions. If AI systems someday start training themselves, the idea of information provenance will likely become increasingly important.