
Deadly errors related to health information have forced Google to remove answers from AI Overviews for certain queries.
The feature served incorrect dietary advice to users searching for information on liver function tests (LFTs) and pancreatic cancer -- inaccurate information that could lead to malnutrition, according to a report.
In one case, the information about crucial liver-function tests could have led to people developing liver
disease while they incorrectly believed they were healthy.
The query, “what is the normal range for liver blood tests,” The Guardian found, served up context that did not account for patients' nationality, sex, ethnicity or age. In other words, the answer rested on information too incomplete to possibly be accurate.
Typing slight variations of the
original queries into Google, such as “lft reference range” or “lft test reference range,” continued to prompt AI Overviews to provide answers.
“We do not comment
on individual removals within Search,” a Google spokesperson told The
Guardian. In cases where AI Overviews miss some context, we work to make broad improvements, and we also take action under our policies where appropriate.”
Other types of errors were uncovered around the same time that OpenAI introduced its Healthcare ChatGPT. Such mistakes are not restricted to health queries, although health-related errors are far more dangerous to people.
Large language models are being trained on contaminated data. Tal Jacobson, CEO at Perion Network, which connects advertisers across all channels, experienced misinformation about himself firsthand: in response to a query, a chatbot accused him of having been charged with fraud as the company's CFO -- a title he never held.
The misinformation first appeared about three years ago, when OpenAI's chatbot was very new, and then again more recently, that time based on information about his wife.
After the most recent episode, he realized “AI models like ChatGPT aren’t optimized for honesty,” Jacobson told MediaPost. “They’re optimized to
score.”
When AI doesn't know an answer, Jacobson said, it faces a choice to either say “I don't know” and get a zero score, or guess and possibly provide the wrong
answer.
“And because we evaluate models like multiple-choice exams, guessing always wins,” he said. Providing no answer or an incorrect answer yields zero points, whereas a lucky guess earns one point.
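To make the incentive concrete, here is a minimal sketch of the exam-style scoring Jacobson describes -- a hypothetical illustration, not code from his post, with the scoring rule and numbers assumed for the example:

```python
# Hypothetical exam-style scoring: 1 point for a correct answer,
# 0 points for a wrong answer or for "I don't know".

def expected_score(p_correct: float, answers: bool, wrong_penalty: float = 0.0) -> float:
    """Expected points on one question.

    p_correct: chance the model's guess is right if it answers.
    answers: True if the model guesses, False if it abstains.
    wrong_penalty: points deducted for a wrong answer (0 in today's benchmarks).
    """
    if not answers:
        return 0.0  # abstaining scores nothing
    return p_correct * 1.0 - (1.0 - p_correct) * wrong_penalty

# Even a 10% shot beats abstaining when wrong answers cost nothing.
print(expected_score(0.10, answers=True))   # 0.10 expected points
print(expected_score(0.10, answers=False))  # 0.0 expected points
```

Under that rule, guessing never scores worse than abstaining, which is exactly the behavior the benchmarks end up rewarding.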
Models learn what humans teach them, and confidence beats caution, Jacobson wrote in a LinkedIn post.
“The math backs this up,” he wrote. “If a
chunk of facts appears only once in training data, hallucinations on those facts aren't a fluke. They're guaranteed. Fake citations aren’t accidents. They're a rational outcome of how we reward
performance.”
An honest model returns the answer “I’m not sure,” Jacobson said, as opposed to a reckless one that swings from one answer to another depending on the wording of the query.
He added that correcting confidence thresholds would fix the problem: treat "I don't know" as a neutral response rather than a failed one. He also said incorrect answers should be penalized.
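For comparison, here is a hypothetical sketch of the kind of scoring change Jacobson is pointing at -- the penalty value and threshold formula are assumptions for illustration, not a published benchmark spec:

```python
# Hypothetical scoring that penalizes wrong answers and treats
# "I don't know" as neutral (0 points). Answering pays off only when
# confidence p satisfies p - (1 - p) * k > 0, i.e. p > k / (1 + k).

def should_answer(p_confidence: float, wrong_penalty: float) -> bool:
    """Return True if answering beats abstaining in expectation."""
    threshold = wrong_penalty / (1.0 + wrong_penalty)
    return p_confidence > threshold

# With a 1-point penalty per wrong answer, the break-even confidence is 50%.
print(should_answer(0.40, wrong_penalty=1.0))  # False -> say "I don't know"
print(should_answer(0.60, wrong_penalty=1.0))  # True  -> answer
```

With any nonzero penalty, a low-confidence model does better by admitting uncertainty than by guessing -- the opposite of the incentive described above.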
“But that would reshuffle leaderboards,” he said, “and leaderboards drive funding, headlines, and billion-dollar valuations.”
Hallucinations will persist -- not because AI is evil, but because developers designed chatbots to reward guessing when they do not know the absolute truth, Jacobson said.
Jacobson does not
view this as an AI algorithm problem, but instead calls it a human algorithm problem.
“We'd rather get a wrong answer than no answer at all,” he wrote.