Commentary

When Hallucinations Come For Medical Research

When people find they've used a fact or quote that AI hallucinated, the rational question is, shouldn’t they have been more careful? More on that later. For now, let me share the story of Max Topaz. 

Max is an associate professor at Columbia University’s School of Nursing. He’s the first to admit he uses AI. He used it for basic things: fixing grammar and formatting to smooth out scientific papers. But after he shared his latest research, the journal responded with questions about a reference. Topaz found that the AI tool he used had surreptitiously put a fabricated source into his work.

The irony here is that Topaz is an AI developer at Columbia.  “I felt deeply embarrassed,” Topaz told Fortune.  “I’m an AI researcher. I know about hallucinations. If this is happening to me, an AI expert, what happens to other people?”

That sent Topaz on a mission: How many other experts were having their critical research undermined by hallucinations? He set out to do the research. 

Topaz reviewed almost 2.5 million biomedical papers and 97 million citations using a database known as PubMed Central. The results were profoundly disturbing, turning up more than 4,000 fake references in nearly 3,000 papers. It’s impossible to know how many came from AI, as humans make things up, too. But a massive increase showed up after 2024, as AI became a core part of medical research. The results were published in the medical journal The Lancet.

As Fortune reports: “Over the past three years, the rate of fabricated references in biomedical literature has grown more than 12-fold. In 2023, one in 2,828 papers contained at least one fake reference, a rate that had risen to one in 458 by last year. Over the first seven weeks of 2026, the researchers found, one in 277 papers had at least one nonexistent reference.”  “I’m thinking this is just the tip of the iceberg,” Topaz told Fortune.

The journal Nature reported that “The American Association for Cancer Research (AACR) found that 23% of abstracts in manuscripts and 5% of peer-review reports submitted to its journals in 2024 contained text that was probably generated by large language models (LLMs),” lending further weight to Topaz’s findings.

Hallucinations are often more annoying than harmful, but misinformation in medical research papers undermines the basis of medicine itself. In my own experience, a recent annual visit to my GP produced a medical transcript that was dramatically inaccurate.

A recent report by the American Medical Association found over 80% of physicians now use AI professionally to summarize research and prepare clinical documentation, a share that has more than doubled since 2023.

AI hallucinations don’t just snag users who are making things up. The hallucinations are designed to look and feel real, complete with links, sources, and DOI numbers. The more critical the practice -- medicine or law -- the greater the danger.

Scientific American reported that the Alabama Supreme Court sanctioned an attorney who had filed legal briefs riddled with hallucinations, including references to cases that didn’t exist.

A database at the Paris School of Advanced Business Studies catalogs more than 1,400 cases -- in just the past three years -- in which court filings contained AI errors. “Courtroom proceedings are public, and lawyers face sanctions for false claims, making such errors comparatively easy to track,” reported Scientific American. “Humans essentially have a tendency to believe that machines have more knowledge than they do, don’t break and are infallible,” Alan Wagner, an associate professor of aerospace engineering at Pennsylvania State University, told Scientific American.

In the real world, warnings about AI’s dangers compete with the media’s drive to cover AI’s power and value. And, no doubt, employers are pushing for AI adoption, even as more and more digital tools push AI results to the top of the stack.

So, where does this take us? Some call the unfolding issue science’s “reproducibility crisis,” compounded in the age of AI by a growing flood of hallucinated, AI-generated content that now fills academic and medical literature.

Perhaps the largest question is, do the platforms see hallucinations as a bug or a feature? While there is of course discussion of “solving” the problem, AI is quick to warn users not to trust its outputs, even as it presents misinformation with absolute confidence. Even if users instruct AI to put only verifiable content inside quotation marks, the charming chatbots return made-up information in confident quotation marks. When asked, they apologize profusely.

Will a platform decide that “truth” needs to be a core product deliverable? That doesn’t appear to be on the horizon.
