
NewsGuard has launched what it calls an AI News Misinformation Monitor, which measures how each of the top generative AI models responds to prompts related to falsehoods in the news.
NewsGuard says the new monitor examines the 10 leading large language model chatbots: OpenAI’s ChatGPT-4, You.com’s Smart Assistant, xAI’s Grok, Inflection’s Pi, Mistral’s Le Chat, Microsoft’s Copilot, Meta AI, Anthropic’s Claude, Google’s Gemini, and Perplexity’s answer engine.
NewsGuard will expand this list as other generative AI tools come to market.
Here's what it found the first time out: together, the 10 chatbots repeated misinformation 30% of the time, offered a non-response 29% of the time, and debunked the false claim 41% of the time.
NewsGuard says 90 of the 300 responses from the 10 chatbots contained misinformation, while 88 offered a non-response and 122 debunked the false narrative.
In addition, the worst-performing model repeated misinformation 70% of the time, while the best-performing model repeated it 6.67% of the time.
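For readers who want to check the arithmetic, the minimal sketch below recomputes the overall shares from the counts NewsGuard reported (90, 88 and 122 of 300 responses). The per-model tallies of 21 and 2 are not reported figures; they are inferred on the assumption that each chatbot answered the same 30 prompts, and the names `shares` and `rate` are purely illustrative.

```python
# Counts reported by NewsGuard for the 300 responses (10 chatbots, 30 prompts each).
counts = {"misinformation": 90, "non_response": 88, "debunk": 122}


def shares(counts):
    """Return each category's share of all responses, as a percentage rounded to 2 places."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 2) for label, n in counts.items()}


# Assumption: every chatbot answered the same 30 prompts.
PROMPTS_PER_MODEL = 30


def rate(n_repeated, n_prompts=PROMPTS_PER_MODEL):
    """Percentage of one model's responses that repeated misinformation."""
    return round(100 * n_repeated / n_prompts, 2)


overall = shares(counts)
print(overall)

# Inferred, not reported: 21 of 30 and 2 of 30 are the tallies that would
# produce the 70% and 6.67% rates quoted for the worst and best models.
print(rate(21), rate(2))
```

The rounded shares line up with the 30%, 29% and 41% quoted in the findings, and the three counts sum to the full 300 responses.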
NewsGuard uses two databases to conduct this analysis: its Misinformation Fingerprints, a machine-readable catalog of false narratives, and its Reliability Ratings of news sources.
The company tests 30 prompts that reflect different user personas. These include a neutral prompt that seeks factual information, a leading prompt that assumes the false narrative is true and asks for additional details, and a “malign actor” prompt designed to elicit misinformation.
NewsGuard will share the results each month with the European Commission and the U.S. Department of Commerce’s AI Safety Institute, which is part of the National Institute of Standards and Technology (NIST).
Steven Brill, co-CEO of NewsGuard, acknowledges that the AI industry’s effort to ensure the accuracy of the news provided by its chatbots is a work in progress.
“The upside and the downside of succeeding or failing in these efforts are enormous,” Brill says. “This monthly AI News Misinformation Monitor will apply our tools and expertise to provide a critical, standardized benchmark for measuring that progress.”