Publishers of first-rate content are more likely to have their work used to train large language models (LLMs) than those that put out inferior material, according to a study from Ziff Davis.
“Our work shows that key LLM training datasets are disproportionately composed of high-quality content owned by commercial publishers of news and media websites,” the study notes. “Major LLM companies have quantifiably prioritized this content in training the most important LLMs over the short history of the technology.”
Unsurprisingly, publishers are unhappy about this. Some, including The New York Times and eight Alden Global Capital titles, have filed suit against OpenAI and Microsoft.
“LLM company training data disclosures—largely dating to earlier, pure-research periods of the
technology’s evolution—and analysis of public training datasets show long-running exploitation of high-quality publisher content (extremely lucrative for the LLM companies) and imply lost
licensing revenue from some of the world’s most highly-valued companies,” the study states.
The authors identified the following set of high-quality publishers, with relevant subsidiary brands in parentheses: Advance (Condé Nast, Advance Local), Alden Global Capital (Tribune Publishing, MediaNews Group), Axel Springer, Bustle Digital Group, BuzzFeed, Inc., Future plc, Gannett, Hearst, IAC (Dotdash Meredith and other divisions), News Corp, The New York Times Company, Penske Media Corporation, Vox Media, The Washington Post, and Ziff Davis.
The study also notes, “As LLMs have evolved from pure research projects to some of the most valuable IP assets on earth, LLM companies have ceased publishing training details, and publishers have brought litigation against them. Courts and policymakers are grappling with questions of IP rights and technological progress.”
What should you do?
We’re not saying you have to dumb down your content to avoid this form of theft, but you and your lawyers should be aware of it. More lawsuits will almost certainly follow.
The study was written by George Wukoson, lead attorney for AI matters, and Joe Fortune, chief technology officer, both of Ziff Davis.