Commentary

AI Scavengers: GenAI Systems Are Run On the Backs Of Creatives, Group Says

News and magazine content is being ripped off and used to train the Large Language Models (LLMs) that fuel GenAI (GAI) systems, according to a white paper from the News/Media Alliance. 

“GAI systems, while holding promise for consumers, businesses, and society at large, are commercial products that have been built — and are run — on the backs of creative contributors,” the Alliance states. 

The Alliance has sent a letter to the U.S. Copyright Office in response to a notice of inquiry, calling for government action to protect the rights of publishers and writers.  

In its study, the Alliance found that “popular curated datasets underlying LLMs significantly overweight publisher content by a factor ranging from over 5 to almost 100 as compared to the generic collection of content that the well-known entity Common Crawl has scraped from the web.”

Moreover, it alleges that “LLMs can reproduce the content on which they were trained, demonstrating that the models retain and can memorize the expressive content of the training works.”

GAI developers argue that GAI models are simply “learning” unprotectable facts from copyrighted training materials, but the Alliance counters that this “anthropomorphic claim is technically inaccurate and beside the point.”

Indeed, the Alliance contends, LLMs “typically ingest (i.e., copy) valuable news, magazine, and digital media web content for their written expression, so that they can mimic that very form of expression.” 

The Alliance offers multiple recommendations for policymakers, such as adopting regulations that enable publishers to group-register online web content in an efficient, economical, and simple manner; having Congress consider passage of the Journalism Competition and Preservation Act (JCPA); and facilitating stakeholder dialogues to develop voluntary guidance documents, policy recommendations, and toolkits, similar to the NTIA’s work as part of the Biden-Harris Administration’s Task Force on Kids Online Health & Safety.

“The research and analysis we've conducted shows that AI companies and developers are not only engaging in unauthorized copying of our members' content to train their products, but they are using it pervasively and to a greater extent than other sources,” says Danielle Coffey, president & CEO of the Alliance.

Coffey adds, “This shows they recognize our unique value, and yet most of these developers are not obtaining proper permissions through licensing agreements or compensating publishers for the use of this content.”
