Reddit Sues Perplexity, Others Over Copyrighted Posts

Reddit on Wednesday sued artificial intelligence company Perplexity and three other entities for allegedly obtaining copyrighted posts unlawfully.

"Reddit, Inc. brings this action to stop the industrial-scale, unlawful circumvention of data protections by a group of bad actors who will stop at nothing to get their hands on valuable copyrighted content on Reddit," the company alleges in a complaint filed in U.S. District Court for the Southern District of New York.

The content at issue was posted to Reddit and displayed on Google's search results pages, according to the complaint. Three defendants -- Oxylabs UAB, AWMProxy, and SerpApi -- allegedly scraped the posts from Google, and Perplexity allegedly obtained the material from at least one of those companies.

Reddit characterizes Oxylabs UAB, AWMProxy, and SerpApi as "data-scraping service providers who specialize in creating and selling tools designed to circumvent digital defenses and scrape others’ content."

"These tools are aimed at bypassing two levels of security: First, evading Reddit’s own anti-scraping measures, and second, circumventing Google’s controls and scraping Reddit content directly from Google’s search engine results," Reddit alleges.

Oxylabs UAB is allegedly a "Lithuanian data scraper," AWMProxy is allegedly a former Russian botnet, and SerpApi is allegedly a Texas company that "publicly advertises its shady circumvention tactics."

"In a very real sense, these defendants are similar to would-be bank robbers, who, knowing they cannot get into the bank vault, break into the armored truck carrying the cash instead," Reddit charges. 

Reddit also claims that Perplexity is a "willing customer" of at least one of the three other defendants.

Reddit alleges that it confirmed its theory about Perplexity by creating a "test post" that "could only be crawled by Google’s search engine and was not otherwise accessible anywhere on the internet."

"Within hours, queries to Perplexity’s 'answer engine' produced the contents of that test post," Reddit alleges.

"The only way that Perplexity could have obtained that Reddit content and then used it in its 'answer engine' is if it and/or its co-defendants scraped Google [search results] for that Reddit content and Perplexity then quickly incorporated that data into its answer engine," the complaint says.

A Perplexity spokesperson said the company has not yet received the lawsuit, but "will always fight vigorously for users’ rights to freely and fairly access public knowledge."

"Our approach remains principled and responsible as we provide factual answers with accurate AI, and we will not tolerate threats against openness and the public interest," the spokesperson said in an email to MediaPost.

A SerpApi representative reportedly said the company disagrees with the allegations and will "vigorously defend" itself in court. Oxylabs also reportedly said it will defend itself in court.

Reddit's complaint includes a claim that all defendants are violating the Digital Millennium Copyright Act anti-circumvention provisions, which prohibit anyone from bypassing or impairing technological measures that control access to copyrighted material.

Reddit -- which is also suing Anthropic for allegedly scraping data -- says in the new complaint that its "vast corpus of human-generated content is widely seen as invaluable to AI companies," adding that its content is "particularly well-suited" to training large language models because its "data and information constantly grows and regenerates as users come and interact with their communities and each other."

The company adds that it "expressly prohibits" outside companies from using its content for training artificial intelligence, unless those companies have an agreement with Reddit, and it "routinely blocks scrapers that attempt to access and scrape data unless they agree to the privacy and data restrictions in Reddit’s policies."
