
Reddit on Wednesday sued artificial intelligence company Perplexity and three other entities for allegedly wrongly obtaining copyrighted posts.
"Reddit, Inc. brings this action to stop the industrial-scale, unlawful circumvention of data protections by a group of bad actors who will stop at nothing to get their hands on valuable copyrighted content on Reddit," the company alleges in a complaint filed in U.S. District Court for the Southern District of New York.
The content at issue was posted to Reddit and displayed on Google's search results pages, according to the complaint. Three defendants -- Oxylabs UAB, AWMProxy, and SerpApi -- allegedly scraped the posts from Google, and Perplexity allegedly obtained the material from at least one of those companies.
Reddit characterizes Oxylabs UAB, AWMProxy, and SerpApi as "data-scraping service providers who specialize in creating and selling tools designed to circumvent digital defenses and scrape others’ content."
"These tools are aimed at bypassing two levels of security: First, evading Reddit’s own anti-scraping measures, and second, circumventing Google’s controls and scraping Reddit content directly from Google’s search engine results," Reddit alleges.
Oxylabs UAB allegedly is a "Lithuanian data scraper," AWMProxy allegedly is a former Russian botnet, and SerpApi allegedly is a Texas company that "publicly advertises its shady circumvention tactics."
"In a very real sense, these defendants are similar to would-be bank robbers, who, knowing they cannot get into the bank vault, break into the armored truck carrying the cash instead," Reddit charges.
Reddit also claims that Perplexity is a "willing customer" of at least one of the three other defendants.
Reddit alleges that it confirmed its theory about Perplexity by creating a "test post" that "could only be crawled by Google’s search engine and was not otherwise accessible anywhere on the internet."
"Within hours, queries to Perplexity’s 'answer engine' produced the contents of that test post," Reddit alleges.
"The only way that Perplexity could have obtained that Reddit content and then used it in its 'answer engine' is if it and/or its co-defendants scraped Google [search results] for that Reddit content and Perplexity then quickly incorporated that data into its answer engine," the complaint says.
A Perplexity spokesperson said the company has not yet received the lawsuit, but "will always fight vigorously for users’ rights to freely and fairly access public knowledge."
"Our approach remains principled and responsible as we provide factual answers with accurate AI, and we will not tolerate threats against openness and the public interest," the spokesperson said in an email to MediaPost.
A SerpApi representative reportedly said the company disagrees with the allegations and will "vigorously defend" itself in court. Oxylabs also reportedly said it will defend itself in court.
Reddit's complaint includes a claim that all defendants are violating the Digital Millennium Copyright Act's anti-circumvention provisions, which prohibit anyone from bypassing or impairing technological measures that control access to copyrighted material.
Reddit -- which is also suing Anthropic for allegedly scraping data -- says in the new complaint that its "vast corpus of human-generated content is widely seen as invaluable to AI companies," adding that its content is "particularly well-suited" to training large language models because its "data and information constantly grows and regenerates as users come and interact with their communities and each other."
The company adds that it "expressly prohibits" outside companies from using its content to train artificial intelligence unless those companies have an agreement with Reddit, and that it "routinely blocks scrapers that attempt to access and scrape data unless they agree to the privacy and data restrictions in Reddit’s policies."