The core of the dispute
The social networking platform Reddit Inc. has filed a complaint in federal court in New York against Perplexity AI, Inc. and three other entities—Oxylabs UAB (Lithuania), AWMProxy (described as a former Russian botnet) and SerpApi LLC (Texas). The lawsuit alleges that these parties systematically harvested Reddit user posts to fuel Perplexity’s “answer engine,” bypassing the platform’s protections and licensing controls.
Reddit argues its vast repository of public conversations—spread across more than 100,000 communities—has become a cornerstone for AI developers seeking rich, human-generated text. But while Reddit has secured licensing deals with major players like OpenAI and Google LLC for lawful access to its data, it claims Perplexity chose a different route: acquiring scraped content from third-party firms without entering into a formal agreement.
Key allegations and figures
- Reddit says it discovered a “forty-fold” increase in citations of its content by Perplexity after issuing a cease-and-desist in May 2024.
- In the filing, Reddit describes the data-gathering as “industrial-scale,” pointing to automated bots that masked their identities, changed locations and mimicked regular browser traffic to evade detection.
- The complaint quotes Reddit’s General Counsel stating: “AI companies are locked in an arms race for quality human content—and that pressure has fueled an industrial-scale ‘data-laundering’ economy.”
- The suit seeks unspecified compensatory and punitive damages, and a permanent injunction to stop Perplexity (and the other defendants) from continuing to use or sell Reddit-derived content.
- Publication reports estimate that Reddit spent “tens of millions of dollars” strengthening its anti-scraping infrastructure in recent years, highlighting the value it places on protecting its users’ data.
Licensing vs. Scraping — a strategic contrast
Reddit’s business model has shifted to monetizing its content through licensing: its deals with Google and OpenAI reportedly make up nearly 10% of its revenue. Under these agreements, Reddit retains control over how its data is used, preserving user privacy rights and ensuring deleted posts are not repurposed. In contrast, Reddit accuses Perplexity of sidestepping that process, acquiring scraped posts rather than negotiating a license.
This lawsuit stirs questions about the broader AI-ecosystem: how far can companies probe freely-accessible online data, and when does access become exploitation? Reddit aims to draw a clear boundary: generating profit from user-generated discussions without compensation to the platform and its community is unacceptable.
Wider context: why this case matters
- The AI industry’s hunger for data has never been greater: vast troves of conversational text help train chatbots, search agents and other large-language-model applications.
- Platforms that host user-generated content — such as Reddit, blogs, comment forums — have become battlegrounds for rights and revenues.
- This lawsuit follows an earlier one in June 2025, where Reddit sued another AI firm, Anthropic PBC, accusing it of unauthorized scraping and refusal to license data.
- The outcome could set precedent: how U.S. courts treat scraped versus licensed content, and whether platforms can enforce stronger control over how their user-generated data is used commercially.
Response from the parties
Perplexity, via a public statement, called the lawsuit a “show of force” by Reddit in its negotiations with other AI firms. It says that it does not train its models on Reddit data but rather summarizes publicly available discussions, making licensing irrelevant to its operations. SerpApi also issued a statement, vowing to vigorously defend itself and stating that Reddit’s claims are “strongly disagreed with.” Oxylabs characterized Reddit’s action as an attempt to monopolize public data and raise its price.
What’s next and what to watch
- The case is proceeding in the U.S. District Court for the Southern District of New York (Case No. 25-cv-8736).
- Key issues will include: proof of how data was obtained, whether it was used to train AI models (and whether licensing would have been required), and the valuation of the alleged misuse.
- If Reddit succeeds, we may see a wave of litigation as platforms seek to assert data-rights over content used by generative-AI firms.
- For AI companies, the case highlights the urgent need to audit training data sources, establish clear licensing arrangements, and anticipate regulatory or contractual constraints on data harvesting.