
🔍 When AI Companies Accuse Each Other of Data Theft, the Fight is Really About Who Gets to Learn From Whom

BEREA, Ky. — Anthropic says it has uncovered “industrial-scale” campaigns by three Chinese AI labs—DeepSeek, Moonshot AI, and MiniMax—that used about 24,000 fraudulent accounts to generate more than 16 million interactions with its Claude model. Anthropic states the goal was distillation: training another model on Claude’s outputs to copy capabilities faster and cheaper than building them from scratch.

Anthropic’s accusation matters because it is unusually specific. It names companies, quantifies volume, describes exactly how the traffic behaved, and frames the activity as a severe security issue, not just a business dispute. It also arrives squarely in the middle of a massive policy fight about AI chip exports and “who gets to have frontier capability,” which is why this story is much bigger than one vendor’s terms of service.


🕵️ What Anthropic Says Happened

Anthropic’s technical writeup says it identified coordinated campaigns that relied on fraudulent account creation to bypass regional restrictions and policy controls. The attackers then used high-volume prompting to extract Claude’s “most differentiated capabilities,” including reasoning, coding, tool use, and agentic behavior. Anthropic describes this as distillation abuse.

Reuters reports that Anthropic believes the massive output was used to aggressively improve competitors’ models. Crucially, Anthropic is using this incident to push for stronger U.S. export controls on advanced chips, linking that argument directly to distillation and model theft.


🔄 Distillation is Normal, and Also a Flashpoint

Distillation itself is not inherently shady. Anthropic explicitly acknowledges that distillation is “widely used and legitimate,” and that frontier labs routinely distill their own models to create smaller, cheaper versions.

The controversy is over distilling someone else's model without permission, especially at scale, and especially while bypassing geographic access controls. That is the line the industry is now fighting over. Training on the public internet created one set of copyright and permission disputes. Training on a competitor's live model outputs creates a different kind of dispute, one that is closer to "scraping a service" than "learning from a corpus."

This is why the sheer numbers matter. A handful of prompts looks like normal usage. Millions of prompts routed via thousands of proxy accounts looks like an extraction program.
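For readers curious what "training on another model's outputs" means mechanically, here is a minimal, purely illustrative sketch of the core idea behind distillation: a student model is trained to match a teacher model's output distribution, typically by minimizing the KL divergence between their softened predictions. This is a textbook toy, not Anthropic's, DeepSeek's, or anyone's actual pipeline; all function names and the temperature value are illustrative.

```python
import math

def softmax(logits, temperature=1.0):
    # Scale by temperature: a higher T softens the distribution,
    # exposing the teacher's relative preferences among near-miss answers.
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    # KL divergence between the softened teacher and student distributions.
    # A student trained to minimize this gradually copies the teacher's behavior.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# A student whose outputs already match the teacher's incurs zero loss;
# a mismatched student incurs a positive loss it would train to reduce.
teacher = [4.0, 1.0, 0.5]
print(distillation_loss(teacher, teacher))                # 0.0 (perfect match)
print(distillation_loss(teacher, [0.5, 1.0, 4.0]) > 0)    # True
```

In the scenario Anthropic alleges, the "teacher logits" would be replaced by whatever signal can be extracted from a live model's responses at scale, which is why volume matters so much.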


🇨🇳 The China Angle is Real, but It Is Not the Whole Story

Anthropic frames the activity partly as a geopolitical problem. One reason is capability proliferation. Anthropic warns that models built via illicit distillation are unlikely to retain safety guardrails, meaning dangerous capabilities could spread without the protections U.S. labs try to enforce.

That is a safety argument, but it is also a policy argument because it supports tighter export restrictions and stronger government enforcement.

However, there is a hard business reality underneath. Distillation is a shortcut around the single most expensive part of the AI race. If a lab can cheaply “learn” the behavior of a frontier model, it can rapidly narrow the performance gap without spending the same multi-billion-dollar training budget. That is exactly why OpenAI has made similar accusations about DeepSeek in recent weeks.

(Note: As of current reporting, the accused firms have not publicly responded to Anthropic’s claims, meaning we are largely seeing one side’s evidence and framing. Anthropic has provided dense technical detail, but it remains an allegation, not a court finding.)


🔮 What This Means for the Future of AI

This episode is a clear preview of the next phase of global AI competition:

  • Output Theft as Abuse: Model providers are going to treat “output theft” like a severe cyberattack, similar to credential stuffing or card testing. Anthropic has already expanded behavioral fingerprinting to detect these patterns.
  • Litigating the Border: The border between “open research technique” and “prohibited extraction” will get heavily litigated and regulated. Terms of service are not a global enforcement mechanism, and governments will increasingly interpret these incidents as illicit capability transfers.
  • The Push for Provenance: This will intensify pressure for data provenance and auditing. If a model’s training includes synthetic data derived from competitors’ outputs, companies will face massive reputational and legal risk.
  • A Messy Ethics Landscape: It is worth saying out loud that the industry’s ethics landscape is messy on all sides. U.S. labs are still fighting lawsuits and criticism over how their original training data was gathered.
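Anthropic has not published the details of its behavioral fingerprinting, so the following is only a toy sketch of the general idea behind detecting coordinated extraction: flag accounts that combine unusually high request volume with prompt templates shared across many other accounts. Every name, threshold, and the log format here is an assumption for illustration, not a description of any real abuse-detection system.

```python
from collections import defaultdict

def flag_coordinated_accounts(request_log, volume_threshold=100, shared_accounts=3):
    """Toy detector: flag accounts that (a) exceed a request-volume threshold
    and (b) reuse a prompt template seen across several accounts, a rough
    proxy for a coordinated campaign run through many fraudulent accounts.
    request_log: iterable of (account_id, prompt_template) pairs."""
    volume = defaultdict(int)          # requests per account
    template_users = defaultdict(set)  # which accounts used each template
    for account, template in request_log:
        volume[account] += 1
        template_users[template].add(account)

    flagged = set()
    for account, count in volume.items():
        if count < volume_threshold:
            continue  # low-volume accounts look like normal usage
        # High volume plus a template shared by many accounts is suspicious.
        if any(account in users and len(users) >= shared_accounts
               for users in template_users.values()):
            flagged.add(account)
    return flagged

# Three "bot" accounts hammer one template; one normal user asks varied questions.
log = [(f"bot{i}", "extract_reasoning_v1") for i in range(3) for _ in range(150)]
log += [("alice", f"question {j}") for j in range(5)]
print(flag_coordinated_accounts(log))  # {'bot0', 'bot1', 'bot2'}
```

Real systems would use far richer signals (timing, IP ranges, payment fraud, response sampling), but the shape of the problem is the same: a handful of prompts is indistinguishable from normal use, while coordination only becomes visible in aggregate.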

When AI companies accuse one another of “theft,” readers should understand it as both a valid security claim and a high-stakes power struggle over the new rules of machine learning.



🖊️ About the Author

Chad Hembree is a certified network engineer with 30 years of experience in IT and networking. He hosted the nationally syndicated radio show Tech Talk with Chad Hembree throughout the 1990s and into the early 2000s, and previously served as CEO of DataStar. Today, he is based in Berea as the Executive Director of The Spotlight Playhouse, proof that some careers don’t pivot, they evolve.

BereaOnline.com: Covering Berea, KY News and Events Since 1995