Anthropic Reverses Controversial AI Model Safeguards After Research Community Backlash

Anthropic is backtracking on a policy that would have covertly limited competitors from using its new AI model, Claude Fable 5, to develop other AI models. The company changed course after the move received significant backlash from the AI research community.

“We’re changing Fable 5’s safeguards for frontier LLM development to make them visible,” Anthropic said in a statement to WIRED. “We made the wrong tradeoff and we apologize for not getting the balance right.”

Anthropic released Claude Fable 5, a version of its latest AI model with additional safety guardrails designed to prevent misuse, earlier this week. Some of the safeguards Anthropic outlined were straightforward: The company said it would reroute users who asked questions about cybersecurity, biology, or chemistry to a less capable AI model to reduce the chances of someone using the advanced AI to carry out a cyberattack or build a bioweapon.

However, for researchers attempting to use Claude Fable 5 for frontier AI development, Anthropic had outlined a different approach. The firm would deliberately degrade the model’s performance in ways that were invisible to the user. This would effectively sabotage researchers trying to use Claude to train competing AI models, which Anthropic explicitly bans in its terms of service.

Anthropic has now reversed the policy, stating that Claude Fable 5’s safeguards for AI development will be visible to users. If the company suspects a user is attempting to use Claude to build a highly capable AI, it will alert them that it’s either refusing the request or rerouting them to a less capable model.

The company reversed the policy following fierce backlash from the AI research community. While Anthropic has already taken steps to limit competitors from using Claude to build closed and open source AI models, critics argue that secretly degrading the model’s performance for certain users went too far. Claude’s coding agent has become a popular tool among developers, including those working on open-source AI research projects, and researchers tell WIRED the policy could have led to a troubling future where only a handful of leading AI labs could perform advanced AI research.

Dean Ball, a senior fellow at the Foundation for American Innovation and a former advisor to the White House on AI, wrote in a post on X that “degrading performance on ML research *without telling the user* is shockingly hostile and a terrible look.” He continued that the “secret sabotage” policy undermines Anthropic’s overall stance, because it limits AI researchers from collaborating on AI safety.

“It felt like Anthropic was saying to the public, ‘We don’t trust anybody else to do AI research. We are the only ones who have to do AI research,'” says Will Brown, research lead at the open source AI startup Prime Intellect. “It feels a bit like they’re starting to pull the ladder up behind them.”

Brown noted that the policy would have left developers unaware of whether they were violating Anthropic’s rules, since the company wouldn’t alert them when its safeguards were triggered. He added that the restrictions could have had widespread consequences, pointing to the growing ecosystem of third-party evaluation firms that test frontier models for safety, performance, and reliability—work that could have been hindered if Anthropic secretly degraded its model.

Also Read

Source link

What's Hot

One dead and dozens injured in train collision north of London

Australia Confirms First Cases of H5N1 Bird Flu; Virus Spread Reaches New Frontiers

KhyberPakhtunkhwa Budget FY27 Focuses on Financial Stability and Development

Anthropic Reverses Controversial AI Model Safeguards After Research Community Backlash

OpenAI Charts Course Toward AI for All, Pledges to Build Technology That Benefits Everyone]

From VLC to Robots: Jean‑Baptiste Kempf’s New Real‑Time Control Platform for AI‑Powered Machines

Post-IPO Strategy: How Go Aims to Revolutionize Japan’s Taxi Industry Through Robotaxis and Strategic Acquisitions

NASA Partners with Relativity Space for 2028 Mars Mission

Bison Herd Fends Off Wolf Attack on Newborn Calf in Poland’s Białowieża Forest

Rocket Report: Rebuild begins at Blue Origin launch pad; Relativity targets Mars

OpenAI Charts Course Toward AI for All, Pledges to Build Technology That Benefits Everyone]

From VLC to Robots: Jean‑Baptiste Kempf’s New Real‑Time Control Platform for AI‑Powered Machines

Post-IPO Strategy: How Go Aims to Revolutionize Japan’s Taxi Industry Through Robotaxis and Strategic Acquisitions

NASA Partners with Relativity Space for 2028 Mars Mission

What's Hot

Anthropic Reverses Controversial AI Model Safeguards After Research Community Backlash

Also Read

Keep Reading

Subscribe to Updates