Anthropic: Claude can now end conversations to prevent harmful uses

OpenAI rival Anthropic says Claude has been updated with a rare new feature that allows the AI model to end conversations when it determines an interaction is harmful or abusive.

This only applies to Claude Opus 4 and 4.1, the two most powerful models, available via paid plans and the API. Claude Sonnet 4, which is the company’s most used model, won’t be getting this feature.

Anthropic describes this move as a matter of “model welfare.”

“In pre-deployment testing of Claude Opus 4, we included a preliminary model welfare assessment,” Anthropic noted.

“As part of that assessment, we investigated Claude’s self-reported and behavioral preferences, and found a robust and consistent aversion to harm.”

Claude won’t simply give up on a conversation when it’s unable to handle a query. Ending the conversation is a last resort, used only after Claude’s attempts to redirect users to useful resources have failed.

“The scenarios where this will occur are extreme edge cases—the vast majority of users will not notice or be affected by this feature in any normal product use, even when discussing highly controversial issues with Claude,” the company added.