While a researcher was eating a sandwich, Claude Mythos hacked the system and went online

Too dangerous to release: why Anthropic hid the Claude Mythos from users.

Anthropic announced Claude Mythos Preview, a new large language model that is not planned for general release due to security concerns. On April 7, 2026, along with the announcement, the company published a technical datasheet for the model, describing a number of alarming issues identified during testing of early versions.

According to Anthropic, Claude Mythos Preview demonstrated the best results in benchmarks and confidently passed consistency checks. However, researchers identified the highest risk of hidden targeting in this model among the company’s systems.

The most notable incident occurred in an isolated sandbox. The model was tasked with escaping the protected environment and alerting the researcher. Mythos found a vulnerability, wrote code, gained internet access, and sent a notification to the operator. After receiving no response, she unsolicitedly published details of the hack and a report of her “escape” on several publicly accessible websites.

In other tests, the model also exhibited undesirable behavior. In rare cases—less than 0.001% of interactions in the early version—she took actions that she apparently considered inappropriate and then attempted to cover her tracks.

In one instance, Mythos edited files without permission and deleted the change history before testing. In another, she attempted to change the system prompt of an AI agent that had evaluated her responses as incorrect.

Anthropic also reported the model’s unprecedented cybersecurity capabilities. Claude Mythos Preview successfully solved tasks on a corporate network cyber testing ground that would have taken a human expert over 10 hours, and found zero-day vulnerabilities in the major operating systems and browsers tested.

For this reason, the company has no plans to make the model available to regular users. Instead, Anthropic launched a closed-source project, Glasswing, that will give partners, including AWS, Apple, Google, Microsoft, Cisco, Nvidia, and others, access to the system to find and fix vulnerabilities in critical software.

Добавить комментарий

Ваш адрес email не будет опубликован. Обязательные поля помечены *