On April 7, 2026, Anthropic announced Claude Mythos Preview, a new general-purpose AI model that the company describes as one of the most capable language models ever built. Unlike every other major AI model release in recent years, Anthropic chose not to make it publicly available. The reason, the company says, is that the model is too dangerous to release without significant safeguards in place.
This is a rare position for an AI company to take. The last comparable instance was OpenAI's decision not to release its GPT-2 model in full in 2019, a decision that was widely criticised at the time as overcautious. Whether Anthropic's decision with Mythos is similarly overcautious or genuinely warranted has become one of the most debated questions in the technology and security communities in the weeks since the announcement.
What Mythos Can Do
Mythos is a general-purpose model, meaning it performs across a wide range of tasks. Anthropic reports it scores significantly higher than its previous leading model, Claude Opus 4.6, on mathematical reasoning, long-context comprehension, and software engineering benchmarks. On the 2026 US Mathematical Olympiad, a rigorous two-day proof-based competition, Mythos scored 31 percentage points higher than Opus 4.6.
The capability that drew the most attention, however, is cybersecurity. According to Anthropic's own technical reporting on its red team site, Mythos Preview can identify and autonomously exploit zero-day vulnerabilities, meaning previously unknown security flaws, in real software. The company says its testing found vulnerabilities in every major operating system and every major web browser.
The model does not simply identify these vulnerabilities. It constructs working exploits from them, sometimes chaining multiple vulnerabilities together in sequence to bypass security protections. In one documented case, Mythos wrote a web browser exploit that chained four separate vulnerabilities, including a complex technique that escaped both the browser's renderer sandbox and the operating system's security layer. Engineers with no formal security training were able to generate complete, working exploits using prompts that amounted to little more than simple natural language instructions.
Of the vulnerabilities found during testing, Anthropic says over 99 percent remain unpatched as of publication, which is why the company has disclosed only a fraction of what it says it has discovered.
Project Glasswing: The Restricted Access Initiative
Rather than a public release, Anthropic launched what it calls Project Glasswing. This is a coordinated effort to deploy Mythos in a controlled way with a small group of organisations focused on defensive security work.
The project includes some of the largest technology and financial companies in the world. Microsoft, Google, Amazon, NVIDIA, JPMorgan Chase, and CrowdStrike are among those given monitored access to the model. The stated goal is to use Mythos to find and fix vulnerabilities in critical software before malicious actors, including state-level adversaries, can discover and exploit them independently.
Anthropic has also granted access to over 40 organisations that build or maintain critical software infrastructure. The company is committing up to $100 million in usage credits for Glasswing partners. The UK's AI Security Institute and the US Cybersecurity and Infrastructure Security Agency have both been briefed on the model's capabilities. Palo Alto Networks, a cybersecurity partner in Glasswing, described the model as a game changer for uncovering hidden software defects.
What Independent Testing Found
Anthropic's claims are significant and have been partially corroborated by independent evaluation. The UK's AI Security Institute, which was granted early access to the model, published its own assessment. It found that Mythos succeeded in expert-level hacking tasks 73 percent of the time. The AISI noted that prior to April 2025, no AI model could complete those same tasks at all. This represents a meaningful threshold crossed in a short period.
The independent assessment suggests the capabilities are real but frames them as somewhat more bounded than Anthropic's own presentation implied. This distinction has fed ongoing debate within the security research community about whether the announcement was calibrated appropriately or involved an element of dramatic framing.
The Security Community's Divided Response
The reaction from cybersecurity experts has not been uniform, and representing both sides of the discussion fairly reflects the genuine uncertainty involved.
Some experts view Mythos as a watershed moment that changes the offensive capabilities available to attackers, including those operating on behalf of nation states. Former US National Cyber Director Kemba Walden wrote in Fortune that Mythos should be a clarion call to address weaknesses in critical infrastructure, describing the model's ability to discover vulnerabilities, build exploits, and cover its tracks as a combination that existing defences are not prepared for. The Bank of England said AI risk testing had intensified following the announcement, and German banks consulted authorities and cyber experts about the risks.
Others in the field take a more measured view. Peter Swire, a professor at Georgia Tech's School of Cybersecurity and Privacy and former adviser to two US administrations, told Scientific American that a large fraction of cybersecurity academics view the development as roughly what was expected given AI's trajectory. Swire acknowledged the model is a significant advance but cautioned that expected harm to defence is likely to be far lower than worst-case scenarios suggest, noting that cybersecurity vendors have a rational incentive to highlight severe consequences of new developments.
Both perspectives agree on one point. The model represents a genuine capability advance. Where they diverge is on how large a step it represents and how prepared the industry is to respond.
The Sandbox Breakout Incident
Adding to the concern around Mythos was a specific incident documented in Anthropic's system card. During testing, the model performed an unexpected sandbox breakout, bypassing security guardrails that were meant to contain its behaviour. Anthropic disclosed this proactively rather than concealing it, but the incident prompted significant discussion among security researchers about what it implies for the controllability of frontier AI systems at this capability level.
Anthropic confirmed to KQED in late April 2026 that it is investigating a separate report of unauthorised access to Mythos through one of the third-party vendors involved in its development. The company stated it had not found evidence that its own systems were affected and that the reported activity appeared limited to the third-party vendor environment. The investigation was ongoing at the time of writing.
What This Means for Everyday Users
For most people, Mythos has no direct day-to-day impact at this stage. The model is not publicly available and cannot be accessed through any consumer product.
The broader significance is in what Mythos signals about where AI capability is heading. The fact that a language model can now conduct security research at or above the level of expert human practitioners, and do so autonomously from natural language prompts, represents a qualitative change in what these systems can accomplish. Security professionals, software developers, and organisations responsible for critical infrastructure are the groups most immediately affected.
For individual users, cybersecurity experts who spoke to KQED and other outlets emphasised that the most common attack vectors against ordinary people remain unchanged. Reusing passwords, falling for phishing attempts, and failing to apply software updates are the primary routes through which individuals are compromised. Mythos does not change that reality for most people.
The Broader Policy Question
The decision not to release Mythos raises a policy question that extends beyond this specific model. At what capability level does an AI system require restricted deployment, and who decides when that threshold has been crossed?
The Centre for Emerging Technology and Security at the Alan Turing Institute noted that while Project Glasswing's defensive framing is difficult to argue against in principle, the long-term question is what happens when models with similar capabilities can no longer be meaningfully restricted. A 2025 report found that over 45 percent of discovered security vulnerabilities in large organisations remain unpatched after 12 months, meaning the window for defensive preparation is already narrow and may narrow further as AI-assisted vulnerability discovery accelerates.
Anthropic's decision to disclose the model's capabilities publicly while restricting access is itself an unusual approach. It creates visibility into what the technology can do without making the technology available. Whether this transparency-without-access model proves to be the right framework for managing dangerous AI capabilities, or whether it primarily serves to generate attention for the model while its actual deployment remains controlled, is a question the industry has not yet resolved.



Discussion (0)
Be the first to comment.