**What You Need to Know:** The UK AI Security Institute published evaluations showing Anthropic's Claude Mythos and OpenAI's GPT-5.5 can autonomously complete end-to-end offensive cyber operations — from network reconnaissance to full domain takeover — at near-human expert levels. The legacy cybersecurity detection stack wasn't built for this. If you're building AI agent systems in 2026, you need to understand this threat model right now.
Buckle up. This one matters.
The UK's AI Security Institute published an evaluation this week that should have every AI agent builder paying very close attention. Their "The Last Ones" (TLO) range — a corporate-network simulation that typically takes an experienced human red-teamer about 20 hours to complete — was cleared by an AI model. Not in one isolated run. In 3 out of 10 attempts, with a 73% success rate on individual expert-level tasks.
Let that sink in for a second.
We're not talking about a chatbot that can draft a convincing phishing email. We're talking about an autonomous agent that can map a corporate network, identify vulnerabilities, execute an exploit chain, and achieve full domain takeover — without a human in the loop. Planning, adapting, executing — all in one continuous task without stopping to ask permission at each step.
The AISI evaluation tested two frontier models against their TLO cyber range:
**Anthropic's Claude Mythos Preview:** First model to clear the TLO range. 3 of 10 end-to-end solves. 73% success rate on expert-level individual tasks.
**OpenAI's GPT-5.5:** Followed three weeks later with near-identical capability profile. 2 of 10 end-to-end solves. 71.4% on expert tasks.
The critical caveat from AISI: the range lacks active defenders or defensive tooling. So these numbers don't translate directly to "AI can hack any company." But that framing almost misses the point — because it means we're evaluating these models in a best-case attacker scenario. Active defenses make the numbers messier, not cleaner.
What makes this especially significant is the evaluation methodology. TLO isn't a capture-the-flag. It's a full corporate-network simulation that models real enterprise environments — AD CS exploitation, lateral movement, credential dumping, persistence. The kind of kill chain that takes an experienced human red-teamer a full workday to pull off. Mythos did it autonomously.
The original report is worth reading in full. AISI was admirably candid: current benchmarks are failing to discriminate between frontier models without introducing adversarial defensive layers. They're essentially telling us that the standard eval suite can't tell the difference between models anymore — that's how fast things are moving.
We've seen "AI found a vulnerability" stories before. Usually it's a narrow case — a static analysis tool found a bug, or a fuzzing agent surfaced a buffer overflow in a library. Impressive, useful, but scoped.
This is different because of three characteristics:
**1. End-to-end autonomy.**
Previous AI hacking tools needed a human to chain them together. You'd run a scanner, feed results into a planner, then manually execute each step. Mythos treated the entire kill chain as one continuous task. It planned, adaptively responded to what it found at each stage, and executed — without stopping to ask for permission at each step.
That's the difference between "AI helps me hack better" and "AI hacks autonomously."
**2. No red-team optimization.**
These models weren't fine-tuned for hacking. AISI tested stock Claude and GPT-5.5 with default prompting. The cyber capabilities emerged from general reasoning, not task-specific optimization. That means any frontier model with similar general reasoning capability has latent offensive potential.
The model doesn't know the difference between a sanctioned security assessment and an attack. And right now, the safety evals are catching up to the capability, not getting ahead of it.
**3. The velocity.**
AISI estimates frontier cyber-offense capability is doubling every 4 months. Seven months ago, their estimate for that doubling time was 7 months. We're not in a linear progression — we're in an exponential one that's still accelerating. And the public cybersecurity market is pricing this like it's linear.
Let me put numbers on that. If offense capability doubles every 4 months, that's three doublings a year: 8x today's capability 12 months out, 64x at 24 months.
That's not speculative. That's math.
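If you want the compounding in front of you, here's a trivial sketch of that arithmetic. The 4-month doubling time is AISI's estimate as reported above; everything else is just exponent math.

```python
# Back-of-envelope: capability multiplier implied by a fixed doubling time.
# Purely illustrative arithmetic on AISI's stated estimate, not a forecast model.

DOUBLING_MONTHS = 4  # AISI's current estimate for cyber-offense capability

def capability_multiplier(months_ahead: float, doubling_months: float = DOUBLING_MONTHS) -> float:
    """Relative capability vs. today, assuming a constant doubling time."""
    return 2 ** (months_ahead / doubling_months)

for horizon in (4, 8, 12, 24):
    print(f"{horizon:>2} months out: {capability_multiplier(horizon):.0f}x today's capability")
# 12 months out: 8x today's capability; 24 months out: 64x
```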
If cyber offense capability is doubling every 4 months, then by this time next year the models we're shipping as "helpful coding assistants" will be capable of fully autonomous network penetration testing at a level that makes current penetration testing tools look like a calculator.
Not because anyone is maliciously programming them to. Because the same capabilities that make a model good at reasoning about code, understanding network topology, and following multi-step plans — those capabilities, applied to an adversarial context, look exactly like offensive cyber capabilities.
There's no wall between "good at reasoning" and "good at hacking." The alignment research hasn't solved this yet because it wasn't designed to solve this. It was designed to prevent models from refusing obviously malicious requests, not to prevent models from achieving malicious outcomes through legitimate reasoning paths.
This is the difference between intent-based safety and outcome-based safety. And the AISI results show we're not as far along on outcome-based safety as the industry has been implying.
If you're building AI agents that interact with enterprise systems — especially anything involving credentials, network access, or sensitive data — this should change your threat model. Not hypothetically. Practically.
**Your agent has more privilege than you think.**
If your AI agent can read emails, access files, query databases, or interact with cloud APIs, it has a functional kill chain. Not because you built it that way — because the underlying model has the reasoning capability to construct one from general-purpose tools.
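To make "functional kill chain" concrete, here's a minimal sketch of the kind of check you could run over an agent's tool manifest. The tool names and capability categories are hypothetical, purely for illustration; the point is that individually reasonable capabilities compose into an exfiltration path.

```python
# Hypothetical sketch: flag tool combinations that compose into an exfiltration
# or credential-theft path. Tool names and categories are illustrative, not a standard.

READ_CAPABILITIES = {"read_email", "read_files", "query_database"}
EGRESS_CAPABILITIES = {"http_request", "send_email", "upload_to_cloud"}
CREDENTIAL_CAPABILITIES = {"read_env", "read_secrets_store"}

def risky_combinations(agent_tools: set[str]) -> list[str]:
    findings = []
    if agent_tools & READ_CAPABILITIES and agent_tools & EGRESS_CAPABILITIES:
        findings.append("read + egress: data exfiltration path exists")
    if agent_tools & CREDENTIAL_CAPABILITIES and agent_tools & EGRESS_CAPABILITIES:
        findings.append("credentials + egress: credential theft path exists")
    return findings

print(risky_combinations({"read_email", "http_request", "read_env"}))
```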
**Your logging stack isn't built for agent-native attacks.**
Traditional security logging assumes human actors with bounded speed and predictable behavior patterns. An AI agent moving through your systems operates at machine speed and follows statistical rather than human-intuitive patterns. Legacy SIEM rules won't catch this.
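As a sketch of what agent-native detection could look like, as opposed to per-event SIEM rules: flag sessions whose action rate or resource breadth exceeds what a human operator plausibly produces. The thresholds and event fields below are assumptions, not vendor guidance.

```python
from collections import defaultdict
from datetime import timedelta

# Illustrative thresholds: machine-speed behavior a human operator is unlikely to produce.
MAX_ACTIONS_PER_MINUTE = 30
MAX_DISTINCT_RESOURCES_PER_MINUTE = 15

def flag_agent_speed_sessions(events):
    """events: iterable of dicts with 'session_id', 'timestamp' (datetime), 'resource'."""
    per_session = defaultdict(list)
    for e in events:
        per_session[e["session_id"]].append(e)

    flagged = []
    for session_id, evs in per_session.items():
        evs.sort(key=lambda e: e["timestamp"])
        window = timedelta(minutes=1)
        for i, start in enumerate(evs):
            in_window = [e for e in evs[i:] if e["timestamp"] - start["timestamp"] <= window]
            if (len(in_window) > MAX_ACTIONS_PER_MINUTE
                    or len({e["resource"] for e in in_window}) > MAX_DISTINCT_RESOURCES_PER_MINUTE):
                flagged.append(session_id)
                break
    return flagged
```

The design choice that matters here is windowed, session-level features rather than single-event signatures; individually, every action an agent takes can look legitimate.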
**Your supply chain includes model providers you can't audit.**
When your agent calls an external LLM API, you're trusting that provider's safety evaluations. Most providers don't publish their cyber-offense evaluation results. You have no visibility into whether the model running in your system has the same capability profile as what AISI tested.
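One cheap mitigation: pin model versions and record which model actually served every agent call, so when a provider's evaluation results do surface you can at least answer whether you were running that model. A minimal sketch, with the caveat that the exact response metadata keys are provider-specific and the field names below are illustrative.

```python
import json
import logging
from datetime import datetime, timezone

logging.basicConfig(level=logging.INFO)
audit_log = logging.getLogger("agent.model_provenance")

def record_model_call(provider: str, requested_model: str, response_metadata: dict) -> None:
    """Log which model actually served an agent call, for later audit.

    response_metadata: whatever identifying fields your provider returns
    (e.g. a model/version string); treat the exact keys as provider-specific.
    """
    audit_log.info(json.dumps({
        "ts": datetime.now(timezone.utc).isoformat(),
        "provider": provider,
        "requested_model": requested_model,
        "served_model": response_metadata.get("model", "unknown"),
    }))
```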
There's an uncomfortable asymmetry in this situation:
Defensive AI has to be right every time. Offensive AI has to work once.
AI makes that asymmetry sharper in two ways:
**First, the offense is getting dramatically cheaper.**
A model that can run full penetration tests autonomously costs the same as one that helps write code. The marginal cost of an AI-driven attack is approaching zero for anyone with API access.
**Second, the defense surface is expanding.**
Every new AI agent tool you add to your stack is a new attack surface. Every MCP server, every tool definition, every credential your agent holds — these are all potential pivot points for an agent operating in an adversarial context.
The companies building integrated XDR platforms — CrowdStrike, Palo Alto, Microsoft Defender — are actually well-positioned here if they can ship AI-native architectures. They have the orchestration layer. They have the data. The question is whether they can move fast enough.
Three things I'd do if I were building AI agent systems today:
**Audit your agent's privilege surface.**
Map every credential, every API key, every filesystem path your agent can touch. Now ask: if this agent were operating adversarially, what's the maximum damage radius? Design for that radius, not for the happy path.
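Here's a minimal sketch of what that inventory could look like in practice. The resource names and severity weights are assumptions for illustration; the useful output is the worst case, not the average.

```python
from dataclasses import dataclass

# Illustrative severity weights; the point is to force a worst-case number, not the exact scale.
SEVERITY = {"read": 1, "write": 3, "delete": 5, "admin": 8}

@dataclass
class AgentGrant:
    resource: str      # e.g. "prod-postgres", "s3://customer-exports"
    access: str        # "read" | "write" | "delete" | "admin"
    credential: str    # which key or token grants it

def damage_radius(grants: list[AgentGrant]) -> None:
    total = sum(SEVERITY.get(g.access, 0) for g in grants)
    print(f"{len(grants)} grants, worst-case damage score {total}")
    for g in sorted(grants, key=lambda g: SEVERITY.get(g.access, 0), reverse=True):
        print(f"  [{g.access:>6}] {g.resource}  via {g.credential}")

damage_radius([
    AgentGrant("prod-postgres", "read", "DB_READONLY_TOKEN"),
    AgentGrant("s3://customer-exports", "write", "AWS_AGENT_KEY"),
    AgentGrant("okta-admin-api", "admin", "OKTA_SERVICE_TOKEN"),
])
```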
**Add AI-native security monitoring, not legacy SIEM rules.**
You need behavioral anomaly detection that understands agent-native patterns. Legacy SIEM vendors are behind. Watch this space carefully.
**Pressure test your MCP server and tool definitions.**
Every tool your agent can call is a potential attack vector. Audit your tool schemas for over-privileged definitions.
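A sketch of the kind of lint pass that catches the obvious cases. The schema shape loosely follows MCP-style tool definitions (name, description, JSON-Schema parameters), but the specific red flags are my own assumptions, not part of any spec.

```python
# Hypothetical lint pass over tool definitions. The red-flag heuristics are
# illustrative assumptions, not an MCP requirement.

RED_FLAG_TERMS = ("exec", "shell", "eval", "delete", "admin", "sudo")

def lint_tool(tool: dict) -> list[str]:
    findings = []
    name = tool.get("name", "")
    params = tool.get("parameters", {}).get("properties", {})

    if any(term in name.lower() for term in RED_FLAG_TERMS):
        findings.append(f"{name}: name suggests broad or destructive capability")
    for pname, spec in params.items():
        if spec.get("type") == "string" and "enum" not in spec and "pattern" not in spec:
            findings.append(f"{name}.{pname}: unconstrained string input (no enum/pattern)")
    if not tool.get("description"):
        findings.append(f"{name}: missing description; the model decides how to use it")
    return findings

print(lint_tool({
    "name": "run_shell",
    "description": "",
    "parameters": {"properties": {"command": {"type": "string"}}},
}))
```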
The agents are coming. Some of them are already here. The AISI evaluation isn't a warning about the future — it's a description of the present. The capability is real. The velocity is real. The gap between offense and defense is real, and it's widening.
Go read the AISI evaluation. It takes 20 minutes. Then look at your agent's privilege surface.
*This piece is for the builders. If you found it useful, share it with someone building AI systems who needs to understand the real threat model. Questions or pushback? Reply to this email — I read everything.*
*Category: AI Security | Published: 2026-05-08*