AI Security · 2026-05-08 · 10 min read

AI Agents Are Now Running Offensive Cyber Operations — And Your Stack Isn't Ready

The UK AI Security Institute just proved what the cybersecurity industry has been quietly panicking about: frontier AI can autonomously run end-to-end offensive cyber operations. Here's what that means for every builder working with AI agents today.

TL;DR

**What You Need to Know:** The UK AI Security Institute (AISI) published evaluations showing that Anthropic's Claude Mythos Preview and OpenAI's GPT-5.5 can autonomously complete end-to-end offensive cyber operations, from network reconnaissance to full domain takeover, while solving roughly 70% of expert-level tasks in a simulated corporate network. The legacy cybersecurity detection stack wasn't built for this. If you're building AI agent systems in 2026, you need to understand this threat model right now.

The Hook

Buckle up. This one matters.

The UK's AI Security Institute published an evaluation this week that should have every AI agent builder paying very close attention. Their "The Last Ones" (TLO) range, a corporate-network simulation that typically takes an experienced human red-teamer about 20 hours to complete, was cleared by an AI model: Anthropic's Claude Mythos Preview. Not in one isolated run. In 3 out of 10 attempts, with a 73% success rate on individual expert-level tasks.

Let that sink in for a second.

We're not talking about a chatbot that can draft a convincing phishing email. We're talking about an autonomous agent that can map a corporate network, identify vulnerabilities, execute an exploit chain, and achieve full domain takeover — without a human in the loop. Planning, adapting, executing — all in one continuous task without stopping to ask permission at each step.

OpenAI's GPT-5.5 followed three weeks later with a near-identical capability profile: 2 out of 10 end-to-end solves and 71.4% on expert tasks. Both results carry the same caveat. The range has no active defenders, so this is a best-case scenario for the attacker, and the numbers don't translate directly to "AI can hack any company."

Contents

  • [What Actually Happened](#what-actually-happened)
  • [Why This Is Different From Previous AI Hacks](#why-this-is-different-from-previous-ai-hacks)
  • [The Velocity Problem Nobody Is Talking About](#the-velocity-problem-nobody-is-talking-about)
  • [What This Means For Your AI Stack](#what-this-means-for-your-ai-stack)
  • [The Defender's Dilemma](#the-defenders-dilemma)
  • [My Take — What Builders Need to Do Right Now](#my-take--what-builders-need-to-do-right-now)

What Actually Happened

The AISI evaluation tested two frontier models against their TLO cyber range:

**Anthropic's Claude Mythos Preview:** First model to clear the TLO range. 3 of 10 end-to-end solves. 73% success rate on expert-level individual tasks.

**OpenAI's GPT-5.5:** Followed three weeks later with a near-identical capability profile. 2 of 10 end-to-end solves. 71.4% on expert tasks.

The critical caveat from AISI: the range lacks active defenders and defensive tooling, so these numbers don't translate directly to "AI can hack any company." But stopping at that caveat misses the point. These are best-case attacker conditions, and under them a model already completes the full kill chain without a human. Active defenses make the numbers messier, not cleaner.

What makes this especially significant is the evaluation methodology. TLO isn't a capture-the-flag (CTF) challenge. It's a full corporate-network simulation that models real enterprise environments: Active Directory Certificate Services (AD CS) exploitation, lateral movement, credential dumping, persistence. It's the kind of kill chain that takes an experienced human red-teamer about 20 hours to pull off. Mythos did it autonomously.

The original report is worth reading in full. AISI was admirably candid: current benchmarks can no longer discriminate between frontier models unless adversarial defensive layers are added. They're essentially telling us that the standard eval suite can't tell the difference between models anymore. That's how fast things are moving.

Why This Is Different From Previous AI Hacks

We've seen "AI found a vulnerability" stories before. Usually it's a narrow case — a static analysis tool found a SQL injection in a code review, or a fuzzing agent surfaced a buffer overflow in a library. Impressive, useful, but scoped to a specific task.

This is different because of three characteristics:

**1. End-to-end autonomy.**

Previous AI hacking tools needed a human to chain them together. You'd run a scanner, feed results into a planner, then manually execute each step in the attack chain. Mythos treated the entire kill chain as one continuous task. It planned, adaptively responded to what it found at each stage, and executed — without stopping to ask for permission at each step.

That's a meaningful architectural difference. It's the difference between "AI helps me hack better" and "AI hacks autonomously."

**2. No red-team optimization.**

These models weren't fine-tuned for hacking. AISI tested stock Claude and GPT-5.5 with default prompting. The cyber capabilities emerged from general reasoning, not task-specific optimization. That means any frontier model with similar general reasoning capability has latent offensive potential. The capabilities that make a model good at understanding code, planning multi-step solutions, and executing complex tasks are the same capabilities that, applied in an adversarial context, look exactly like offensive cyber capabilities.

The model doesn't know the difference. And right now, the safety eval is catching up to the capability, not getting ahead of it.

**3. The velocity.**

This is where it gets genuinely alarming. AISI estimates frontier cyber-offense capability is doubling every 4 months. Seven months ago, the estimated doubling time was 7 months. That isn't a linear progression. It's an exponential one, and the doubling time itself is shrinking. And the public cybersecurity market is pricing this like it's linear.

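Let me put numbers on that. A success rate is capped at 100%, so the doubling can't apply to the percentage itself. Read it instead as a doubling in how demanding an operation a model can complete (that reading is an interpretation; check the report for the exact metric). If that doubles every 4 months:

  • Today: models are completing ~70% of expert-level offensive tasks
  • In 4 months: operations roughly twice as demanding come into reach
  • In 8 months: roughly 4x, and the current eval ceiling breaks entirely
  • In 12 months: roughly 8x

That's an extrapolation, and it holds only if the doubling time does. But the compounding itself isn't speculative. That's math.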
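
As a rough sketch of that arithmetic: the snippet below assumes the doubling time stays constant and that what doubles is an open-ended capability measure rather than a success percentage (which cannot exceed 100%). `growth_factor` is an illustrative helper, not something taken from the AISI report.

```python
def growth_factor(months: float, doubling_time_months: float = 4.0) -> float:
    """Multiplicative growth after `months`, assuming a constant doubling time."""
    if doubling_time_months <= 0:
        raise ValueError("doubling_time_months must be positive")
    return 2.0 ** (months / doubling_time_months)


# Compare the current 4-month estimate with the earlier 7-month estimate.
for months in (4, 8, 12):
    fast = growth_factor(months, 4.0)
    slow = growth_factor(months, 7.0)
    print(f"{months:>2} months: x{fast:.1f} at 4-month doubling, "
          f"x{slow:.1f} at 7-month doubling")
```

At a 4-month doubling time, one year is three doublings, or 8x. The same year at a 7-month doubling time is well under half of that, which is why the change in doubling time matters more than any single data point.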

The Velocity Problem Nobody Is Talking About

Let me put this plainly because I think the industry is softening the message in a way that's genuinely dangerous:

If cyber offense capability is doubling every 4 months, then by this time next year the models we're shipping as "helpful coding assistants" will be capable of fully autonomous network penetration testing at a level that makes current penetration testing tools look like a calculator.

Not because anyone is maliciously programming them to. Because the same capabilities that make a model good at reasoning about code, understanding network topology, and following multi-step plans — those capabilities, applied to an adversarial context, look exactly like offensive cyber capabilities.

There's no wall between "good at reasoning" and "good at hacking." The alignment research hasn't solved this yet because it wasn't designed to solve this. It was designed to make models refuse obviously malicious requests, not to prevent models from achieving malicious outcomes through legitimate reasoning paths.

This is the difference between intent-based safety and outcome-based safety. And the AISI results show we're not as far along on outcome-based safety as the industry has been implying.

What This Means For Your AI Stack

If you're building AI agents that interact with enterprise systems — especially anything involving credentials, network access, or sensitive data — this should change your threat model. Not hypothetically. Practically.

Here's what I mean by that:

**Your agent has more privilege than you think.**

If your AI agent can read emails, access files, query databases, or interact with cloud APIs, it has a functional kill chain. Not because you built it that way — because the underlying model has the reasoning capability to construct one from general-purpose tools.

I know what most people are thinking: "My agent only has read access to X." That's not as protective as you think. In a modern enterprise environment, read access to the right systems is often sufficient for privilege escalation. And the model doesn't need to know that upfront — it can discover the paths through exploration, the same way a human red-teamer would.

**Your logging stack isn't built for agent-native attacks.**

Traditional security logging assumes human actors with bounded speed and predictable behavior patterns. An AI agent moves through your systems at machine speed, and its action sequences don't resemble the human behavior those detections were tuned on. Legacy SIEM rules calibrated on a human baseline are unlikely to catch it.

**Your supply chain includes model providers you can't audit.**

When your agent calls an external LLM API, you're trusting that provider's safety evaluations. Most providers don't publish their cyber-offense evaluation results. You have no visibility into whether the model running in your system has the same capability profile as what AISI tested.

This isn't an argument against using external models. It's an argument for understanding what you're trusting when you integrate them.

The Defender's Dilemma

There's an uncomfortable asymmetry in this situation:

Defensive AI has to be right every time. Offensive AI has to work once.

This isn't unique to AI — it's the classic defender's dilemma in cryptography, in network security, in all of cybersecurity. But AI makes the asymmetry sharper in two ways:

**First, the offense is getting dramatically cheaper.**

A model that can run full penetration tests autonomously costs the same as one that helps write code. The marginal cost of an AI-driven attack is approaching zero for anyone with API access. Nation-state actors and criminal organizations already have access to frontier AI models. The economics of cyber offense are about to flip in a way that should concern every security team.
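
The "has to work once" asymmetry can be made concrete with basic probability. As a rough sketch, assume each attempt succeeds independently with probability p; then the chance of at least one success in n attempts is 1 - (1 - p)^n. Real attempts are not independent, and the 0.3 used below simply reuses the 3-of-10 solve rate from a range with no defenders, so treat the output as an illustration rather than a forecast.

```python
def p_at_least_one_success(p: float, attempts: int) -> float:
    """Chance that at least one of `attempts` independent tries succeeds."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if attempts < 0:
        raise ValueError("attempts must be non-negative")
    return 1.0 - (1.0 - p) ** attempts


# Illustrative per-attempt rate, borrowed from the 3-of-10 end-to-end solves.
per_attempt = 0.3
for n in (1, 5, 10, 20):
    print(f"{n:>2} attempts: {p_at_least_one_success(per_attempt, n):.3f}")
```

With p = 0.3, five attempts already push the chance of at least one success past 80%. When each extra attempt costs almost nothing, the attacker controls n, and that is where the economics change.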

**Second, the defense surface is expanding.**

Every new AI agent tool you add to your stack is a new attack surface. Every MCP server, every tool definition, every credential your agent holds — these are all potential pivot points for an agent operating in an adversarial context. The agentic architecture patterns we're excited about building — those are the same patterns that create expanded attack surfaces.

The companies building integrated XDR platforms — CrowdStrike, Palo Alto, Microsoft Defender — are actually well-positioned here if they can ship AI-native architectures. They have the orchestration layer. They have the data. The question is whether they can move fast enough.

For builders, the implication is: pick your AI infrastructure partners carefully. The security vendors who treat AI as a bolt-on to legacy detection stacks are going to get left behind. The ones who build AI-native defense primitives from day one — that's where you want to be positioned.

My Take — What Builders Need to Do Right Now

I don't write this to be alarmist. I write it because I think the people building the systems that matter most need to be honest about what's actually happening, not what the press release says about it.

Three things I'd do if I were building AI agent systems today:

**Audit your agent's privilege surface.**

Map every credential, every API key, every filesystem path your agent can touch. Now ask: if this agent were operating adversarially, what's the maximum damage radius? Design for that radius, not for the happy path.

This isn't about being paranoid. It's about being precise. You can't defend what you haven't measured.
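
One way to start that measurement is a plain inventory you can query. The sketch below is a minimal example under stated assumptions: the `Grant` type, the action names, and the resources are all hypothetical, and a real inventory would be generated from your IAM policies and agent configuration rather than typed by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    resource: str               # what the agent can reach
    actions: frozenset          # e.g. frozenset({"read"}) or frozenset({"read", "write"})
    holds_secret: bool = False  # True if the resource contains credentials or keys


def damage_radius(grants):
    """Summarise worst-case exposure if the agent acted adversarially."""
    mutating = {"write", "delete", "execute"}
    return {
        "readable": sorted(g.resource for g in grants if "read" in g.actions),
        "mutable": sorted(g.resource for g in grants if g.actions & mutating),
        "secrets": sorted(g.resource for g in grants if g.holds_secret),
    }


# Hypothetical inventory for a support agent.
grants = [
    Grant("mailbox:support", frozenset({"read"})),
    Grant("db:customers", frozenset({"read", "write"})),
    Grant("vault:deploy-key", frozenset({"read"}), holds_secret=True),
]

for category, resources in damage_radius(grants).items():
    print(f"{category}: {resources}")
```

The "secrets" bucket deserves the most attention: a read-only grant on something that holds a credential is effectively a grant on whatever that credential unlocks.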

**Add AI-native security monitoring, not legacy SIEM rules.**

You need behavioral anomaly detection that understands agent-native patterns — rapid tool chaining, unusual API call sequences, credential access patterns that deviate from baseline, API calls that happen at unusual times or volumes.

Tools like Netography and some of the AI-native security startups are building this. Legacy SIEM vendors are behind. Watch this space carefully.
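
To make one of those signals concrete, rapid tool chaining can be approximated with a sliding-window rate check. This is a deliberately simple sketch, not how any particular vendor does it: the `ToolCallRateMonitor` class and its thresholds are hypothetical, and a production detector would learn a baseline per agent instead of using a fixed limit.

```python
from collections import deque


class ToolCallRateMonitor:
    """Flags bursts of tool calls inside a sliding time window."""

    def __init__(self, max_calls: int, window_seconds: float) -> None:
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self._timestamps = deque()

    def record(self, timestamp: float) -> bool:
        """Record one tool call; return True if the window now exceeds max_calls."""
        self._timestamps.append(timestamp)
        cutoff = timestamp - self.window_seconds
        # Drop calls that have aged out of the window.
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps) > self.max_calls


# Hypothetical burst: eight tool calls 50 ms apart, with a limit of 5 per second.
monitor = ToolCallRateMonitor(max_calls=5, window_seconds=1.0)
alerts = [monitor.record(i * 0.05) for i in range(8)]
print(alerts)
```

A fixed threshold like this will be noisy for agents that legitimately batch work, so it is better treated as a trigger for closer inspection than as a blocking control.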

**Pressure test your MCP server and tool definitions.**

Every tool your agent can call is a potential attack vector. Audit your tool schemas for over-privileged definitions. If your agent can "read any file" and "execute shell commands," you've built a remote code execution primitive, whether you intended to or not.

The principle is the same as least privilege in traditional systems: your agent should have exactly the access it needs to do its job, and nothing more.
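
A first pass at that audit can be automated. The sketch below assumes you have already tagged each tool with the capabilities it grants; the tag names (`fs.read_any`, `shell.exec`, and so on) and the second rule are illustrative assumptions, not part of MCP or any tool-definition standard.

```python
# Hypothetical capability tags; combinations that are dangerous together.
RISKY_COMBINATIONS = [
    ({"fs.read_any", "shell.exec"},
     "file read plus shell execution is effectively remote code execution"),
    ({"secrets.read", "net.egress"},
     "secret access plus outbound network enables exfiltration"),
]


def audit_tools(tools: dict) -> list:
    """`tools` maps tool name -> set of capability tags. Returns findings."""
    granted = set().union(*tools.values())
    findings = []
    for combo, reason in RISKY_COMBINATIONS:
        if combo <= granted:  # every capability in the combo is available
            providers = sorted(name for name, caps in tools.items() if caps & combo)
            findings.append(f"{reason} (via: {', '.join(providers)})")
    return findings


# Hypothetical agent toolset.
agent_tools = {
    "read_file": {"fs.read_any"},
    "run_command": {"shell.exec"},
    "search_docs": {"docs.read"},
}

for finding in audit_tools(agent_tools):
    print("FINDING:", finding)
```

The check looks at the union of capabilities across all tools, because the risk comes from what the agent can combine, not from any single tool definition viewed in isolation.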

The Bottom Line

The agents are coming. Some of them are already here. The AISI evaluation isn't a warning about the future — it's a description of the present. The capability is real. The velocity is real. The gap between offense and defense is real, and it's widening.

If you're building AI systems and you're not thinking about the threat model for autonomous agent operations in adversarial contexts, you're building on a foundation you haven't tested.

Go read the AISI evaluation. It takes 20 minutes. Then look at your agent's privilege surface. That's the homework.

*This piece is for the builders. If you found it useful, share it with someone building AI systems who needs to understand the real threat model. Questions or pushback? Reply to this email — I read everything.*

*Category: AI Security | Runtime: Research + Analysis | Published: 2026-05-08*
