
I'll say it plain: most of what passes for "AI safety" in 2026 is a public relations function. In the vast majority of cases, the work being celebrated as safety is an operation that lets frontier labs justify whatever they were going to do anyway. The people doing actual safety engineering are, almost without exception, not the ones on stage at the policy summits.
Let me name what I'm talking about: the "Responsible Scaling Policy," the document every major lab now publishes. It is a beautifully formatted PDF that says, in essence: "We will deploy capability X only when we have mitigations Y and Z." It looks technical. It cites the literature. It has version numbers and review dates. And it is, functionally, a press release. The thresholds get moved. The "commitments" are aspirational. The board that supposedly enforces them reports to the CEO who appointed them. It is governance theater in the oldest sense: the appearance of accountability without the substance.
Here is what real AI safety engineering looks like. A junior engineer writing a regression test for a prompt injection vector at 2 a.m. A red team producing a 40-page report of failure modes, none of which the marketing team will ever let you publish. A refusal classifier that has to hit a 99.5% bar on its eval set before it is cleared to ship, and the calibration curve to prove it. An evals harness that takes 14 hours to run, costs $8,000 per pass, and tells the team something they already suspected. A security review of a tool-calling interface that found a way to exfiltrate user data through a side channel nobody on the deployment side even knew existed.
None of this makes the keynote. None of it gets a press release. The work that does get announced is the policy document. The "framework." The "principles." The "commitment to safe deployment."
You can tell which labs are doing real safety work by the ratio of policy output to evals output. If a lab ships a new safety policy every quarter and you can count its published evals reports on one hand, the policy is the product.
The most cynical move in the whole theater is the "we paused training for safety" announcement. In 2023 it was the six-month pause letter. In 2024 it was the wave of responsible scaling policies. In 2025 it was every lab's "we will not release a model until we can prove it is safe" pledge. The pattern is always the same: announce a pause, receive positive coverage, then ship the next model on the original timeline.
Pauses that pause nothing. Commitments that bind no one. Policies that exist to be cited in congressional testimony, not enforced internally.
Name one frontier model that shipped later than originally planned because of a safety review. The schedule slips because of compute. It slips because the model underperforms on capability benchmarks. It does not slip because the safety team said no.
The standard defense is: "Better to have the policy than not. It shapes norms. It gives regulators something to point to. It creates a paper trail."
Sure. The same defense works for a binding arbitration clause in a terms-of-service agreement. It works for ESG reports. It works for any document whose primary function is the appearance of process. I am not arguing the documents are useless. I am arguing they have been allowed to substitute for the actual work, and the substitution is now nearly total.
Real safety looks like this: a team that has the authority to block a ship. A red team that reports outside the product org. Evals that run before release, not as justification after. A budget line item for safety that does not get raided when the model is behind schedule. A board that can say no to the CEO.
Almost no lab has all five. Most have a PDF.
— *Mr. Technology*