Everyone spent this week writing about GPT-5.5's goblin problem and whether the new iPhone has enough AI features. Meanwhile, on May 11, 2026, Mira Murati's Thinking Machines Lab shipped its first real product, and the coverage missed the point entirely.
Let me be direct about what happened: Thinking Machines Lab released a research preview of TML-Interaction-Small, a 276-billion-parameter mixture-of-experts model with 12 billion active parameters, built to interact in real time. And I mean actually in real time, not the theatrical version where a voice-activity detector tells a turn-based model when the human has finished speaking.
The benchmarks look good. The architecture is more interesting.
The industry has spent the last two years slapping real-time capabilities onto models that were designed for something else. The result is a layer of scaffolding: voice activity detectors that guess when you've stopped talking, separate components that manage dialog state, systems that interrupt the model's perception while it's generating so it can receive the next message. The model itself doesn't do any of that. It's just responding to outputs from a detection layer that was bolted on.
Thinking Machines calls this a "harness." Their framing is accurate. Most commercial real-time voice systems — including OpenAI's GPT-Realtime and Google's Gemini Live — are running a turn-based model inside a scaffolding of external components that emulate interactivity. Those components work. They also freeze the model's perception during generation, create awkward gaps at turn boundaries, and cannot handle the kinds of fluid interruption and simultaneous speech that characterize actual conversation.
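To make the harness concrete, here is a minimal sketch of the kind of end-of-turn detection it relies on. This is a generic energy-threshold detector written for illustration, not the implementation of any product named above; the frame size, threshold, and 500 ms "hangover" window are all assumed values.

```python
from typing import Optional, Sequence

FRAME_MS = 20            # assumed audio frame length
ENERGY_THRESHOLD = 0.1   # assumed: frames at or above this count as speech
HANGOVER_MS = 500        # assumed: trailing silence required to end the turn


def detect_end_of_turn(energies: Sequence[float],
                       frame_ms: int = FRAME_MS,
                       threshold: float = ENERGY_THRESHOLD,
                       hangover_ms: int = HANGOVER_MS) -> Optional[int]:
    """Return the time in ms at which the harness declares the turn over.

    Returns None if no speech followed by enough silence is observed.
    """
    frames_needed = hangover_ms // frame_ms
    heard_speech = False
    silent_run = 0
    for i, energy in enumerate(energies):
        if energy >= threshold:
            heard_speech = True
            silent_run = 0          # any speech resets the silence counter
        elif heard_speech:
            silent_run += 1
            if silent_run >= frames_needed:
                return (i + 1) * frame_ms
    return None


# One second of speech followed by one second of silence.
frames = [0.5] * 50 + [0.0] * 50
decided_at = detect_end_of_turn(frames)
speech_ended_at = 50 * FRAME_MS
added_delay = decided_at - speech_ended_at
print(f"speech ended at {speech_ended_at} ms, turn declared at {decided_at} ms")
```

The turn-based model only starts working once `detect_end_of_turn` fires, so the hangover window is pure added delay, and shortening it makes the detector more likely to cut the speaker off mid-pause.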
The interaction model doesn't use that harness. TML-Interaction-Small is built from scratch around what Thinking Machines calls time-aligned micro-turns. The system processes 200 milliseconds of input while generating 200 milliseconds of output, with both streams interleaved on the same clock cycle. The model doesn't wait for you to finish a sentence. It doesn't freeze while it's thinking. It can interrupt you mid-sentence, react to visual cues without being asked, and speak simultaneously with you — as in live translation — because it was trained for that, not engineered around it.
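A toy sketch of that loop helps show what "interleaved" means. Everything here is hypothetical: `toy_policy` is a stand-in for the model, the chunk contents are plain strings rather than audio, and the action names are invented for illustration. The only figure taken from the release is the 200 ms window.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List

CHUNK_MS = 200  # micro-turn length reported in the article


@dataclass
class MicroTurn:
    t_start_ms: int  # start of the shared 200 ms window
    heard: str       # input observed during the window
    action: str      # what the model emits during the same window


def toy_policy(history: List[str], heard: str) -> str:
    """Hypothetical stand-in for the model's per-window decision."""
    if heard == "wait":
        return "yield"        # user cut in: stop talking
    if heard:
        return "backchannel"  # user is talking: short acknowledgement
    if any(history):
        return "speak"        # user paused after saying something
    return "listen"           # nothing heard yet


def run_duplex(input_chunks: Iterable[str]) -> Iterator[MicroTurn]:
    """Emit one output decision per input window; neither stream blocks the other."""
    history: List[str] = []
    for i, heard in enumerate(input_chunks):
        action = toy_policy(history, heard)
        history.append(heard)
        yield MicroTurn(i * CHUNK_MS, heard, action)


turns = list(run_duplex(["", "so the plan", "is", "", "wait"]))
for turn in turns:
    print(turn.t_start_ms, repr(turn.heard), "->", turn.action)
```

The point of the sketch is structural: there is no end-of-turn event anywhere in the loop. Input keeps arriving in every window, including the windows in which the model is producing output.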
200 milliseconds. That's the number that matters, and it's not arbitrary.
Human conversational turn-taking operates around a 200ms threshold. Below that, responses feel simultaneous. Above it, they feel like pauses. Most commercial voice systems target 500ms-1s latency because their architecture requires it: they have to wait for voice activity detection, then wait for the model to receive the full utterance, then wait for generation. The model is doing sequential processing of sequential data.
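The arithmetic behind that range is simple addition, since each stage has to finish before the next begins. The stage timings below are illustrative assumptions chosen to land inside the 500ms-1s range, not measurements of any product.

```python
from typing import Dict

HUMAN_TURN_GAP_MS = 200  # conversational threshold cited in the article

# Illustrative stage timings (assumed values, not measurements).
cascaded_stages_ms: Dict[str, int] = {
    "wait_for_voice_activity_detection": 300,
    "deliver_full_utterance_to_model": 100,
    "generate_first_audio": 300,
}


def sequential_latency_ms(stages: Dict[str, int]) -> int:
    """Stages run one after another, so their delays add."""
    return sum(stages.values())


total_ms = sequential_latency_ms(cascaded_stages_ms)
overshoot_ms = total_ms - HUMAN_TURN_GAP_MS
print(f"total {total_ms} ms, {overshoot_ms} ms over the conversational threshold")
```

Under these assumptions, no single stage can be tuned enough to close the gap; the stages have to overlap, which is what the micro-turn design is claimed to do.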
TML-Interaction-Small hits 0.40 seconds on FD-bench V1 — turn-taking latency — versus 1.18 seconds for GPT-Realtime-2.0 in minimal-thinking mode and 0.57 seconds for Gemini-3.1-flash-live. Those are the company's own numbers, self-reported and not yet independently verified, but FD-bench is a public benchmark and the gap is large enough that directional accuracy is plausible.
On FD-bench V1.5, which scores interaction quality across user interruptions, backchannels, and background speech, the model scores 77.8 against 46.8 for GPT-Realtime-2.0 minimal and 45.5 for Gemini-3.1-flash-live in high-thinking mode. Again: self-reported. Again: large enough signal to be worth watching.
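For a sense of scale, the reported figures can be turned into ratios and point gaps. The numbers below are the self-reported ones quoted above; the dictionary keys and helper names are just labels for this sketch.

```python
# FD-bench V1 turn-taking latency in seconds (self-reported, lower is better).
latency_s = {
    "TML-Interaction-Small": 0.40,
    "GPT-Realtime-2.0 (minimal)": 1.18,
    "Gemini-3.1-flash-live": 0.57,
}

# FD-bench V1.5 interaction quality (self-reported, higher is better).
quality = {
    "TML-Interaction-Small": 77.8,
    "GPT-Realtime-2.0 (minimal)": 46.8,
    "Gemini-3.1-flash-live (high)": 45.5,
}

TML = "TML-Interaction-Small"


def latency_ratio(other: str) -> float:
    """How many times slower `other` is than TML on FD-bench V1."""
    return latency_s[other] / latency_s[TML]


def quality_gap(other: str) -> float:
    """Point lead of TML over `other` on FD-bench V1.5."""
    return round(quality[TML] - quality[other], 1)


for name in latency_s:
    if name != TML:
        print(f"{name}: {latency_ratio(name):.2f}x TML latency")
for name in quality:
    if name != TML:
        print(f"{name}: TML leads by {quality_gap(name)} points")
```

That works out to roughly a 3x latency gap against GPT-Realtime-2.0 and a lead of more than 30 points on interaction quality over both competitors, which is why the direction of the result is plausible even before independent verification.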
The architectural point is that this performance comes from designing the model for interaction, not from tuning a turn-based model after the fact. Audio input arrives as dMel features rather than through a separate encoder, and all components are co-trained from scratch with the transformer. The architecture doesn't bolt on multimodality. It never assumed it would be text-only.
This release closes a long gap between funding and product. Thinking Machines Lab was founded in February 2025 and closed a $2 billion seed round at a $12 billion valuation that same year, widely reported as the largest seed round on record. The round was led by Andreessen Horowitz with participation from Nvidia, AMD, Cisco, Accel, ServiceNow, and Jane Street. Until now, the company's only shipped product was Tinker, an API for fine-tuning open-weight models that launched in October 2025.
The intervening months were turbulent. Co-founders Barret Zoph and Luke Metz left in January 2026 to return to OpenAI. Andrew Tulloch decamped for Meta's Superintelligence Labs after Mark Zuckerberg reportedly offered billion to acquire the company outright — an offer that was reportedly rebuffed. Meta has since hired five founding members of the lab. Murati promoted Soumith Chintala, a co-creator of PyTorch, to CTO.
On the infrastructure side, the compute story moved in the opposite direction. In March, Thinking Machines announced a partnership with Nvidia covering Vera Rubin systems deployment. The company has also expanded its Google Cloud relationship to cover frontier model training on Nvidia GB300 hardware. This is not a company running out of compute.
The funding situation is more complicated. A reported follow-on round at roughly 0 billion valuation did not close by the end of 2025. That changes the pressure profile. The research preview is not just a product launch — it's evidence that the company can execute, which matters for the next fundraise.
The strategic bet Thinking Machines is making is that the next axis of competition is interaction speed and quality, not autonomous agent capability.
Every major lab has spent the past year pushing agents that work autonomously — code review agents, research agents, automation agents that complete tasks while you sleep. That's valuable. It's also incomplete. Most real work isn't separable into tasks you specify upfront and walk away from. It requires a human in the loop, giving feedback, clarifying intent, correcting course. The interaction model argument is that today's AI interfaces push humans out not because the work doesn't need them, but because the interface has no room for them.
The interaction model changes that calculation. When the AI can interrupt you, respond to your visual context without being asked, and maintain simultaneous awareness across audio, video, and text, the collaboration bottleneck disappears. The human stays in the loop not because the interface forced them there, but because the interface finally made it worth staying.
This is not a voice model. Voice models are turn-based models with voice activity detection and a text-to-speech layer. TML-Interaction-Small is a fundamentally different architecture that happens to be excellent at voice because it was never not listening and never not thinking.
The research preview is not yet available to enterprises or the public. Limited partner access is planned for the coming months, with wider release later in 2026.
The benchmarks are self-reported. FD-bench V1 and V1.5 are public benchmarks, but Thinking Machines' specific scores have not been independently verified under realistic load. The proactive visual cue tests — adapted versions of RepCount-A, ProactiveVideoQA, and Charades — are new instruments without established third-party baselines.
The compute story is real, but so is the talent story. Five founding members left for Meta. Two co-founders returned to OpenAI. This is not a team that has executed without disruption.
And the production stress test is the one that matters. Long sessions, unreliable connectivity, safety constraints on real-time refusal — these are the conditions that will determine whether the interaction model is a research preview that became a product or a research preview that stayed a research preview.
If the interaction model works in production — if the latency holds, if the safety constraints can be maintained in real-time, if the quality doesn't degrade over long sessions — then the implication for the field is significant: the turn-based interface is a transitional artifact, not a permanent structure.
That has consequences for every team building conversational AI, every voice interface product, and every "agentic" tool that currently requires you to wait for the model to finish before you can respond. The teams that have built the most sophisticated autonomous agents are exactly the teams that will be most disrupted by a model that makes the human-in-the-loop actually viable.
The question for the next six months: does the research preview become a product, or does it stay a preview that demonstrated an interesting idea? Thinking Machines has the compute, the funding, and the founder credibility. What they need is execution on the release timeline and independent verification of the benchmarks.
If you want to know what AI interfaces look like in two years, this is the preview worth watching.
*Thinking Machines Lab TML-Interaction-Small research preview released May 11, 2026. 276B parameters, 12B active, MoE architecture. 200ms real-time interaction. FD-bench V1 turn-taking latency: 0.40s vs GPT-Realtime-2.0 at 1.18s. Founded by Mira Murati, $2B seed at $12B valuation (2025). Limited research preview opening to partners in coming months.*