Apple's June 8 WWDC keynote buried the real story under the Siri AI rebrand. The expanded Foundation Models framework, a second, more capable on-device model, image input for the 3B-class model, and the new Core AI framework together turn iOS 27, iPadOS 27, and macOS 27 into a billion-device deployment surface for a free, private LLM. The press is talking about the assistant. The developer-economics story is in the SDK.

Apple Just Made the On-Device LLM a Free Primitive — and the WWDC 2026 Story Is Bigger Than Siri

Most of the coverage of WWDC 2026 is about Siri AI, the new conversational assistant that can see your screen, remember your photos, and call into apps via App Intents. That story is real. It is also the wrong story for the audience I usually write for. The bigger story is what Apple did to the developer surface. On June 8, Apple shipped five changes that together make a free, private, on-device LLM a standard part of every iOS, iPadOS, and macOS app you build from this fall forward. The implications for production agents, AI infrastructure economics, and the on-device-vs-cloud boundary in agent design are not subtle. Taken together, this is the biggest platform shift Apple has made since App Intents shipped in 2022, and most of the tech press is sleeping through it.

What Actually Shipped

Five things from the WWDC 2026 keynote and the developer sessions, in the order that matters to anyone building a real product:

1. Foundation Models runs both on-device and through Private Cloud Compute. The same Swift API now talks to two model tiers. The on-device path is the default for fast, private, low-latency work. Private Cloud Compute is the path the OS picks when the task needs more capacity, longer context, or heavier reasoning. Apple's claim is that server-side inference there runs on Apple silicon, retains no user data, and runs only software the device can cryptographically verify before it sends a request. For app teams, this is the first time the OS has made "on-device first, cloud only when needed" a routing decision the framework handles for you. (Apple Newsroom, June 8, 2026)

2. A second, more capable on-device model is gated to higher-end hardware. Apple added a stronger local model that handles text, image understanding, and speech, covering both generation and understanding. This model is restricted to recent iPhone, iPad, and Mac systems with enough unified memory and Neural Engine throughput to run it. In practice, the strongest on-device path is a feature gate, not a default. A given capability may be fully on-device on an iPhone 17 Pro, partially on-device on older supported devices, and routed to Private Cloud Compute on hardware that cannot run the larger local model. Apple Intelligence has always required recent hardware, but this is the first time Apple has drawn a second hardware boundary inside its intelligence stack. (Callstack analysis, June 9, 2026)

3. Foundation Models now accepts image input. Last year, the framework was text-only. At WWDC 2026, the API takes an image alongside text — the keynote example used a photo of an outfit, the model identified the clothing, and the app used its own product logic to recommend similar pieces. For any app that has a "user picks a photo from their library" flow, this means you can add visual understanding without shipping a vision-language model, without a server round trip, and without paying an API provider. (Apple developer session)

4. Foundation Models supports custom skills and can call server-running models through the same Swift API. Skills are app-defined, model-discoverable capabilities: the same idea as actions in OpenAI's custom GPTs or Anthropic's Agent Skills for Claude, but tied to the OS rather than a hosted product. The server-call path means an app can mix on-device and remote inference within a single LanguageModelSession, with the routing rule decided by the developer, not the user. This is the API surface that turns Foundation Models from a content-generation primitive into an agent primitive.

5. Core AI is a new framework for running third-party local models on Apple silicon. Foundation Models is Apple's model. Core AI is for yours. Domain classifiers, embedding models, fine-tuned vision encoders, distilled chat models in the 1B to 7B range: all run natively on the Neural Engine and GPU without each app maintaining its own inference runtime. Core AI is Apple's answer to the "I want to ship a custom model in my iOS app without rewriting llama.cpp for the sixth time" problem. It lands alongside the second, stronger on-device model, which sets the hardware baseline for what "custom local inference" actually means in production. (Apple Foundation Models research)
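The skills pattern in item 4, and the tool-calling loop it implies, can be sketched in plain Swift. This is a hypothetical illustration under stated assumptions, not Apple's API. `Skill`, `ModelStep`, `Model`, and `runAgent` are invented names. A scripted closure stands in for a real model session, and the five-step cap is an assumed safety bound. For comparison, the Foundation Models framework in the iOS 26 SDK exposes tool calling through its own `Tool` protocol, which provides a name, a description, and a `call(arguments:)` method. The WWDC 2026 skills API may look different.

```swift
// Hypothetical sketch of the skills pattern. `Skill`, `ModelStep`, `Model`,
// and `runAgent` are invented for illustration; they are not Apple APIs.

/// A model-discoverable capability: a name and description the model can read,
/// plus a handler the app runs when the model asks for it.
struct Skill {
    let name: String
    let description: String
    let run: ([String: String]) -> String
}

/// One model turn: either a request to run a skill or a final answer.
enum ModelStep {
    case callSkill(name: String, arguments: [String: String])
    case finalAnswer(String)
}

/// Stand-in for a real model session: reads the transcript, returns the next step.
typealias Model = ([String]) -> ModelStep

/// Advertises the skills, then alternates model turns and skill calls until the
/// model answers or `maxSteps` turns have been used.
func runAgent(prompt: String, skills: [Skill], model: Model, maxSteps: Int = 5) -> String? {
    // If two skills share a name, the first one wins.
    let registry = Dictionary(skills.map { ($0.name, $0) }, uniquingKeysWith: { first, _ in first })
    // The catalog is what makes skills discoverable: the model sees names and descriptions.
    var transcript = skills.map { "skill available: \($0.name) - \($0.description)" }
    transcript.append("user: \(prompt)")

    for _ in 0..<max(0, maxSteps) {
        switch model(transcript) {
        case .finalAnswer(let text):
            return text
        case .callSkill(let name, let arguments):
            // Unknown names are reported back to the model instead of crashing.
            let result = registry[name]?.run(arguments) ?? "error: no skill named \(name)"
            transcript.append("skill \(name): \(result)")
        }
    }
    return nil  // step budget exhausted without a final answer
}

// Usage with a scripted stand-in model (hypothetical skill and arguments).
let expenseSkill = Skill(name: "addReceipt",
                         description: "Add a receipt to the current expense report") { args in
    "added \(args["merchant"] ?? "unknown") for \(args["amount"] ?? "0")"
}

let scriptedModel: Model = { transcript in
    let prefix = "skill addReceipt: "
    // After the skill has run, answer using its result; otherwise request the skill.
    if let last = transcript.last, last.hasPrefix(prefix) {
        return .finalAnswer("Done: " + String(last.dropFirst(prefix.count)))
    }
    return .callSkill(name: "addReceipt", arguments: ["merchant": "Blue Bottle", "amount": "4.50"])
}

print(runAgent(prompt: "Log my coffee receipt", skills: [expenseSkill], model: scriptedModel) ?? "no answer")
```

The step cap and the "no skill named" message are the two guardrails worth keeping in any real version. A model that keeps calling tools, or calls one that does not exist, should end in a bounded, inspectable failure rather than a hang or a crash.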

The Architecture Is More Interesting Than the Assistant

Press coverage of WWDC 2026 is anchored on the Siri AI demo. The assistant is real, the screen-awareness is real, the personal context is real, the App Intents routing is real. I do not want to dismiss that work. But the architectural story is the part that will change how you ship software, and it sits underneath the assistant in the keynote.

On-device is now a routing decision the OS owns. Before WWDC 2026, "use the on-device model" was something the developer wrote an if statement for. After WWDC 2026, the OS picks. You describe a session — input, desired output, latency budget, privacy requirement — and Foundation Models handles the on-device-vs-Private-Cloud-Compute split. For an entire class of features, "AI capability" is no longer a thing you build into a product. It is a thing you describe and the OS provides. The product work shifts to the prompt, the skills, and the data the session can read — not the inference plumbing.
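The routing decision this paragraph describes can be modeled in a few lines. Everything here is hypothetical: `SessionDescription`, `DeviceProfile`, `route`, the context limits, and the cloud round-trip figure are assumptions for illustration. They are not Apple's API or published numbers, and this sketch does not reflect any documented Apple policy. It only shows the shape of the decision. The base local model handles tasks that fit. Capable hardware steps up to the larger local model. Private Cloud Compute takes the overflow only when the session's privacy and latency constraints allow it.

```swift
// Hypothetical model of "on-device first, cloud only when needed" routing.
// None of these types are Apple APIs, and the limits below are assumptions.

enum ExecutionTier: Equatable {
    case onDeviceBase          // default local model
    case onDeviceLarge         // stronger local model, only on capable hardware
    case privateCloudCompute   // server-side overflow
}

enum PrivacyRequirement {
    case deviceOnly       // data must never leave the device
    case privateCloudOK   // attested server-side inference is acceptable
}

struct DeviceProfile {
    let canRunLargeModel: Bool   // stands in for memory and Neural Engine checks
}

struct SessionDescription {
    let promptTokens: Int
    let needsHeavyReasoning: Bool
    let latencyBudgetMs: Int
    let privacy: PrivacyRequirement
}

let baseContextLimit = 4_096        // assumed
let largeContextLimit = 16_384      // assumed
let assumedCloudRoundTripMs = 300   // assumed network overhead

/// Returns the tier that should serve the session, or nil if no tier satisfies
/// the session's privacy and latency constraints.
func route(_ session: SessionDescription, on device: DeviceProfile) -> ExecutionTier? {
    // 1. Small, simple work stays on the default local model.
    if session.promptTokens <= baseContextLimit && !session.needsHeavyReasoning {
        return .onDeviceBase
    }
    // 2. Larger or harder work stays local when the hardware can run the larger model.
    if device.canRunLargeModel && session.promptTokens <= largeContextLimit {
        return .onDeviceLarge
    }
    // 3. Otherwise overflow to Private Cloud Compute, only if privacy and latency allow it.
    let cloudAllowed = session.privacy == .privateCloudOK
        && session.latencyBudgetMs >= assumedCloudRoundTripMs
    return cloudAllowed ? .privateCloudCompute : nil
}

let summary = SessionDescription(promptTokens: 800, needsHeavyReasoning: false,
                                 latencyBudgetMs: 200, privacy: .deviceOnly)
print(route(summary, on: DeviceProfile(canRunLargeModel: false)).map { "\($0)" } ?? "no eligible tier")
```

Returning `nil` keeps the "no tier can serve this" case explicit. A caller then has to trim the input or degrade the feature instead of quietly sending device-only data to a server.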

The 3B-class on-device model is in the right size range. A ~3B-parameter on-device model that handles text, image understanding, and speech is not a frontier model. It is exactly the right size for a billion-device deployment: small enough to run on a recent Neural Engine at tens of milliseconds per generated token, large enough to be useful for extraction, summarization, structured output, classification, and tool calling. Phi-4-mini, Gemma 3 4B, Qwen3 4B, and MiniMax M3-small are in the same size class. Apple joining that league with hardware-accelerated inference meaningfully validates the "small-but-capable" agent stack. (LinkedIn analysis, May 2026)
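A back-of-envelope check shows why this size class fits on phones. Weight memory is parameters × bits per weight ÷ 8, and generation time is tokens ÷ decode rate. The quantization levels and the 30-tokens-per-second decode rate below are assumptions, not Apple's published figures for the new models. The count also ignores KV cache, activations, and runtime overhead.

```swift
/// Bytes for model weights alone at a given quantization.
/// Ignores KV cache, activations, and runtime overhead, so real usage is higher.
func weightBytes(parameters: Double, bitsPerWeight: Double) -> Double {
    parameters * bitsPerWeight / 8
}

/// Wall-clock seconds to generate `tokens` at a steady decode rate.
/// Ignores prompt processing (time to first token).
func decodeSeconds(tokens: Int, tokensPerSecond: Double) -> Double {
    Double(tokens) / tokensPerSecond
}

let parameters = 3.0e9  // the "~3B-class" size from the text
for bits in [16.0, 4.0, 2.0] {
    let gigabytes = weightBytes(parameters: parameters, bitsPerWeight: bits) / 1e9
    print("\(Int(bits))-bit weights: \(gigabytes) GB")
}

// Assumed decode rate of 30 tokens per second (about 33 ms per token).
print("60 tokens: \(decodeSeconds(tokens: 60, tokensPerSecond: 30)) s")
```

For 3 billion parameters, 16-bit weights alone come to 6 GB, 4-bit weights to 1.5 GB, and 2-bit weights to 0.75 GB. That is why models of this class are typically quantized before they ship on phones. At the assumed rate, a 60-token reply takes about 2 seconds.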

The model was built with Google's Gemini tech. The press release and the developer sessions both say the new models were created with technologies behind Google's Gemini family, then adapted for Apple hardware and Private Cloud Compute. This is the first time Apple has publicly attributed a foundational layer of its model stack to a partner. The strategic read: Apple is willing to outsource the pre-training research base to Google and own the deployment, the privacy boundary, and the developer surface. That is a different kind of vertical integration than Apple is famous for, and it tells you where Apple thinks the real value is. (Callstack)

App Intents is now AI infrastructure. This is the part the iOS community is underrating. Siri AI reaches your app through App Intents and Spotlight indexing. If your app exposes actions and content through the system, Siri has something to call. If it does not, the strongest on-device model in the world cannot help. The "show me the quote from yesterday," "add this receipt to my expense report," and "use this image in the project brief" demos are not assistant tricks. They are the agent-orchestration story for the world's largest installed app base, and the surface is App Intents. Every iOS team that shipped App Intents as a side project for Shortcuts support now has a reason to treat it as core infrastructure.

The Developer Economics Just Changed

The cost story is the one that matters for production planning. Free, private, on-device inference at OS-level quality changes the math on a class of features that has been priced out for years.

Before WWDC 2026, a "smart" feature in an iOS app usually meant one of three things: a server-side LLM call billed by token, a downloaded model you maintained yourself, or nothing. The server call added per-user cost and per-call latency. The downloaded model added app size, update friction, and a maintenance burden. The "nothing" answer is the one most apps shipped.

After WWDC 2026, a smart feature is: call LanguageModelSession, pay nothing per token, pay nothing per call, run on-device by default, route to Private Cloud Compute when needed, get image input for free, get tool-calling for free, get skills for free. The cost line item disappears for an entire feature class. The only thing you pay is the engineering time to expose the capability, which is the same line item you had before.

This collapses the unit economics of a class of features that the AI infrastructure industry has been trying to monetize. The 50-cent-per-user-per-month "AI add-on" pricing tier for summarization, extraction, structured capture, drafting, and classification becomes much harder to defend when the OS gives every user the same capability for free, on-device, and private.
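Rough arithmetic makes the pricing pressure easy to see. This is a hedged sketch in which every input is a hypothetical planning number, not any provider's price. A server-billed feature's monthly bill per user is calls per day × tokens per call × days × price per token.

```swift
/// Monthly per-user token bill for a server-billed LLM feature.
/// All inputs are hypothetical planning numbers, not quoted prices.
func monthlyCostPerUser(callsPerDay: Double,
                        tokensPerCall: Double,
                        dollarsPerMillionTokens: Double,
                        daysPerMonth: Double = 30) -> Double {
    let tokensPerMonth = callsPerDay * tokensPerCall * daysPerMonth
    return tokensPerMonth / 1_000_000 * dollarsPerMillionTokens
}

// Hypothetical usage: 20 calls a day, 1,500 tokens per call (input + output),
// at an assumed blended $0.50 per million tokens.
let serverBilled = monthlyCostPerUser(callsPerDay: 20,
                                      tokensPerCall: 1_500,
                                      dollarsPerMillionTokens: 0.50)
// The on-device path has no per-token or per-call charge.
let onDeviceBilled = 0.0
print("server: $\(serverBilled)/user/month, on-device: $\(onDeviceBilled)/user/month")
```

Under these assumed inputs, the token bill comes to about 45 cents per user per month, the same order as the add-on tier above. Once the on-device path drops that line to zero, the add-on price has to be justified by something other than inference cost.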

The pricing pressure does not stop at consumer apps. The Apple Intelligence feature set is the first credible "AI at no cost" deployment at scale in a developed market. Every B2B SaaS vendor whose value prop is "we add AI to your workflow" now has to answer the question of why a customer would pay for that capability when their employees' iPhones, iPads, and Macs already do it natively, with a model Apple is updating for free every fall.

What To Do With It

If you ship an iOS, iPadOS, or macOS app: audit every "AI feature" in your backlog that was killed by cost. The ones that look like extraction, classification, summarization, structured output, draft generation, image tagging, voice command, and intent routing are all candidates. Build the v1 on Foundation Models. Treat Private Cloud Compute as a free upgrade path for the workloads that need longer context or heavier reasoning. Ship a Skills surface for the features where a model-discoverable capability is the right pattern. Add App Intents support for the app's most common user actions — this is now the surface Siri AI and OS-level agents will call into.

If you are building production agents: treat the iOS install base as a deployment surface. An agent that runs on a billion devices, with a free on-device model, free Private Cloud Compute overflow, and a unified Swift API for the agent loop is a different kind of distribution than the web. The "agent on the user's phone that knows the user's photos, messages, calendar, mail, files, and screen" is now a shipped product pattern, not a research demo. Build the agent surface for it.

If you are building AI infrastructure: the "we host the model, you call our API" pricing model just lost a meaningful share of the consumer and prosumer market. The right pivot is upmarket — to the agentic workloads, the long-context workloads, and the B2B deployments where Apple does not have a presence. The wrong pivot is to try to undercut the OS, which is structurally impossible.

If you are on Android or cross-platform: watch this pattern. Google's response is going to be Gemini Nano on Pixel and the on-device Gemini path on Android broadly, plus a Core AI-style framework for third-party local models. The on-device LLM as OS-level primitive is the new default; building against the assumption that every model call is a server call is a 2024 pattern.

The Take

Apple's WWDC 2026 keynote will be remembered as the Siri AI launch. That is a fine headline. The story I will remember is that Apple is shipping a free, private, on-device LLM with image input, skills, tool calling, custom-model hosting, and OS-level agent routing to a billion devices, in a free software update this fall. The closed-frontier model providers are now competing with the OS for the consumer and prosumer tier of AI workloads. For Apple's own models, the on-device-vs-cloud split is no longer a developer decision; it is a routing decision the OS owns. The agent surface is App Intents. The deployment surface is the iPhone install base. The price is zero.

That is the part of WWDC 2026 that is going to reshape the production stack. The rest is a marketing story.

— Mr. Technology

Release date: June 10, 2026. Source event: WWDC 2026 keynote, June 8, 2026, Cupertino. Topics: iOS 27, iPadOS 27, macOS 27, watchOS 27, visionOS 27, tvOS 27. Announced: Foundation Models framework (on-device + Private Cloud Compute), second more capable on-device model (text, image, speech; gated to higher-end Apple silicon), Foundation Models image input, custom skills, server-call routing, Core AI framework for third-party local models, Siri AI (conversational assistant, screen awareness, personal context, App Intents routing). Build: created with technologies behind Google's Gemini family, then adapted for Apple hardware and Private Cloud Compute. Availability: developer beta day-of via Apple Developer Program, public beta next month, free software update this fall. Sources: Apple Newsroom — WWDC26 announcement, Apple Newsroom — Apple Intelligence, Apple developer session 121, Apple Foundation Models research, Callstack — On-device AI after WWDC 2026, LinkedIn — Mobile AI in 2026.