Perplexity AI unveiled a hybrid local-cloud inference orchestrator at Computex 2026 that decides in real time which AI workloads stay on your device and which route to the cloud. Nvidia's RTX Spark superchip and Intel's Core Ultra Series 3 are the silicon that makes it real.

The next AI bottleneck isn't the model

The model layer is the part of AI that gets all the attention. The infrastructure underneath it is what's actually breaking. At Computex 2026, Perplexity AI demonstrated a hybrid local-cloud inference orchestrator that decides — task by task, mid-execution — which workloads stay on your device and which route to frontier models in the cloud. It's the most concrete answer yet to the question enterprise AI buyers have been asking for the past year: where does the work actually run?

What You Need to Know: Perplexity (now valued at $20B) demonstrated a "Personal Computer" agent at Intel's Computex keynote that uses local models on Intel Core Ultra Series 3 to handle privacy-sensitive data and routes heavy reasoning to cloud frontier models — without the user choosing in advance. The launch coincides with Nvidia's RTX Spark superchip (20 Arm CPU cores, Blackwell GPU with 6,144 CUDA cores, 128GB LPDDR5X) and Intel's Xeon 6+ / Core Ultra Series 3 silicon. Enterprise token costs are now a top-3 priority; the orchestration layer matters more than any single model.

Why It Matters

The model is no longer the scarce resource. The orchestration layer, the inference system, and the silicon it runs on are the new competitive surface.
Token costs are bending enterprise AI roadmaps. The Vista Equity Partners analysis put average enterprise AI model spend at $7M in 2025, up from $2.5M in 2024 — a 2.8x jump in a single year, and that's before the next agentic workload wave hits.
Hybrid inference unlocks regulated workloads. Investment banks, healthcare systems, and defense contractors can keep confidential data on the device and still use frontier reasoning — the same workflow pattern that was blocked by compliance a year ago.
Nvidia, Intel, and Apple are all racing to own on-device AI. RTX Spark, Core Ultra Series 3, and Apple Intelligence (on-device models backed by Private Cloud Compute) are the same bet from three different angles.
Perplexity's bet is the model-agnostic orchestrator. Its annualized recurring revenue cleared $450M in March 2026, but the company is now on the hook to prove that orchestration — not a frontier model — is the durable layer.

What Actually Happened

Perplexity's Computex Demo

On May 19, 2026, Perplexity CEO Aravind Srinivas took the Intel keynote stage at Computex with Intel CEO Lip-Bu Tan to demonstrate the new hybrid inference orchestrator. The setup: an agent processing confidential deal materials, with local models on Intel Core Ultra Series 3 deciding which information stays on the device and which is sent to cloud-based models. Perplexity's claim is that no product has done this before — the routing decision is made task by task, in real time, without the user choosing in advance. Sensitive data stays on the device; frontier reasoning runs in the cloud. The system is also designed to ask for user permission before sending sensitive tasks off-device, addressing one of the central anxieties enterprise security teams have about agentic AI. (VentureBeat)
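The routing behavior described above can be sketched as a small policy function. This is a minimal illustration under stated assumptions, not Perplexity's implementation: the `Task` fields, the `route` function, and the choice to fall back to the local model when the user declines are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Location(Enum):
    LOCAL = "local"
    CLOUD = "cloud"


@dataclass
class Task:
    name: str
    contains_sensitive_data: bool   # hypothetical flag, e.g. set by a local classifier
    needs_frontier_reasoning: bool  # hypothetical flag: too heavy for the on-device model


def route(task: Task, ask_permission: Callable[[Task], bool]) -> Location:
    """Decide where one task runs. Hypothetical policy, not Perplexity's."""
    if not task.needs_frontier_reasoning:
        # The on-device model can handle it, so nothing leaves the machine.
        return Location.LOCAL
    if task.contains_sensitive_data:
        # Frontier reasoning would help, but the data is sensitive:
        # go off-device only with explicit user approval (assumed fallback: stay local).
        return Location.CLOUD if ask_permission(task) else Location.LOCAL
    # Heavy, non-sensitive work goes straight to a cloud frontier model.
    return Location.CLOUD


if __name__ == "__main__":
    def deny(task: Task) -> bool:
        return False  # simulate a user who declines every off-device request

    tasks = [
        Task("summarize deal memo", True, False),
        Task("model merger scenarios", True, True),
        Task("research public comps", False, True),
    ]
    for t in tasks:
        print(t.name, "->", route(t, deny).value)
```

Falling back to the local model on denial is a choice made for this sketch; the source says only that the system asks before sending sensitive tasks off-device.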

Nvidia's RTX Spark Superchip

Hours before the Intel keynote, Nvidia CEO Jensen Huang unveiled the RTX Spark, a new Arm-based superchip positioned as the foundation for AI-native Windows PCs. At full strength, the RTX Spark Superchip offers up to 20 Arm CPU cores, a Blackwell GPU with 6,144 CUDA cores, 128GB of LPDDR5X RAM, and up to 300 GB/s of memory bandwidth — enough power and memory for AI agents and 120-billion-parameter models with context lengths stretching to a million tokens. RTX Spark systems begin arriving in the fall of 2026. The strategic positioning is clear: Nvidia wants to be the silicon underneath every AI PC, not just the data center. (Nvidia newsroom)
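A rough back-of-the-envelope check shows why 128GB is the headline number. The sketch below counts only the memory needed to hold a 120-billion-parameter model's weights at a few common precisions. The precision levels are assumptions, since Nvidia's claim does not specify one, and the estimate ignores the KV cache that million-token contexts require.

```python
PARAMS = 120e9   # 120-billion-parameter model, per Nvidia's claim
MEMORY_GB = 128  # RTX Spark LPDDR5X capacity

# Bytes per weight at common precisions. Assumption: Nvidia did not say
# which precision its 120B figure refers to.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_footprint_gb(params: float, bytes_per_param: float) -> float:
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


def headroom_gb(params: float, bytes_per_param: float, memory_gb: float) -> float:
    """Memory left for KV cache, runtime, and OS (negative means the weights don't fit)."""
    return memory_gb - weight_footprint_gb(params, bytes_per_param)


if __name__ == "__main__":
    for precision, b in BYTES_PER_PARAM.items():
        gb = weight_footprint_gb(PARAMS, b)
        left = headroom_gb(PARAMS, b, MEMORY_GB)
        print(f"{precision}: {gb:.0f} GB of weights, {left:.0f} GB headroom")
```

Only reduced-precision weights leave meaningful headroom, and in practice that headroom must also hold the KV cache, the runtime, and the operating system.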

Intel's Counter-Move

Intel's Computex keynote unveiled Xeon 6+ data center processors with 288 efficiency cores, built on the Intel 18A process, and positioned Core Ultra Series 3 as the client silicon that makes hybrid inference viable on the PC. The message matched Nvidia's: on-device inference is the next growth market, and both companies want to be the chip it runs on. (Intel)

The Enterprise Cost Pressure

VentureBeat's research arm published an analysis this month arguing that infrastructure, inference, and compute costs are reshaping enterprise AI priorities. The Vista Equity Partners data point is the most quotable: average enterprise AI model spend hit $7M in 2025, up from $2.5M in 2024. The CrewAI enterprise survey ranks security and governance as the #1 evaluation factor for agentic platforms. IDC forecasts a 10x increase in agent usage and a 1,000x growth in inference demand by 2027. The math is brutal: spend already grew 2.8x in a single year, and a 1,000x jump in inference volume multiplies the bill unless per-token prices fall almost as fast, pushing total costs into the territory where CFO-level conversations start. (VentureBeat research, Vista Equity Partners)
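The arithmetic behind that claim is simple enough to spell out. The sketch below reproduces the 2.8x figure from the Vista Equity Partners numbers, then shows how much a bill grows if inference volume rises 1,000x while per-token prices fall by a few illustrative factors. The price-drop values are assumptions, not forecasts, and the sketch assumes spend scales linearly with volume.

```python
SPEND_2024_M = 2.5     # average enterprise AI model spend, $M (Vista Equity Partners)
SPEND_2025_M = 7.0     # $M
DEMAND_GROWTH = 1_000  # IDC forecast: inference demand growth by 2027


def yoy_growth(before: float, after: float) -> float:
    """Ratio of this year's spend to last year's."""
    return after / before


def bill_multiplier(demand_growth: float, price_drop: float) -> float:
    """Bill = volume x unit price, so the bill scales by demand growth divided
    by the factor by which the per-token price falls (linear-pricing assumption)."""
    return demand_growth / price_drop


if __name__ == "__main__":
    print(f"spend growth: {yoy_growth(SPEND_2024_M, SPEND_2025_M):.1f}x")
    for price_drop in (10, 100, 1_000):  # illustrative values, not forecasts
        m = bill_multiplier(DEMAND_GROWTH, price_drop)
        print(f"price falls {price_drop}x -> bill grows {m:.0f}x")
```

Under these assumptions, only a price drop as large as the demand increase keeps the bill flat.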

Perplexity's Product Arc

The Computex demo is the third leg of a product story Perplexity has been building all year. In February 2026, the company launched Computer, a multi-model agent that orchestrates 19 different models (Claude, Gemini, GPT, Grok, others) to complete complex tasks — entirely in the cloud. In March, Personal Computer launched as a Mac app with a hybrid local-cloud agent. Computex extends the architecture: instead of just choosing which model to use, the system now chooses which physical location should process each piece of a task. The orchestrator manages the handoff between local and cloud mid-execution. (VentureBeat)
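The two-level decision described above (first where a step runs, then which model runs it) can be sketched as a tiny executor that counts local-to-cloud handoffs within one task. Everything here is hypothetical: the step types, the model pool labels, and the rule that sensitive or extraction steps stay local are illustrative assumptions, not Perplexity's routing logic.

```python
from dataclasses import dataclass

# Hypothetical model pools per location; labels are placeholders, not products.
MODEL_POOL = {
    "local": ["local-small"],
    "cloud": ["cloud-reasoning", "cloud-coding"],
}


@dataclass
class Step:
    description: str
    sensitive: bool
    kind: str  # hypothetical step types: "extract", "reason", or "code"


def choose_location(step: Step) -> str:
    """Level 1: pick the physical location. Sensitive or extraction steps stay on-device."""
    if step.sensitive or step.kind == "extract":
        return "local"
    return "cloud"


def choose_model(location: str, step: Step) -> str:
    """Level 2: pick a model from the pool available at that location."""
    pool = MODEL_POOL[location]
    if location == "cloud" and step.kind == "code":
        return "cloud-coding"
    return pool[0]


def execute(steps: list[Step]) -> tuple[list[tuple[str, str, str]], int]:
    """Route every step in order and count handoffs between device and cloud."""
    trace: list[tuple[str, str, str]] = []
    handoffs = 0
    previous = None
    for step in steps:
        location = choose_location(step)
        if previous is not None and location != previous:
            handoffs += 1  # execution moves between device and cloud mid-task
        trace.append((step.description, location, choose_model(location, step)))
        previous = location
    return trace, handoffs


if __name__ == "__main__":
    plan = [
        Step("pull figures from deal PDF", True, "extract"),
        Step("research public comparables", False, "reason"),
        Step("write spreadsheet macro", False, "code"),
        Step("insert client names into memo", True, "extract"),
    ]
    trace, handoffs = execute(plan)
    for description, location, model in trace:
        print(f"{description}: {location} ({model})")
    print("handoffs:", handoffs)
```

In this example the task starts on the device, hands off to the cloud for two non-sensitive steps, and returns to the device for the final sensitive step: two handoffs inside a single task.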

The Take

The "next AI bottleneck isn't the model" framing is technically true but misses the more important point: the next bottleneck isn't a single thing. It's a stack of silicon, inference runtime, orchestration layer, network latency, data governance, and the user interface for routing decisions. Perplexity's bet is that the orchestration layer is the most defensible piece of that stack. Its competitors are betting on other layers: Nvidia is building the silicon layer, OpenAI the model-plus-runtime layer, Apple the privacy-preserving on-device layer, and Google the cloud-plus-edge split. All four bets can coexist. The question is which one becomes the default interface for non-technical enterprise buyers in 2027. The hybrid inference demo is Perplexity's answer. Whether enterprise security teams accept the routing logic, and whether the orchestrator's heuristics hold up outside a controlled stage demo, are the open questions that will decide it.

Quick Summary

Perplexity showed a hybrid local-cloud inference orchestrator at Computex 2026 that makes the local-vs-cloud routing decision per task, without the user choosing in advance, running on Intel Core Ultra Series 3 silicon. Nvidia's RTX Spark superchip and Intel's Core Ultra Series 3 position both companies to compete for the AI PC socket, while Xeon 6+ anchors Intel's data center side. Enterprise AI spend is up 2.8x year-over-year, and IDC forecasts a 1,000x inference demand increase by 2027. The orchestration layer is where the next 18 months of competition will land.
