
Let me give you the tl;dr first, because this is going to ruffle feathers: most AI agent frameworks shipping in 2026 are pretending to be production infrastructure, and they are not. They give you a loop that calls an LLM, parses JSON, calls a tool, and decides what to do next. What they do not give you, and what your agent will need the moment it touches a real user, is durable execution. Inngest does. It's open source, and the v3 TypeScript SDK released in January 2026, together with the new step.ai.infer() and step.ai.wrap() primitives, makes it the most direct answer I've found.
Here's the scenario your demo never shows. Your agent accepts a request, calls an LLM, decides to call a tool that hits Stripe, Stripe returns a 500, your LLM-backed retry logic re-prompts the model, the model hallucinates a different tool call, you double-charge the customer, and your framework logs "agent finished" in the trace. Congratulations — you shipped an agent that fails in ways that look like success.
The deeper problem is duration. An agent that runs for 40 minutes cannot live inside a single Lambda invocation, a single Next.js route, or a single process. If your process restarts — and it will, because OpenAI will 429 you, your database will blip, your deploy will roll — you lose the entire conversation. Every "agent framework" that does not address this is selling you a toy.
Temporal addresses this. It's the canonical answer. It's also 18 months of learning a workflow programming model with strict determinism rules, a separate cluster to operate, and a Go-style developer experience. Temporal is correct but heavy; Inngest is what I'd have built to keep the correctness and drop the weight. **You write normal async functions. The SDK checkpoints every step.run call. If your process dies, Inngest replays the function and skips every step that already succeeded. That's it.** No DSL, and no separate cluster if you don't want one.
Here's a real Inngest function with the v3 SDK. Note the lack of any workflow engine boilerplate — this is just a TypeScript async function:
```ts
import { Inngest } from "inngest";
import { openai } from "@inngest/ai";

export const inngest = new Inngest({ id: "support-agent" });

// Hypothetical helpers standing in for your own code; not part of Inngest.
declare function summarize(body: string): Promise<string>;
declare function escalate(ticketId: string): Promise<void>;
declare const stripe: { refund(chargeId: string): Promise<{ id: string }> };
declare const mailer: { send(to: string, body: string): Promise<{ id: string }> };

const agent = inngest.createFunction(
  {
    id: "resolve-ticket",
    // At most 5 concurrently executing steps per tenant.
    concurrency: { key: "event.data.tenantId", limit: 5 },
  },
  { event: "support/ticket.created" },
  async ({ event, step }) => {
    // Each step.run is memoized by name. If this dies at step 3,
    // replay starts at step 3: steps 1 and 2 are not re-executed.
    const summary = await step.run("summarize-ticket", async () => {
      return summarize(event.data.body);
    });

    // step.ai.infer() offloads the LLM call to Inngest's infrastructure,
    // so you don't pay for Lambda compute while waiting on OpenAI.
    const plan = await step.ai.infer("plan-actions", {
      model: openai({ model: "gpt-4o" }),
      body: { messages: [{ role: "user", content: summary }] },
    });

    // Pause until a human approves; resolves to null after 72h with no match.
    // `event` is the triggering event, `async` is the incoming approval event.
    const approval = await step.waitForEvent("wait-for-approval", {
      event: "support/ticket.approved",
      timeout: "72h",
      if: "event.data.ticketId == async.data.ticketId",
    });

    if (!approval) {
      await step.run("escalate", () => escalate(event.data.ticketId));
      return { escalated: true };
    }

    // Run independent follow-ups in parallel, each as a retriable step.
    const [refund, email] = await Promise.all([
      step.run("issue-refund", () => stripe.refund(event.data.chargeId)),
      step.run("notify-customer", () => mailer.send(event.data.customer, summary)),
    ]);

    return { refund, email, plan };
  }
);
```
Read that again. step.waitForEvent blocks the workflow for up to 72 hours. Promise.all runs two independent steps in parallel. Every block is independently retriable. **If your server dies after issue-refund and before notify-customer, replay resumes at notify-customer.** The customer does not get refunded twice. The customer is not un-notified. This is what production means.
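Here is a library-free toy of that replay behavior, not Inngest's implementation: a journal stores each completed step's result by name, and every invocation re-runs the handler from the top, returning saved results instead of re-executing finished steps. `makeStep`, `invoke` and the simulated crash are hypothetical, and the toy runs the two steps one after the other rather than in parallel, since the memoization argument is the same.

```typescript
// Toy model of step memoization. This is NOT the Inngest SDK; it only shows
// why a completed step is not re-executed when the handler is replayed.
type Journal = Record<string, unknown>;

const CRASH = "simulated crash before notify-customer finished";

// Counters so we can observe how often each side effect actually runs.
const calls = { refund: 0, notify: 0 };
let crashOnce = true;

function makeStep(journal: Journal) {
  return {
    run<T>(name: string, fn: () => T): T {
      // A completed step returns its saved result instead of running again.
      if (Object.prototype.hasOwnProperty.call(journal, name)) {
        return journal[name] as T;
      }
      const result = fn();
      journal[name] = result; // checkpoint only after the step succeeds
      return result;
    },
  };
}

// The handler always re-runs from the top; memoization does the skipping.
function handler(step: ReturnType<typeof makeStep>) {
  const refund = step.run("issue-refund", () => {
    calls.refund += 1;
    return "re_123"; // hypothetical refund id
  });
  const email = step.run("notify-customer", () => {
    calls.notify += 1;
    if (crashOnce) {
      crashOnce = false;
      throw new Error(CRASH);
    }
    return "sent";
  });
  return { refund, email };
}

// Re-invoke until the handler completes; the journal survives the "crash".
function invoke(maxAttempts: number) {
  const journal: Journal = {};
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return handler(makeStep(journal));
    } catch (err) {
      if ((err as Error).message !== CRASH) throw err;
    }
  }
  throw new Error("gave up after " + maxAttempts + " attempts");
}

const result = invoke(3);
console.log(result, calls);
```

Running it, calls.refund ends at 1 and calls.notify at 2. Note that the interrupted step itself did run twice, so whatever happens inside a single step should still be safe to repeat.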
## What step.ai Actually Buys You

The 2026 release added two step.ai primitives worth understanding. step.ai.infer() offloads the LLM HTTP call to Inngest's infrastructure: your function suspends, Inngest makes the request, and your function resumes with the response. On Lambda this is a real money saver, because you're not paying for function-seconds while waiting on a 12-second GPT-4o response; the billable compute for that wait drops to near zero. The trade-off is that the request, provider API key included, now passes through Inngest. If that's a problem for you, step.ai.wrap() keeps the call inside your own function.
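A back-of-the-envelope for the Lambda case, where the function sits idle for the whole 12-second wait. Lambda bills duration in GB-seconds, so idle time costs the same as work. The memory size, run count and per-GB-second price below are illustrative assumptions; check current AWS pricing for your region and architecture.

```typescript
// Billed Lambda compute spent blocked on an LLM response.
// All inputs are illustrative assumptions, not measured numbers.
function idleGbSeconds(memoryGb: number, waitSeconds: number, runs: number): number {
  return memoryGb * waitSeconds * runs;
}

// Assumed price per GB-second; verify against current AWS pricing.
const PRICE_PER_GB_SECOND = 0.0000166667;

// Assumed: 1 GB function, 12 s wait per call, 100,000 runs.
const gbSeconds = idleGbSeconds(1, 12, 100000);
const dollars = gbSeconds * PRICE_PER_GB_SECOND;
console.log(`${gbSeconds} GB-s of waiting, about $${dollars.toFixed(2)}`);
```

Under these assumptions the waiting alone costs about $20; scale memory and run count to your own numbers. With step.ai.infer() the function is suspended during that wait, so this line item trends toward zero, which is why the saving only shows up at volume on serverless.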
step.ai.wrap() wraps an existing AI SDK call (OpenAI, Anthropic, Vercel AI SDK) and adds observability — prompts, tokens, latency, cost — into the same trace as your workflow steps. The AgentKit SDK on top of this gives you a ReAct loop where every tool call is itself a retriable step.
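To make "same trace" concrete, here is a library-free toy, not how step.ai.wrap() is built: a hypothetical `traced` helper runs either an ordinary step or a model call and appends its name, input, output and latency to one shared array. `fakeModel` is a made-up stand-in for a real OpenAI or Anthropic client; with the real SDK, the equivalent is handing your client call to step.ai.wrap().

```typescript
// Toy sketch of wrap-style observability: NOT the Inngest implementation.
// Every workflow step and model call appends to one shared trace.
interface TraceEntry {
  name: string;
  kind: "step" | "ai";
  input: unknown;
  output: unknown;
  ms: number;
}

const trace: TraceEntry[] = [];

// Run fn(input) and record its name, input, output and latency.
function traced<I, O>(name: string, kind: TraceEntry["kind"], fn: (input: I) => O, input: I): O {
  const start = Date.now();
  const output = fn(input);
  trace.push({ name, kind, input, output, ms: Date.now() - start });
  return output;
}

// Hypothetical model stand-in: returns a canned plan and a fake token count (word count).
function fakeModel(prompt: string): { text: string; tokens: number } {
  return { text: `plan for: ${prompt}`, tokens: prompt.split(" ").length };
}

const summary = traced("summarize-ticket", "step", (body: string) => body.trim(), "  refund request  ");
const plan = traced("plan-actions", "ai", fakeModel, summary);

// One timeline: the ordinary step and the model call, in execution order.
for (const entry of trace) {
  console.log(entry.kind, entry.name, JSON.stringify(entry.output), `${entry.ms}ms`);
}
```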
The honest part: the inference offload only pays off on serverless deployments. On a long-running container it barely matters, because you weren't paying per second of idle time anyway. The real win is trace unification: you finally get to see the prompt, the tool call, the retry, the human approval, and the side effect on a single timeline. No more "the LLM did the wrong thing" mystery theater.
Things that break in Inngest, and that you should know about upfront. First, code outside step.run re-executes every time the handler is invoked, and Inngest typically re-invokes it from the top after each step, not only on retries. If you read a counter or generate a UUID at the top level, a replay will see a different value than the original run did. The SDK warns about this but doesn't prevent it. Wrap side-effecting and non-deterministic reads in step.run.
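A toy illustration of the pitfall, without the SDK: the handler below is invoked twice, as a replay would, and a plain object stands in for Inngest's saved step results. `journal`, `run`, `handler` and `freshId` (a stand-in for randomUUID()) are all hypothetical.

```typescript
// Stand-in for randomUUID(): returns a different value on every call.
let seq = 0;
const freshId = (): string => `id-${++seq}`;

// Toy journal of saved step results (hypothetical; not the Inngest SDK).
const journal: Record<string, unknown> = {};
function run<T>(name: string, fn: () => T): T {
  if (Object.prototype.hasOwnProperty.call(journal, name)) {
    return journal[name] as T;
  }
  const value = fn();
  journal[name] = value;
  return value;
}

// The handler body re-executes on every invocation.
function handler() {
  const unsafeId = freshId();                     // top level: changes on replay
  const safeId = run("make-id", () => freshId()); // inside a step: stable
  return { unsafeId, safeId };
}

const original = handler();
const replay = handler();
console.log(original, replay);
```

In a real function the fix has the same shape: `const id = await step.run("make-id", () => crypto.randomUUID());`.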
Second, the function-versioning model is new in v3; v2 doesn't have it. If you're on v2 and a function changes shape while runs are in flight, those runs can get stuck in an unmigratable state. Upgrade before you have long-running workflows in flight.
Third, concurrency keys are powerful but a footgun. concurrency: { key: "event.data.tenantId", limit: 5 } will silently queue 10,000 requests from one noisy tenant behind five concurrency slots. Pair it with a start timeout so runs stuck in that backlog get cancelled instead of waiting for hours.
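A rough way to see how bad that backlog gets: assume each run holds one of the tenant's five slots for a fixed time and slots refill first-in, first-out. Both are simplifications (Inngest limits concurrently executing steps rather than whole runs, and the 30-second run time is an assumption), so treat the result as an order of magnitude.

```typescript
// Rough start delay for the last run queued behind a per-key concurrency limit.
// Simplifying assumptions: each run holds one slot for `runSeconds`,
// and slots are handed out strictly first-in, first-out.
function lastStartDelaySeconds(queued: number, limit: number, runSeconds: number): number {
  const lastIndex = queued - 1;               // zero-based position of the last run
  const wave = Math.floor(lastIndex / limit); // full waves that must finish first
  return wave * runSeconds;
}

// One noisy tenant: 10,000 queued runs, limit 5, assumed 30 s per run.
const delay = lastStartDelaySeconds(10000, 5, 30);
console.log(`${delay} s (about ${(delay / 3600).toFixed(1)} h) before the last run starts`);
```

That is 59,970 seconds, roughly 16.7 hours, with nothing in the logs to tell you. Recent v3 SDK releases accept a timeouts: { start: "10m" } style option on createFunction that cancels runs which haven't started in time; confirm your SDK version supports it before relying on it.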
If your agent runs in under 30 seconds and only calls a single model, Inngest is overhead — just use a queue and a worker. If you need true cross-region active-active, Temporal is still the more proven primitive. If your team is allergic to TypeScript, the Python SDK is fine but the Go/Rust SDKs Inngest promised in the v3 announcement are still "in development" as of mid-2026 — don't bet a migration on them.
AI agents are stateful, long-running, parallel, retriable, often human-gated workloads pretending to be stateless LLM wrappers. The frameworks — LangGraph, Mastra, Pydantic AI, Smolagents — give you the agent loop. They do not give you the workflow engine underneath it. Inngest is the open-source engine underneath, with an AI-shaped API added in 2026, and it runs locally via the Inngest Dev Server with production parity so you can develop the whole thing offline.
Stop shipping agents that lose state on deploy. Add a workflow engine. Inngest is the cheapest one I know to adopt.
Repo: github.com/inngest/inngest — Apache 2.0, self-hostable, TS/Python/Go SDKs, dev server, AgentKit for ReAct loops, 1.0 GA since September 2024, v3 TypeScript SDK January 2026.