2026-07-01

TensorZero Was the Best Open LLMOps Stack I'd Used, and the Repo Got Archived Overnight

11.5K stars, Rust, ~1% of global LLM API traffic, $7.3M seed. On June 12, 2026 the founders archived the GitHub repo without warning and returned roughly half the capital to investors. The product was not the problem. The category got eaten while they were still building.

Hey guys, Mr. Technology here.

TensorZero was the best open-source LLMOps platform I had used in production. One Rust binary that replaced the Langfuse + Helicone + LiteLLM + Promptfoo + manual CSV-joining stack most teams duct-tape together for eighteen months. On June 12, 2026 the GitHub repo was archived. Last commit nine days earlier. Founders returned roughly half the $7.3M seed to investors. No pivot, no fire sale.

Less than ten months after the round.

What it actually is

TensorZero unified five things nobody else was unifying properly: gateway, observability, evaluation, optimization, and experimentation. Not a UI on top of five tools — one Rust binary, one ClickHouse-or-Postgres backend, one schema for inference traces and feedback.

The killer feature was that you pointed your existing OpenAI SDK at it with a base_url change. You got retries, fallbacks, load balancing, semantic caching, cost tracking, rate limits, structured outputs, batch inference, and OpenTelemetry traces exported into Grafana / Datadog / Honeycomb. Zero rewrite.
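To make "retries and fallbacks" concrete, here is a minimal sketch of the loop a gateway runs on your behalf so your application code does not have to. It is not TensorZero's implementation; `call_with_fallback`, `flaky_primary`, and `stable_backup` are hypothetical names, and the providers are plain Python callables standing in for real API clients.

```python
import time
from typing import Callable, Optional, Sequence


def call_with_fallback(
    providers: Sequence[Callable[[str], str]],
    prompt: str,
    retries_per_provider: int = 2,
    base_delay_s: float = 0.0,
) -> str:
    """Try providers in order, retrying each with exponential backoff."""
    last_error: Optional[Exception] = None
    for provider in providers:
        for attempt in range(retries_per_provider):
            try:
                return provider(prompt)
            except Exception as exc:  # assumption: every error is treated as retryable
                last_error = exc
                time.sleep(base_delay_s * (2 ** attempt))
    raise RuntimeError("all providers failed") from last_error


# Hypothetical stand-ins for real provider clients.
def flaky_primary(prompt: str) -> str:
    raise TimeoutError("primary timed out")


def stable_backup(prompt: str) -> str:
    return f"backup answered: {prompt}"


print(call_with_fallback([flaky_primary, stable_backup], "Fun fact?"))
```

A production gateway would only retry transient errors (timeouts, 429s, 5xx) and would add jitter to the backoff. The point is that this logic lives in one place instead of in every service that calls a model.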

The optimization story was the technical novelty. GEPA (genetic-pareto prompt evolution), supervised fine-tuning, and RLHF were first-class features reading the same inference-and-feedback database as observability. The flywheel shipped: model calls → feedback → dataset → optimize → redeploy. Autopilot ran the loop automatically and produced real benchmark wins — +54.7% on terminal-bench, +217% on MedAgentBench, +612.7% on CoNLL++ NER — without any human prompt-engineering.
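The "pareto" half of genetic-pareto is easy to show in isolation. This is a toy sketch of Pareto-front selection over prompt candidates, assuming each candidate has one score per task. It is not the GEPA implementation, it leaves out the mutation step entirely, and the candidate names and scores are made up for illustration.

```python
from typing import Dict, List, Tuple

Scores = Tuple[float, ...]


def dominates(a: Scores, b: Scores) -> bool:
    """a dominates b if it is at least as good on every task and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates: Dict[str, Scores]) -> List[str]:
    """Return the names of candidates that no other candidate dominates."""
    front = []
    for name, scores in candidates.items():
        dominated = any(
            dominates(other, scores)
            for other_name, other in candidates.items()
            if other_name != name
        )
        if not dominated:
            front.append(name)
    return front


# Hypothetical per-task scores for three prompt variants (made-up numbers).
candidates = {
    "prompt_a": (0.80, 0.60),
    "prompt_b": (0.70, 0.75),
    "prompt_c": (0.65, 0.55),
}
print(pareto_front(candidates))  # prints ['prompt_a', 'prompt_b']
```

Keeping the whole front rather than the single best average is what lets an evolutionary search hold on to a prompt that wins on one task while losing on another.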

What's actually good

  • Real Rust performance. Sub-millisecond P99 latency overhead at 10,000 QPS.
  • One Docker container replaces five SaaS bills. At 50M inference calls a day, that math is the entire procurement story.
  • The unified data model was right. Inference traces and feedback in the same schema is the architecture we should have had two years ago.
  • Native OpenTelemetry + Prometheus export. Plays nicely with your existing observability stack.
  • 19 model providers out of the box — Anthropic, OpenAI, Gemini, Bedrock, vLLM, TGI, OpenRouter, xAI. No vendor lock-in.
  • Used by Fortune 10 and frontier AI startups at the same time. Not a hobby project.
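The unified data model point is worth a quick illustration. Here is a minimal sketch, assuming inference traces and feedback share an ID, of how a fine-tuning dataset falls out of a single join. The `Inference` and `Feedback` records, their field names, and `build_dataset` are all hypothetical and are not TensorZero's actual schema.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Inference:
    inference_id: str
    prompt: str
    output: str


@dataclass(frozen=True)
class Feedback:
    inference_id: str
    metric: str
    value: float


def build_dataset(
    inferences: List[Inference],
    feedback: List[Feedback],
    metric: str,
    threshold: float,
) -> List[Dict[str, str]]:
    """Keep prompt/completion pairs whose best score on `metric` meets the threshold."""
    best: Dict[str, float] = {}
    for fb in feedback:
        if fb.metric == metric:
            best[fb.inference_id] = max(fb.value, best.get(fb.inference_id, fb.value))
    return [
        {"prompt": inf.prompt, "completion": inf.output}
        for inf in inferences
        if inf.inference_id in best and best[inf.inference_id] >= threshold
    ]


# Made-up records for illustration.
inferences = [
    Inference("i1", "Summarize A", "Summary A"),
    Inference("i2", "Summarize B", "Summary B"),
    Inference("i3", "Summarize C", "Summary C"),  # never received feedback
]
feedback = [
    Feedback("i1", "thumbs_up", 1.0),
    Feedback("i2", "thumbs_up", 0.0),
]
print(build_dataset(inferences, feedback, metric="thumbs_up", threshold=1.0))
```

When traces live in one tool and feedback in another, this join is the manual CSV step. With one schema it is a single query.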

What's actually wrong

The onboarding was the symptom, not the disease. You had to think in "TensorZero functions" and tag every inference with metadata schemas, which was the right call long-term but a tax when you wanted a hackathon endpoint working in 20 minutes.

The build-vs-features ratio ran hot in some corners (agent abstractions, multimodal pipelines, the Inngest integration) before the second pivot broke under its own weight. GEPA was magical; half the surrounding API was rough.

And then there is the thing nobody wants to talk about: on June 12, 2026 the GitHub repo was archived, with the last commit nine days earlier, and the founders returned roughly half the seed capital to FirstMark, Bessemer, Bedrock, DRW, and Coalition. No acquisition, no fire sale, no "we are moving to a new repo under a new org." Just an archive button.

The product was not the problem. The market was. In January 2026 ClickHouse acquired Langfuse, the closest direct competitor, alongside a $400M Series D at a $15B valuation. AWS, Azure, and GCP started shipping native LLM gateway and observability features as a rounding error on roughly $600B of 2026 AI infrastructure capex. When you are a seed-stage open-source gateway and the hyperscalers ship your category as a checkbox feature, your product-market-fit window closes before your commercial window opens. TensorZero ran out of window.

How to try it in 5 minutes (forks only)

The repo is read-only but still there. Fork it, clone it, run the gateway:

```bash
git clone https://github.com/YOUR-FORK/tensorzero.git
cd tensorzero && docker compose up -d
```

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:3000/openai/v1",
    api_key="not-used",
)
resp = client.chat.completions.create(
    model="tensorzero::model_name::anthropic::claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Fun fact?"}],
)
print(resp.choices[0].message.content)
```

Expect zero upstream fixes after June 12. Anything that breaks against a new Claude or GPT model is yours to patch. The GEPA optimizer is the most valuable thing to lift out of the codebase; for most teams, the gateway itself is being replaced by Helicone, Langfuse, and native hyperscaler primitives.

The practical take

Skip TensorZero as your forward bet. Run it (forked) only if you already have it in production and your team is willing to own the maintenance tax for 18 months.

For new builds, the architectural pattern TensorZero pioneered is the right one — and you can get it without the project. Helicone + Langfuse (now ClickHouse) + LiteLLM covers 80% of the surface in OSS, and Bedrock / Vertex AI / Azure AI Foundry ship the gateway piece as a native primitive that is genuinely good enough for most production workloads. The hyperscaler gateway already does retries, fallbacks, observability, and rate limits — and the bill is already on your AWS / Azure / GCP invoice.

The structural lesson is bigger than this one project. VC-funded open-source AI infrastructure has a half-life under two years. The cycle: project raises a seed → builds something genuinely good → either gets absorbed by a hyperscaler or a data platform, or dies. The "open core with a hosted version" model does not work when the host is a hyperscaler shipping the same primitives for free. Build on something with a proven commercial wedge, or budget to maintain the fork yourself, or just pick the boring native option. TensorZero is the case study. The case study is closed.

Mr. Technology


  • TensorZero archived repo: github.com/tensorzero/tensorzero (Apache 2.0, 11.5K stars, archived by owner on June 12, 2026, last commit June 3, 2026).
  • Founders: Viraj Mehta (CTO, CMU PhD in RL) and Gabriel Bianconi.
  • Seed round: $7.3M, August 19, 2025, led by FirstMark with Bessemer Venture Partners, Bedrock, DRW, Coalition, and several angels (PR Newswire; VentureBeat).
  • Autopilot benchmark results (TensorZero blog, March 23, 2026): terminal-bench +54.7%, tau-bench airline +47.5%, tau-bench retail +3.4%, CoNLL++ NER +612.7%, MedAgentBench +217.0%, LawBench +15.4%, ReplicationBench +43.5%, LLM Gym 21 Questions +41.9%.
  • ClickHouse / Langfuse Series D: $400M at $15B valuation, January 2026.
  • Founder post-shutdown commentary: Hacker News thread id 48516504.
  • Shutdown analysis: byteiota, June 14, 2026.
  • Performance claim (sub-millisecond P99 at 10k QPS): verified by Maxim AI gateway benchmark, December 2025.
  • The ~1% of global LLM API traffic figure is the company's own marketing claim and should be read as a directional indicator, not a verified number.
