data · 2026-06-11


Anthropic automated 95% of its own business analytics with Claude, then watched accuracy decay to 65% within a month without maintenance. Supabase closed a $500M Series F at a $10.5B valuation. VentureBeat argues the only real guardrail is a curated business ontology.

Your agents are returning different answers from the same data

Hey guys, Mr. Technology here — let me break this one down.

What You Need to Know: Anthropic just published how it automated 95% of its own business analytics queries with Claude at roughly 95% accuracy, then watched that accuracy collapse to 65% within a month once the system went unmaintained. Meanwhile, Supabase closed a $500M Series F at a $10.5B valuation, and VentureBeat's Data Infrastructure Weekly argues that the only real guardrail against hallucinating agents is a curated business ontology.

Why It Matters

  • "Same data, different answer" is the silent failure mode of agent deployments. LLMs don't crash; they confidently disagree with themselves, and the only reliable way to detect it is to log the SQL and row counts behind every answer, then diff answers to the same question.
  • Anthropic's 95% number is the new benchmark — and the 65% decay is the asterisk. Anyone selling "self-service analytics agents" in 2026 should be quoting the decay number as loudly as the accuracy number.
  • Supabase reaching a $10.5B valuation on the strength of agent-driven database usage is the clearest signal yet that the bottleneck is shifting from "the model" to "where the data lives." The agent database wars are officially a 2026 story.
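The first bullet's advice (log the SQL and the row counts so answers can be diffed) can be sketched as an append-only JSON-lines log. This is a minimal sketch, not any vendor's logging format: `AgentAnswer`, `log_agent_answer`, and the field names are hypothetical, and the question key is a simple hash of the lowercased, whitespace-collapsed question so trivially reworded repeats land under the same key.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass
class AgentAnswer:
    # Hypothetical record of one agent response; field names are illustrative.
    question: str
    sql: str
    row_count: int
    answer: str


def normalize_question(question: str) -> str:
    # Collapse case and whitespace so trivially reworded repeats share a key.
    return " ".join(question.lower().split())


def log_agent_answer(record: AgentAnswer, path: str) -> dict:
    """Append one answer as a JSON line so later answers can be diffed or grepped."""
    entry = asdict(record)
    entry["question_key"] = hashlib.sha256(
        normalize_question(record.question).encode("utf-8")
    ).hexdigest()[:16]
    entry["logged_at"] = datetime.now(timezone.utc).isoformat()
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
    return entry


if __name__ == "__main__":
    import os
    import tempfile

    log_path = os.path.join(tempfile.mkdtemp(), "agent_answers.jsonl")
    rec = AgentAnswer(
        question="What was Q1 revenue?",
        sql="SELECT SUM(amount) FROM orders WHERE quarter = 'Q1'",
        row_count=1,
        answer="1,200,000",
    )
    print(log_agent_answer(rec, log_path)["question_key"])
```

With every answer keyed this way, `grep` on a `question_key` returns every SQL statement the agent ever ran for that question, which is exactly the trail you need when two answers disagree.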

Anthropic automated 95% of its own analytics — and the accuracy fell off a cliff

Anthropic published a detailed post-mortem on how it uses Claude internally to automate business analytics. The headline: 95% of business analytics queries at Anthropic are now automated via Claude agents, with ~95% accuracy in aggregate (Anthropic, 6/3/2026). The post-mortem is honest about the failure mode: when the data team didn't actively maintain the system, accuracy dropped to 65% within a month. Re-investing in maintenance brought it back to ~90%.

The bottleneck was never SQL generation. It was context — governed semantic layers, schema documentation, business definitions, and an eval loop that re-graded answers against a held-out set of "what a human analyst would have said." A LinkedIn post by the data team distilled it: "The bottleneck was never SQL. It was context: governed semantic layers" (LinkedIn — bygravity).
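The eval loop described above (re-grading answers against a held-out set of what a human analyst would have said) can be sketched in a few lines. This is not Anthropic's implementation: it assumes a hypothetical `golden_set` mapping each question to the analyst's numeric answer, a hypothetical `ask_agent` callable, a 1% relative tolerance for "correct," and an illustrative 90% alert threshold.

```python
from typing import Callable, Dict, Tuple


def grade(expected: float, actual: float, rel_tol: float = 0.01) -> bool:
    """Count an answer as correct if it is within rel_tol of the analyst's number."""
    if expected == 0:
        return abs(actual) <= rel_tol
    return abs(actual - expected) / abs(expected) <= rel_tol


def run_eval(
    golden_set: Dict[str, float],
    ask_agent: Callable[[str], float],
    alert_below: float = 0.90,
) -> Tuple[float, bool]:
    """Re-grade the agent on a held-out golden set; return (accuracy, needs_attention)."""
    if not golden_set:
        raise ValueError("golden_set must not be empty")
    correct = sum(grade(expected, ask_agent(q)) for q, expected in golden_set.items())
    accuracy = correct / len(golden_set)
    return accuracy, accuracy < alert_below


if __name__ == "__main__":
    # Hypothetical analyst answers for three questions.
    golden = {"q1_revenue": 1_200_000.0, "active_users": 5_000.0, "churn_rate": 0.04}
    # A stale agent: revenue is still right, the other two metrics have drifted.
    stale_answers = {"q1_revenue": 1_200_500.0, "active_users": 6_100.0, "churn_rate": 0.07}
    acc, alert = run_eval(golden, lambda q: stale_answers[q])
    print(f"accuracy={acc:.2f} alert={alert}")  # prints "accuracy=0.33 alert=True"
```

Running this on a schedule, rather than once at launch, is what turns a one-time accuracy figure into a decay signal.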

This is the playbook every data team in 2026 needs to copy, or be undercut by a team that does. The 65% decay number is the one to put in your next planning doc: "If we ship self-service agents without an eval loop, we'll be wrong about a third of the time within a month."

Supabase closes $500M Series F at $10.5B

Supabase announced a $500 million Series F at a $10.5 billion valuation on June 4, 2026, led by GIC with all existing investors participating (Supabase PR Newswire, 6/4/2026; TechCrunch, 6/5/2026). The company describes itself as "the open source Postgres development platform and leader in agentic infrastructure," and the pitch is that vibe-coded apps need a database backend that can survive being mutated by agents 24/7.

The claims Supabase is making: AI app builders now account for the majority of the platform's user base, and Claude Code has been the largest contributor to the Supabase codebase since the start of 2026. The funding is meant to "accelerate lead in agentic infrastructure," which is industry-speak for "we are the database the agents are talking to."

This is the clearest signal yet that the bottleneck is shifting. In 2024, it was "we need a better model." In 2025, it was "we need a smarter agent framework." In 2026, it's "we need a database that doesn't melt when 1,000 agents are running read-replica queries in parallel."

"Ontology is the real guardrail" — VentureBeat's argument for the semantic layer

VentureBeat's Data Infrastructure Weekly published a piece arguing that the only sustainable way to stop AI agents from giving different answers to the same question is to ground them in a curated business ontology — a governed semantic layer that defines what "customer," "revenue," and "active user" actually mean in your company (VentureBeat). The argument: an LLM can write any SQL it wants, but if the metric definitions in the system prompt are pinned to a single source of truth, the "same question, different answer" problem collapses.
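Pinning metric definitions to a single source of truth can be sketched as a small catalog rendered into the system prompt in a fixed order. This is a minimal sketch under stated assumptions: the `Metric` type, the `METRICS` entries, and their definitions and SQL are hypothetical examples, not VentureBeat's or any company's ontology. The version hash is an added assumption so each logged answer can be tied to the definitions in force when it was produced.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Metric:
    # Hypothetical governed metric definition; names and SQL are illustrative only.
    name: str
    definition: str
    sql: str


METRICS: Dict[str, Metric] = {
    "active_user": Metric(
        name="active_user",
        definition="A user with at least one session in the trailing 28 days.",
        sql="SELECT COUNT(DISTINCT user_id) FROM sessions "
        "WHERE started_at >= CURRENT_DATE - INTERVAL '28 days'",
    ),
    "revenue": Metric(
        name="revenue",
        definition="Sum of paid invoice amounts, net of refunds, in USD.",
        sql="SELECT SUM(amount_usd - refunded_usd) FROM invoices WHERE status = 'paid'",
    ),
}


def render_metric_prompt(metrics: Dict[str, Metric]) -> str:
    """Render the catalog in sorted order so every agent sees byte-identical text."""
    lines = ["Use ONLY these metric definitions. Do not redefine them."]
    for key in sorted(metrics):
        m = metrics[key]
        lines.append(f"- {m.name}: {m.definition}\n  Canonical SQL: {m.sql}")
    return "\n".join(lines)


def catalog_version(prompt: str) -> str:
    """Short content hash; log it next to each answer to know which definitions applied."""
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()[:12]


if __name__ == "__main__":
    prompt = render_metric_prompt(METRICS)
    print(prompt)
    print("catalog version:", catalog_version(prompt))
```

Sorting the keys matters: two agents built from the same catalog get the same prompt text regardless of dictionary insertion order, so any disagreement between them points at the SQL they generated, not at the definitions they were given.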

This is the same conclusion Anthropic reached internally: the model is not the bottleneck; the semantic layer is. Anyone building agents in 2026 who hasn't yet stood up a governed metrics catalog is going to ship an agent that quietly gives the CFO a different revenue number than the one the CEO got last week.

The Take

Three stories, one through-line: the model was never the bottleneck. The data is: the schema, the semantic layer, and the eval loop.

Anthropic just publicly admitted it. Supabase is pricing it in. VentureBeat is naming the disease. The teams that will win in 2026 are the ones that stop treating their warehouse as a side effect of their model and start treating it as the product.

For a builder, the practical playbook:

  • Stand up a governed metrics layer before you ship the agent. Pin the metric definitions in the system prompt. Diff every answer against the last N answers.
  • Log the SQL and the row counts. If two answers disagree, you can grep. If you don't log, you'll just be told "the model hallucinated" and you'll never know why.
  • Plan for decay, not just launch. Anthropic's drop to 65% accuracy after a month without maintenance is the most honest number in AI right now. Build the eval loop, fund the maintenance, and don't ship a self-service agent to the CFO without one.
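The first two playbook items (diff every answer against the last N answers, and keep the SQL so you can see why they disagree) can be sketched as a rolling window per question. This is a minimal in-memory sketch: `LoggedAnswer`, `AnswerHistory`, the window size, and the 0.1% tolerance are hypothetical choices, and a production version would read from a persisted log instead of process memory.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Deque, Dict, List


@dataclass
class LoggedAnswer:
    # Hypothetical record: the numeric answer plus the SQL and row count behind it.
    value: float
    sql: str
    row_count: int


class AnswerHistory:
    """Keeps the last N answers per question and reports which ones a new answer contradicts."""

    def __init__(self, window: int = 5, rel_tol: float = 0.001) -> None:
        self.window = window
        self.rel_tol = rel_tol
        # deque(maxlen=...) silently drops the oldest answer once the window is full.
        self._history: Dict[str, Deque[LoggedAnswer]] = defaultdict(
            lambda: deque(maxlen=self.window)
        )

    def _disagrees(self, a: float, b: float) -> bool:
        scale = max(abs(a), abs(b), 1e-12)
        return abs(a - b) / scale > self.rel_tol

    def record(self, question_key: str, answer: LoggedAnswer) -> List[LoggedAnswer]:
        """Store the answer; return earlier answers in the window that it contradicts."""
        past = self._history[question_key]
        conflicts = [old for old in past if self._disagrees(old.value, answer.value)]
        past.append(answer)
        return conflicts


if __name__ == "__main__":
    history = AnswerHistory(window=3)
    history.record(
        "q1_revenue",
        LoggedAnswer(1_200_000.0, "SELECT SUM(amount) FROM orders WHERE quarter = 'Q1'", 1),
    )
    conflicts = history.record(
        "q1_revenue",
        LoggedAnswer(1_350_000.0, "SELECT SUM(amount) FROM orders_all WHERE quarter = 'Q1'", 1),
    )
    for old in conflicts:
        print(f"disagrees with {old.value:,.0f} from: {old.sql}")
```

Returning the conflicting records, not just a boolean, is the design point: the two SQL strings side by side usually show which table or filter changed.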

Quick Summary

Anthropic published how it automated 95% of its own business analytics with Claude, then watched accuracy collapse to 65% within a month of no maintenance. Supabase closed a $500M Series F at $10.5B to be the database for agentic apps. VentureBeat's argument: the only real guardrail against hallucinating agents is a curated business ontology. The bottleneck moved from the model to the data. Build accordingly.

