
When Miro's data team pointed AI agents at their production warehouse, more than 65% of the agents' complex multi-table queries hallucinated joins: they invented relationships between tables that didn't exist, or used deprecated ones that did. By the team's account, the missing ingredient wasn't better prompts. It was the actual SQL query history that human analysts had already validated. That's the bet behind DataHub's new "Context Intelligence" layer, which mines your existing query log to build a semantic index the agents can read.
What You Need to Know: AI agents pointed at enterprise data warehouses with only a schema hallucinate joins on more than 65% of complex multi-table queries in Miro's reported numbers, and the fix the evidence points to is grounding them in the SQL query history your team already produced, not in static schema docs.
customer_status = 'gold' means "paying" in one table and "lifetime value tier" in another. Only validated queries carry that business context.
On May 28, 2026, DataHub, the metadata platform born from the open-source project founded at LinkedIn, released its Context Intelligence layer. The new module reads a company's existing SQL query history and builds a semantic index that AI agents can query directly, through integrations with LangChain, Google's Agent Development Kit, and CrewAI.
The system is "production-proven" lineage tech, according to DataHub CTO Shirshanka Das, who spent nearly 11 years leading data infrastructure at LinkedIn. The pitch: instead of asking an LLM to guess what a column means, give it the actual queries your analysts have already written, and let the agent reason from proven intent.
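To make the pitch concrete, here is a minimal Python sketch of grounding an agent's prompt in validated queries. It is not DataHub's API: the function names (select_examples, build_prompt), the word-overlap scoring, and the prompt wording are all assumptions made for illustration. A real system would presumably use a proper semantic index rather than token overlap.

```python
import re

def _words(text: str) -> set[str]:
    """Lowercased identifier-like tokens, used for a crude overlap score."""
    return set(re.findall(r"[a-z_]+", text.lower()))

def select_examples(question: str, validated_queries: list[str], limit: int = 3) -> list[str]:
    """Pick the validated queries that share the most words with the question.

    Hypothetical helper: token overlap stands in for real semantic retrieval.
    """
    q_words = _words(question)
    scored = [(len(q_words & _words(sql)), sql) for sql in validated_queries]
    scored = [item for item in scored if item[0] > 0]  # drop unrelated queries
    scored.sort(key=lambda item: item[0], reverse=True)
    return [sql for _, sql in scored[:limit]]

def build_prompt(question: str, validated_queries: list[str]) -> str:
    """Assemble a prompt that shows the agent proven joins before the question."""
    examples = select_examples(question, validated_queries)
    lines = ["Validated analyst queries to reuse joins and filters from:"]
    lines += [f"- {sql}" for sql in examples]
    lines.append(f"Question: {question}")
    return "\n".join(lines)
```

The design choice worth noting is that the agent never sees the whole log, only the handful of validated queries that look relevant to the question, so the joins it copies are ones an analyst already ran.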
Futurum Group's coverage of the launch drove the point home: "When agents ingest static, developer-defined schemas bereft of business context, they frequently hallucinate complex joins and query deprecated tables." That's not a theoretical risk. It's measured.
The catalyst for the conversation was Miro. The collaborative whiteboard company's data team publicly reported that AI agents, pointed at their warehouse with only a schema, hallucinated joins or used deprecated relationships on more than 65% of complex multi-table queries. Miro didn't publish its fix in detail, but their data team pointed at query history as the missing ingredient.
It's worth pausing on the number. Roughly two out of three complex multi-table queries that agents wrote against a real production warehouse contained a hallucinated or deprecated join. That means any AI-powered analytics product shipping today without grounding is likely shipping broken joins, and most of them ship without it.
The "8x" in the headline refers to a separate problem: when an agent doesn't know which table to hit, it pulls from the most permissive source available — usually the lakehouse. Lakehouse queries without proper predicates scan massive files and run on much more expensive compute than warehouse queries with clean joins. The cumulative cost across thousands of agent-driven queries is the kind of line item that gets a CFO's attention within a quarter.
This is the routing gap DataHub's Context Intelligence is targeting. By showing the agent which tables are typically joined, in which order, and with which filter patterns, the system reduces the chance an agent falls back to "scan everything" mode.
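The idea of learning "which tables are typically joined" can be sketched in a few lines of Python. This is an illustration under stated assumptions, not DataHub's implementation: the sample queries are made up, the regex only catches simple FROM and JOIN clauses (a production system would use a real SQL parser), and the helper names are hypothetical. It counts which tables co-occur in the same query, which is a rough proxy for a join relationship.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical sample of validated analyst queries; in practice these would
# come from a warehouse query-history table.
QUERY_LOG = [
    "SELECT o.id FROM orders o JOIN customers c ON o.customer_id = c.id WHERE c.status = 'gold'",
    "SELECT c.id, SUM(o.total) FROM customers c JOIN orders o ON o.customer_id = c.id GROUP BY c.id",
    "SELECT * FROM orders o JOIN shipments s ON s.order_id = o.id",
]

# Naive pattern: the identifier right after FROM or JOIN is treated as a table.
TABLE_RE = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)

def tables_in(sql: str) -> list[str]:
    """Return table names that follow FROM or JOIN, lowercased, in order."""
    return [name.lower() for name in TABLE_RE.findall(sql)]

def join_pair_counts(queries: list[str]) -> Counter:
    """Count how often each unordered pair of tables appears in one query."""
    counts: Counter = Counter()
    for sql in queries:
        unique = sorted(set(tables_in(sql)))
        for pair in combinations(unique, 2):
            counts[pair] += 1
    return counts

def is_known_join(a: str, b: str, counts: Counter, min_support: int = 1) -> bool:
    """True if analysts have combined these two tables at least min_support times."""
    pair = tuple(sorted((a.lower(), b.lower())))
    return counts[pair] >= min_support
```

An agent-side guard could call is_known_join before emitting a query and refuse, or ask for confirmation, when a proposed join has never appeared in the validated log. The same counting approach extends to join order and common filter columns, at the cost of needing a real parser.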
I've been saying this since 2024: a 1M-context model with zero context is still dumber than a 200K-context model with the right one. The DataHub launch is the first productized admission that static schemas are dead for agent work. If you're building analytics AI in 2026, your agent's "memory" needs to be the actual query log of the humans who knew what they were doing. Anything less is hallucination as a service.
The hard truth: most companies don't have a query log. They have a query relic — a partial warehouse query_history table from Snowflake, a dbt run log, a Looker audit table, and a bunch of tribal knowledge in senior analysts' heads. DataHub, Atlan, and the rest of the metadata vendors are going to make a fortune stitching these together over the next 18 months.
AI agents hallucinating joins at a 65% clip is a 2026 baseline, not an edge case. DataHub's Context Intelligence, mining your SQL query history, is the first real product to fix it — and the lakehouse routing gap it closes is worth 8x in compute spend.