AI Engineering | 2026-05-13

Conventional RAG Redoes the Same Work Every Agent Session. IBM and ServiceNow Just Made It Worse.

VentureBeat's Data Infrastructure Weekly for May 13, 2026 covers the problem every agent builder has hit: conventional RAG re-derives the same context every session, eating compute and slowing response. 85% of enterprises are running agentic AI on the wrong data foundation. IBM and ServiceNow just made a deal that will lock more of them in. Here is the breakdown.

Hey guys, Mr. Technology here.

The May 13, 2026 issue of VentureBeat's Data Infrastructure Weekly put a number on a problem every agent builder has been quietly complaining about: 85% of enterprises are running agentic AI on the wrong data foundation. The number comes from a survey cited by VentureBeat, and it lines up with what I have been hearing from data infrastructure teams for six months. The same week, IBM and ServiceNow announced an expanded agentic AI partnership that will ship to thousands of enterprise customers, locking in the conventional RAG architecture for another product cycle. Both stories are about the same problem: the data layer underneath agentic AI is the bottleneck, and the industry is shipping the wrong fix.

The RAG Problem Nobody Wants to Fix

Conventional Retrieval-Augmented Generation (RAG) works like this: when an agent gets a question, the system retrieves relevant documents from a vector store, prepends them to the prompt, and generates an answer. The retrieved context is per-query. The agent's accumulated understanding of the user's preferences, prior interactions, and previous reasoning is not carried across sessions. Every new session starts from zero.
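To make the "starts from zero" point concrete, here is a minimal sketch of the per-query pipeline. Everything in it is a stand-in I made up for illustration: `embed` is a bag-of-words counter rather than a real embedding model, `DOCS` is a three-line toy corpus, and the final generation step is left out. The thing to notice is that `build_prompt` takes no state from any earlier call.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Toy stand-in for an embedding model: lowercase bag-of-words counts.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    # Rank every document against the query and keep the top k.
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, documents: list[str]) -> str:
    # Prepend the retrieved documents to the question. No session state is used.
    context = "\n".join(retrieve(query, documents))
    return f"Context:\n{context}\n\nQuestion: {query}"


# Hypothetical corpus, for illustration only.
DOCS = [
    "The billing service retries failed payments three times.",
    "The deploy pipeline runs integration tests before release.",
    "Support tickets are triaged by severity.",
]

# Two "sessions" asking the same thing repeat the full retrieval;
# nothing learned in the first call is available to the second.
prompt_session_1 = build_prompt("how does billing retry failed payments", DOCS)
prompt_session_2 = build_prompt("how does billing retry failed payments", DOCS)
```

Both prompts come out identical, which is the point: the second session pays for the same ranking and the same context tokens as the first.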

For a chatbot, this is mostly fine. The user asks a question, gets an answer, and moves on. The cost of re-deriving context is low.

For an agent, this is catastrophic. The agent is doing a long-running task (debugging a system, building a feature, processing a queue of support tickets), and every session boundary forces it to re-derive context it already had. The agent loses its place. The model re-reads the same documents. The compute bill multiplies with every session, and latency piles up at every handoff.
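The multiplication is easy to see with back-of-the-envelope arithmetic. The sketch below counts context tokens read across sessions under two assumptions of mine: without memory, every session re-reads the full context; with memory, only the first session does, and later sessions read a smaller stored summary. The numbers are hypothetical and are not taken from the VentureBeat evaluation; the ratio you get depends entirely on what you plug in.

```python
from typing import Optional


def context_tokens_read(
    sessions: int, context_tokens: int, memory_tokens: Optional[int] = None
) -> int:
    """Total context tokens read over `sessions` sessions (sessions >= 1).

    Without memory, every session re-reads the full context.
    With memory, the first session reads the full context and each
    later session reads only the compact stored memory.
    """
    if memory_tokens is None:
        return sessions * context_tokens
    return context_tokens + (sessions - 1) * memory_tokens


# Hypothetical workload: 10 sessions, 6,000 context tokens, 1,000-token memory.
without_memory = context_tokens_read(10, 6000)        # 60000
with_memory = context_tokens_read(10, 6000, 1000)     # 15000
```

In this made-up case the stateless design reads four times as many context tokens. Longer tasks and more compact memory widen the gap; short tasks barely show it.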

The number that quantifies this: per the VentureBeat piece, a recent evaluation showed that an "observational memory" approach — storing the agent's observations and retrieving them across sessions — cuts AI agent costs by 10x and outscores RAG on long-context benchmarks. The 10x cost reduction is the headline. The outperformance on long-context is the more important finding: it means the conventional RAG architecture is not just expensive, it is also leaving accuracy on the table.

This is what sits behind the "85% on the wrong data foundation" number: enterprises that built their agentic AI data layer as a conventional RAG pipeline and are now discovering that the architecture does not scale to the actual use case.

IBM and ServiceNow: The Partnership That Will Lock In the Wrong Pattern

IBM and ServiceNow announced an expanded enterprise AI alliance the same week as the VentureBeat issue, tying IBM's watsonx, automation, data, and consulting to ServiceNow's AI platform. The CIO Dive coverage frames it as a play to "update legacy systems." The reality is more specific: IBM is putting watsonx.data, a hybrid, open data lakehouse, in front of ServiceNow's agentic AI runtime, with a particular emphasis on governed enterprise data.

IBM Think 2026 (May 5-7, 2026) previewed the architecture: next-generation agent orchestration, governed data access via watsonx.data, and a consulting arm that helps enterprises wire it all up. The pitch to enterprise buyers is: "you can run agentic AI on your existing data, with the governance you need, on the platform you already trust."

The pitch is technically defensible. The architecture is conventional. The RAG layer in the IBM-ServiceNow stack is, as far as I can tell from the public materials, a standard retrieval pipeline: query, embed, retrieve, prepend, generate. No observational memory. No cross-session state. The same pattern that 85% of enterprises are already on, just with better governance and tighter integration.

The problem is that the conventional pattern is the bottleneck. Adding governance to the bottleneck does not fix the bottleneck. It gives the bottleneck a compliance certification.

What the Right Fix Looks Like

The pattern that the 15% are moving to is context architecture — a term VentureBeat also used in their May 18 piece on the same topic. The components:

Observational memory. The agent's observations during a session are stored as structured artifacts — what the user asked, what the model did, what tools it used, what failed, what worked. The next session starts with the observational memory as pre-context, not from zero. The cost reduction comes from not re-deriving what the model already learned.
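A minimal sketch of that idea, under my own assumptions: observations are stored as JSON Lines in a local file, and the field names (`user_request`, `action`, `tool`, `outcome`) are ones I picked to match the list above, not a schema from any of the products mentioned here.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class Observation:
    session_id: str
    user_request: str
    action: str
    tool: str
    outcome: str  # e.g. "worked" or "failed"


class ObservationLog:
    """Append-only JSON Lines file that outlives any single session."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def record(self, obs: Observation) -> None:
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(asdict(obs)) + "\n")

    def load(self) -> list[Observation]:
        if not self.path.exists():
            return []
        with self.path.open(encoding="utf-8") as f:
            return [Observation(**json.loads(line)) for line in f if line.strip()]

    def pre_context(self, limit: int = 5) -> str:
        # Render the most recent observations as text to prepend to a new session.
        recent = self.load()[-limit:]
        return "\n".join(
            f"[{o.session_id}] {o.user_request} -> {o.action} via {o.tool}: {o.outcome}"
            for o in recent
        )


log_path = Path(tempfile.mkdtemp()) / "observations.jsonl"

# Session 1 records what it tried and what happened.
session_1 = ObservationLog(log_path)
session_1.record(
    Observation("s1", "fix flaky login test", "reran test with verbose logs", "pytest", "failed")
)
session_1.record(
    Observation("s1", "fix flaky login test", "pinned the clock in the fixture", "editor", "worked")
)

# Session 2 is a fresh object with no in-memory state; it starts from the log.
session_2_context = ObservationLog(log_path).pre_context()
```

The second session opens knowing that rerunning the test did not help and that pinning the clock did, without re-reading a single document. A real system would add retrieval over the log rather than always taking the last few entries.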

Context windows designed for the use case, not the demo. Most RAG systems are designed for a single query. Agent contexts are long-running, with the agent reading, writing, and reasoning over minutes or hours. The context window needs to be designed for that — chunked, compressed, summarized — not just expanded to a million tokens and forgotten.
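One way to read "chunked, compressed, summarized" is as a budget: keep the most recent turns verbatim, shrink the older ones, and drop the oldest if you are still over. The sketch below does that with two loud simplifications of mine: `count_tokens` counts whitespace-separated words instead of using a tokenizer, and `summarize` just truncates where a real system would call a model.

```python
def count_tokens(text: str) -> int:
    # Crude stand-in for a tokenizer: whitespace-separated words.
    return len(text.split())


def summarize(text: str, max_words: int = 8) -> str:
    # Placeholder for a model-written summary: keep the first few words.
    words = text.split()
    suffix = " ..." if len(words) > max_words else ""
    return " ".join(words[:max_words]) + suffix


def fit_context(turns: list[str], budget: int, keep_recent: int = 2) -> list[str]:
    """Keep the newest turns verbatim, summarize older ones, and drop the
    oldest summaries until the total fits the budget."""
    split = max(len(turns) - keep_recent, 0)
    older, recent = turns[:split], turns[split:]
    compressed = [summarize(t) for t in older] + recent
    while len(compressed) > len(recent) and sum(map(count_tokens, compressed)) > budget:
        compressed.pop(0)
    return compressed


# Hypothetical session history, oldest first.
turns = [
    "user asked to migrate the orders table to the new schema and keep the old columns for one release",
    "agent ran the migration in staging and it failed on a foreign key constraint in the invoices table",
    "agent added the missing index",
    "migration passed in staging",
]

context = fit_context(turns, budget=20)
```

With a 20-word budget, the oldest turn is dropped, the second is kept as a short summary, and the last two survive word for word. The recent turns are never summarized or dropped, even if they alone exceed the budget; that is a design choice of this sketch, not a rule.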

Tool result persistence. When an agent calls a tool and gets a result, the result is often forgotten by the next session. The fix: persist tool results as part of the agent's state, with retrieval over the persisted results when the next session needs them.
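Here is a small sketch of what persisting tool results could look like, using Python's built-in `sqlite3` as the store. The `ToolResultStore` class, the table layout, and the `list_open_tickets` tool are all hypothetical names of mine. It assumes tool arguments and results are JSON-serializable and ignores staleness entirely, which a real system cannot.

```python
import json
import sqlite3
import tempfile
from pathlib import Path
from typing import Callable


class ToolResultStore:
    """Persists tool results on disk, keyed by tool name and arguments."""

    def __init__(self, db_path: Path) -> None:
        self.conn = sqlite3.connect(str(db_path))
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS tool_results ("
            "tool TEXT, args TEXT, result TEXT, PRIMARY KEY (tool, args))"
        )

    def call(self, tool: str, fn: Callable[..., object], **kwargs: object) -> object:
        # Serialize arguments deterministically so equal calls share a key.
        key = json.dumps(kwargs, sort_keys=True)
        row = self.conn.execute(
            "SELECT result FROM tool_results WHERE tool = ? AND args = ?", (tool, key)
        ).fetchone()
        if row is not None:
            return json.loads(row[0])  # reuse the persisted result
        result = fn(**kwargs)
        self.conn.execute(
            "INSERT INTO tool_results (tool, args, result) VALUES (?, ?, ?)",
            (tool, key, json.dumps(result)),
        )
        self.conn.commit()
        return result


tool_invocations = []


def list_open_tickets(queue: str) -> list[str]:
    # Hypothetical tool; records each real invocation so we can count them.
    tool_invocations.append(queue)
    return [f"{queue}-101", f"{queue}-102"]


db_path = Path(tempfile.mkdtemp()) / "agent_state.db"

# Two separate store objects stand in for two separate sessions.
first = ToolResultStore(db_path).call("list_open_tickets", list_open_tickets, queue="billing")
second = ToolResultStore(db_path).call("list_open_tickets", list_open_tickets, queue="billing")
```

The second session gets the same ticket list without the tool running again. For anything that changes over time, such as a ticket queue, you would want a timestamp column and an expiry rule before trusting a stored result.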

Cross-session knowledge graphs. The agent's accumulated understanding of the user's domain — the entities, the relationships, the workflows — is a knowledge graph, not a vector store. Knowledge graphs are how human experts carry context across months of work. They are how agents should too.
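To show what a graph buys you over similarity search, here is a toy in-memory triple store. The class, the relation names, and the services are invented for illustration; a production system would use a graph database and persist it across sessions. The useful part is `reachable`, which follows a relation across multiple hops.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Tiny triple store: subject -> set of (relation, object) pairs."""

    def __init__(self) -> None:
        self.edges: dict[str, set[tuple[str, str]]] = defaultdict(set)

    def add(self, subject: str, relation: str, obj: str) -> None:
        self.edges[subject].add((relation, obj))

    def neighbors(self, subject: str, relation: str) -> set[str]:
        return {o for r, o in self.edges.get(subject, set()) if r == relation}

    def reachable(self, start: str, relation: str) -> set[str]:
        """Every entity reachable from `start` by following one relation type."""
        seen: set[str] = set()
        frontier = [start]
        while frontier:
            node = frontier.pop()
            for nxt in self.neighbors(node, relation):
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append(nxt)
        return seen


# Facts an agent might accumulate over several sessions (hypothetical).
graph = KnowledgeGraph()
graph.add("checkout", "depends_on", "payments")
graph.add("payments", "depends_on", "ledger")
graph.add("checkout", "owned_by", "team-web")

impacted_by_checkout = graph.reachable("checkout", "depends_on")
```

Asking what `checkout` depends on returns both `payments` and `ledger`, even though no single stored fact links `checkout` to `ledger`. That two-hop answer is the kind of thing a vector store does not hand you directly.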

Semantic caching. The same query, asked twice, should hit a cache, not a model. The cache needs to be semantic, not just exact-match, so the model is not re-running on slight variations of the same question.
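A rough sketch of a semantic cache, with the usual caveats: `embed` here is a bag-of-words counter standing in for a real embedding model, the 0.8 threshold is a number I picked, and `fake_model` is a placeholder for the expensive call. The lookup is a linear scan, which is fine for a demo and not for production.

```python
import math
import re
from collections import Counter
from typing import Callable, Optional


def embed(text: str) -> Counter:
    # Toy stand-in for an embedding model: lowercase word counts, punctuation dropped.
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, threshold: float = 0.8) -> None:
        self.threshold = threshold
        self.entries: list[tuple[Counter, str]] = []

    def lookup(self, query: str) -> Optional[str]:
        # Return the stored answer for the most similar past query, if close enough.
        q = embed(query)
        best = max(self.entries, key=lambda e: cosine(q, e[0]), default=None)
        if best is not None and cosine(q, best[0]) >= self.threshold:
            return best[1]
        return None

    def answer(self, query: str, model: Callable[[str], str]) -> str:
        cached = self.lookup(query)
        if cached is not None:
            return cached
        result = model(query)  # cache miss: pay for the model call
        self.entries.append((embed(query), result))
        return result


model_calls = []


def fake_model(query: str) -> str:
    # Placeholder for an expensive model call; records each invocation.
    model_calls.append(query)
    return f"answer to: {query}"


cache = SemanticCache(threshold=0.8)
a = cache.answer("how do I reset my password", fake_model)
b = cache.answer("How do I reset my password please?", fake_model)  # near-duplicate
c = cache.answer("what is the refund policy", fake_model)           # unrelated
```

The near-duplicate is served from the cache, and the unrelated question goes to the model, so `fake_model` runs twice for three queries. The threshold is the whole game: set it too low and users get answers to questions they did not ask.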

The vendors moving on this: Google's open-source Always On Memory Agent (March 2026), Letta, Mem0, Zep, Cognee, and a dozen newer entrants. The pattern is converging around the same architecture: persistent state, observational memory, cross-session retrieval.

The Take

Three things to act on this week.

If you are an enterprise architect: the IBM-ServiceNow partnership is real and it is shipping. If your data foundation strategy is "wait for the vendor to figure it out," the vendor is going to ship the conventional RAG pattern with a governance wrapper. That is not the answer to the 85% number. You need to plan for the next architecture now: persistent state, observational memory, and the vendor's eventual migration to it.

If you are an agent builder: the conventional RAG pattern is fine for chatbots and bad for agents. If your agent is doing long-running work, you are likely paying far more than you need to and getting worse results; the evaluation VentureBeat cites puts the cost gap at 10x. The fix is not "more retrieval." It is "remember what happened." Add observational memory, add tool result persistence, and add a knowledge graph. The 10x figure comes from one evaluation, so measure your own workload, but the fix is small enough to ship in a sprint.

If you are a data infrastructure vendor: the conventional RAG layer is the layer you are about to be displaced on. The vendors who ship context architecture — observational memory, semantic caching, cross-session state — are the ones who will own the next product cycle. The vendors who keep shipping "faster vector store" are racing to the bottom on a commoditizing layer. The differentiation is moving up the stack.

The conventional RAG pattern is the new COBOL. It works. It is everywhere. It is going to be replaced by something that was designed for the actual workload, and the replacement cycle is starting now.

Mr. Technology


Sources:
VentureBeat - 'Observational memory' cuts AI agent costs 10x and outscores RAG on long contexts
VentureBeat - Context architecture is replacing RAG as agentic AI pushes enterprise retrieval to its limits
VentureBeat - Google PM open-sources Always On Memory Agent, ditching vector databases
VentureBeat - Databricks says it solved the decades-old data pipeline problem
IBM Newsroom - Think 2026: IBM Delivers the Blueprint for the AI Operating Model
CIO Dive - ServiceNow, IBM team up to target legacy IT
WindowsForum - IBM x ServiceNow Agentic AI
Alation - Why Enterprise AI Projects Fail: 6 Root Causes and Fixes
McKinsey - State of AI trust in 2026: Shifting to the agentic era
