
**TL;DR:** RAG-Architect treats retrieval as an engineering problem, not a magic embedding box. Chunk size, metadata taxonomy, hybrid search weighting, reranking — these decisions determine whether your RAG system answers questions correctly or just confidently. Get them wrong and you'll spend months wondering why your AI "doesn't know" things that are clearly in the documents.
The average RAG demo works beautifully. The average production RAG system is a quiet disaster — retrieving semantically similar but contextually wrong documents, losing critical information to aggressive chunking, and silently hallucinating answers that sound authoritative because the retrieval step failed invisibly. RAG-Architect makes the invisible visible.
An example prompt:

```
Use rag-architect to audit our current knowledge base. Analyze the document corpus in ./docs: assess chunk boundaries, identify metadata gaps, and score retrieval quality for our top-20 most common query types. Produce a chunking strategy recommendation and a retrieval benchmark baseline.
```
A hybrid retrieval configuration looks like this:

```yaml
retrieval:
  dense:
    provider: pinecone
    model: text-embedding-3-large
    dimension: 3072
  sparse:
    type: bm25
    k1: 1.5
    b: 0.75
  fusion:
    method: rrf
    k: 60
  rerank:
    model: cross-encoder/ms-marco-MiniLM-L-12-v2
    top_k: 20
    final_k: 5
  freshness:
    decay: logarithmic
    half_life_days: 90
```
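To make the fusion, rerank, and freshness settings concrete, here is a minimal Python sketch of how such a pipeline can combine results: reciprocal rank fusion over the dense and BM25 rankings (RRF scores each document as the sum of 1/(k + rank) across rankers, with k = 60 as above), a cross-encoder pass over the fused top 20, and a half-life freshness multiplier. The function names and document fields are illustrative assumptions, not rag-architect's internal API, and the half-life curve is one reading of `half_life_days: 90`; the tool's `decay: logarithmic` setting may use a different formula.

```python
import math
from sentence_transformers import CrossEncoder  # pip install sentence-transformers

def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion: score(d) = sum over rankings of 1 / (k + rank of d)."""
    scores = {}
    for ranking in rankings:  # e.g. [dense_ids, bm25_ids], best hit first
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

def freshness_weight(age_days, half_life_days=90):
    """Half-life decay: a 90-day-old document weighs 0.5, a 180-day-old one 0.25."""
    return 0.5 ** (age_days / half_life_days)

# Load once and reuse; this is the reranker named in the config above.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-12-v2")

def retrieve(query, dense_ids, bm25_ids, docs, top_k=20, final_k=5):
    """Fuse both rankings, rerank the top_k with the cross-encoder, downweight
    stale documents, and return the final_k document ids.
    `docs` is an assumed lookup: doc_id -> {"text": ..., "age_days": ...}."""
    fused = rrf_fuse([dense_ids, bm25_ids])[:top_k]
    logits = reranker.predict([(query, docs[d]["text"]) for d in fused])
    scored = []
    for doc_id, logit in zip(fused, logits):
        relevance = 1.0 / (1.0 + math.exp(-float(logit)))  # squash logit to (0, 1)
        scored.append((doc_id, relevance * freshness_weight(docs[doc_id]["age_days"])))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [doc_id for doc_id, _ in scored[:final_k]]
```

The sigmoid keeps the cross-encoder scores positive before the freshness multiplier is applied, so an older document can only lose score, never gain it.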
To benchmark retrieval against a labeled query set:

```bash
rag-architect eval --config ./rag-config.yaml \
  --queries ./test-queries.jsonl \
  --metrics hit_rate,mrr,answer_faithfulness
```
The evaluation output includes per-query retrieval traces — you can see exactly which documents were retrieved, their fusion scores, and whether the final answer was faithful to them.
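For intuition about the first two metrics, here is a small sketch of how hit rate and MRR fall out of such traces. The file path and the record fields (`retrieved_ids`, `relevant_ids`) are assumptions for illustration, not the tool's documented trace schema; answer faithfulness is omitted because it needs an LLM or human judge rather than a set comparison.

```python
import json

def retrieval_metrics(trace_path, k=5):
    """Hit rate: share of queries with at least one relevant document in the top k.
    MRR: mean of 1 / rank of the first relevant document (0 when none is retrieved)."""
    hits = 0
    reciprocal_ranks = []
    with open(trace_path) as f:
        for line in f:
            record = json.loads(line)                # one retrieval trace per query
            retrieved = record["retrieved_ids"][:k]  # assumed field names
            relevant = set(record["relevant_ids"])
            rank = next((i for i, doc_id in enumerate(retrieved, start=1)
                         if doc_id in relevant), None)
            if rank is not None:
                hits += 1
                reciprocal_ranks.append(1.0 / rank)
            else:
                reciprocal_ranks.append(0.0)
    n = len(reciprocal_ranks)
    return {"hit_rate": hits / n, "mrr": sum(reciprocal_ranks) / n}

print(retrieval_metrics("./eval-traces.jsonl", k=5))  # hypothetical trace file
```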
| **Pros** | **Cons** |
|---|---|
| End-to-end retrieval benchmarking — no more flying blind | Requires labeled evaluation queries — hard to benchmark without ground truth |
| Hybrid search outperforms pure vector search on most domains | Reranking adds latency (typically 50-200ms per query) |
| Chunking strategy framework prevents "everything at 512 tokens" | Freshness scoring needs reliable document timestamps |
| Composable with any vector DB | Cross-encoder reranking requires a hosted model endpoint |
| Detects hallucination risk from retrieval failure | Chunking optimization is domain-specific |