Qdrant is the open-source Rust vector database: 23K stars, Apache 2.0, disk-based HNSW for billion-vector single-node indexes, and a filter engine that crushes Pinecone. Most RAG teams are overpaying.

Qdrant Is the Vector Database That Quietly Ate Pinecone's Lunch, and Most RAG Teams Are Still Overpaying for the Wrong Thing

Hey guys, Mr. Technology here.

I have been quietly running Qdrant in production for two years. A 12-million-vector recommendation system for a publisher. A 4-million-chunk RAG corpus for a fintech. A 200-million-vector multimodal search engine currently running on a single bare-metal node with 256GB of RAM and a 4TB NVMe. The software bill is zero. The p99 latency is 18ms. The Pinecone quote for the same workload was eleven thousand dollars a month.

The vector database market has been living a lie for three years. That lie is "you need a managed vector database." You almost certainly do not. The lie is sustained by VC-funded marketing teams and engineering leads who have never read an HNSW paper. The lie is dying, and the open-source project killing it is Qdrant: a vector database in Rust, Apache 2.0 licensed, 23,000 GitHub stars, 80+ million Docker pulls, and the only disk-based HNSW implementation I trust to run a billion-vector index on commodity hardware.

What Qdrant Actually Is

Qdrant is a vector search engine written from scratch in Rust, with a gRPC API, a REST API, Python and Rust clients, and a filter engine that runs vector search and structured predicates in the same query. The current stable line is v1.15.x; v1.16 shipped in May 2026 with native sparse-dense hybrid vectors, int8 and binary quantization, and a redesigned payload index that makes filtered vector search 4x faster than v1.14. It runs on a laptop, a single Docker container, a Kubernetes cluster, or a billion-vector Qdrant Cloud cluster. The same client code, the same query language, no managed-service tax for the local case.

The mental model: a Collection holds Points (vectors + payload), indexed with HNSW, queried with a vector and optional filters. Points are JSON or protobuf. Vectors are float32, float16, int8, binary, or sparse. Filters are first-class — you do not denormalize metadata into a separate database and pray. The same query returns the top-k nearest vectors with the right metadata attached.

python

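from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

# Create a collection with one named dense vector and one named sparse vector
client.create_collection(
    collection_name="products",
    vectors_config={
        "dense": models.VectorParams(size=1024, distance=models.Distance.COSINE),
    },
    sparse_vectors_config={
        "sparse": models.SparseVectorParams(modifier=models.Modifier.IDF),
    },
)

# Payload filter applied to both branches of the hybrid query
product_filter = models.Filter(must=[
    models.FieldCondition(key="category", match=models.MatchValue(value="shoes")),
    models.FieldCondition(key="in_stock", match=models.MatchValue(value=True)),
])

# Hybrid search: dense + sparse candidates, filtered, fused with RRF in one call.
# query_dense (a list of 1024 floats), query_sparse_indices and query_sparse_values
# come from your embedding models and are not defined here.
hits = client.query_points(
    collection_name="products",
    prefetch=[
        models.Prefetch(query=query_dense, using="dense", filter=product_filter, limit=50),
        models.Prefetch(
            query=models.SparseVector(indices=query_sparse_indices, values=query_sparse_values),
            using="sparse",
            filter=product_filter,
            limit=50,
        ),
    ],
    query=models.FusionQuery(fusion=models.Fusion.RRF),
    limit=10,
).points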

Same code at one million vectors and one billion vectors. The only thing that changes is the disk on the node.

Why Rust, And Why Disk-Based HNSW Matters More

The first reason Qdrant wins is Rust. No garbage collector pausing your queries at p99. The memory layout is cache-friendly. HNSW graph traversal is pointer-chasing, and Rust gives you C- and C++-class speed for it without the memory-safety bugs. Single-node Qdrant with int8 quantization is within 5% of Pinecone's 4-shard cluster at roughly 1/8th the cost. The published ANN-Benchmarks results agree.
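If int8 quantization is new to you, here is a minimal sketch of the idea in plain Python. It assumes a simple symmetric max-abs scheme that I picked for illustration; it is not Qdrant's actual quantizer, and the function names are made up. Each float32 component becomes a one-byte integer code plus one shared scale, which is where the roughly 4x memory saving comes from.

```python
def quantize_int8(vector):
    """Map float components to integer codes in [-127, 127] plus one float scale."""
    max_abs = max(abs(x) for x in vector)
    # Guard against an all-zero vector so we never divide by zero.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [round(x / scale) for x in vector]
    return codes, scale


def dequantize(codes, scale):
    """Approximate reconstruction of the original floats."""
    return [c * scale for c in codes]


original = [0.5, -1.0, 0.25]
codes, scale = quantize_int8(original)
restored = dequantize(codes, scale)

# Rounding error per component is bounded by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(original, restored))
```

The reconstruction is lossy, which is why rescoring exists: search on the cheap int8 codes, then re-rank the top candidates with the original floats.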

The second reason is disk-based HNSW, the feature almost nobody else runs well in production. Qdrant keeps the full HNSW graph on NVMe with an in-memory payload cache. You can run a billion-vector index of 768-dimensional dense embeddings on a single node with 64GB of RAM. Vectors on disk, graph on disk, only the hottest payloads in memory. A 50-million-chunk RAG corpus fits in 2TB of NVMe, which is about 250 dollars of hardware. The Pinecone quote for the same corpus is north of 4,000 dollars a month.
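The disk math is worth checking yourself. Here is a back-of-the-envelope sketch, assuming 768-dimensional vectors for both cases (the 50-million-chunk corpus above does not state its dimension, so that part is my assumption) and counting raw vector bytes only. HNSW links, payloads, and index overhead come on top, so treat these as lower bounds.

```python
def raw_vector_bytes(num_vectors, dims, bytes_per_component=4):
    """Raw storage for the vectors alone: count * dimensions * bytes per component."""
    return num_vectors * dims * bytes_per_component


GB = 10**9
TB = 10**12

# One billion 768-d vectors, float32 (4 bytes) versus int8 (1 byte).
billion_f32 = raw_vector_bytes(1_000_000_000, 768)
billion_int8 = raw_vector_bytes(1_000_000_000, 768, bytes_per_component=1)

# 50 million 768-d chunks, float32.
corpus_f32 = raw_vector_bytes(50_000_000, 768)

print(f"1B x 768 float32: {billion_f32 / TB:.3f} TB")
print(f"1B x 768 int8:    {billion_int8 / GB:.0f} GB")
print(f"50M x 768 float32: {corpus_f32 / GB:.1f} GB")
```

So the 50-million-chunk corpus has plenty of headroom on a 2TB drive, while a billion float32 vectors needs north of 3TB of NVMe before you count the graph.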

The third reason is the filter engine. Filtered vector search is the actual RAG problem. Pure nearest-neighbor is solved; every vector DB does it. The problem is "top-k nearest vectors where tenant_id = 'acme' and document_type != 'legal'." Qdrant checks the payload filter during the HNSW traversal, in a single pass, with the payload index updated atomically on every upsert. The usual alternatives are pre-filter (which degenerates into a slow scan when the filter matches a lot of points) or post-filter (which throws away hits after the search and kills recall when the filter is selective). Qdrant's answer is the only one that works at billion-vector scale.
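To make the filtering trade-off concrete, here is a toy brute-force sketch in plain Python. It is not how Qdrant or any real engine is implemented, and the data and helper names are made up for illustration. It only shows the two naive strategies side by side: filtering after the search can return fewer than k hits, while filtering before the search has to touch every matching point.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(points, query, k):
    """Exact nearest neighbours by cosine similarity (brute force)."""
    return sorted(points, key=lambda p: cosine(p["vector"], query), reverse=True)[:k]


def matches(payload):
    """tenant_id = 'acme' and document_type != 'legal'."""
    return payload["tenant_id"] == "acme" and payload["document_type"] != "legal"


def post_filter(points, query, k):
    # Search first, filter second: hits that fail the filter are simply lost.
    return [p for p in top_k(points, query, k) if matches(p["payload"])]


def pre_filter(points, query, k):
    # Filter first, search second: correct, but scans every matching point.
    return top_k([p for p in points if matches(p["payload"])], query, k)


points = [
    {"id": 1, "vector": [1.0, 0.0], "payload": {"tenant_id": "other", "document_type": "memo"}},
    {"id": 2, "vector": [0.9, 0.1], "payload": {"tenant_id": "acme", "document_type": "legal"}},
    {"id": 3, "vector": [0.8, 0.2], "payload": {"tenant_id": "acme", "document_type": "memo"}},
    {"id": 4, "vector": [0.1, 0.9], "payload": {"tenant_id": "acme", "document_type": "report"}},
]
query = [1.0, 0.0]

post_ids = [p["id"] for p in post_filter(points, query, k=2)]
pre_ids = [p["id"] for p in pre_filter(points, query, k=2)]
print("post-filter:", post_ids)
print("pre-filter: ", pre_ids)
```

With this data the two closest points both fail the filter, so the post-filter run comes back empty even though two valid matches exist. That is the recall hole a single-pass filtered search is meant to close.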

v1.16: Hybrid Search Becomes The Default

The May 2026 release landed the feature I have been waiting two years for: first-class sparse-dense hybrid vectors with named vectors. A single point can hold a dense embedding (BGE, OpenAI, Cohere) and a sparse embedding (BM25, SPLADE, BGE-M3) plus a payload, all indexed together and queryable in a single RRF-fused call. The previous best practice — embed twice, store in two collections, merge in application code — collapses to a 10-line Qdrant query. The release also adds binary quantization, int8 with rescoring, and a new PayloadIndex API. Dense, sparse, hybrid, filtered, indexed, disk-backed, all in one engine, all open-source, all on a single node if you want.
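For anyone who has not met RRF (reciprocal rank fusion) before, here is a minimal sketch of the scoring rule in plain Python. The constant k=60 is the value commonly used in the literature and is my assumption here; Qdrant's internal constant and tie-breaking may differ, and the function name is made up. Each result list contributes 1 / (k + rank) for every document it contains, and the sums decide the final order.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of ids into one list using reciprocal rank fusion."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; break ties by id so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


dense_hits = ["a", "b", "c"]   # ids ranked by the dense embedding
sparse_hits = ["b", "d", "a"]  # ids ranked by the sparse embedding

fused = rrf_fuse([dense_hits, sparse_hits])
print(fused)
```

Documents that rank well in both lists float to the top, and nothing depends on the raw scores being comparable, which is why RRF works for mixing cosine similarities with BM25-style weights.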

What Is Actually Wrong

Multi-tenant isolation is weaker than Pinecone's hosted offering — you run the node, you own the namespace. The dashboard is functional, not polished. Sparse vector docs are still catching up. Backups and disaster recovery are your problem, not the database's. If you need SOC 2, HIPAA, and 24/7 support out of the box, Qdrant Cloud is the answer — but the open-source story is the main event.

The Take

The vector database market in 2026 looks like the object storage market in 2015. Everyone is paying AWS S3 prices for what should be a self-hosted problem. Qdrant is the open-source MinIO of vectors — same role, same economics, same story. 23,000 stars, Apache 2.0, Rust core, disk-based HNSW, hybrid search, single-node billion-vector scale. The Pinecone tax is real, and the teams paying it in 2026 are paying it because they never benchmarked the alternative.

Run docker run -p 6333:6333 qdrant/qdrant on your laptop. Load a million vectors. Time the queries. Then check the Pinecone invoice. The math is not subtle. The open-source vector database has been production-ready for three years. The press has not caught up. Your infrastructure bill can.

— Mr. Technology

*Qdrant: github.com/qdrant/qdrant — v1.15.x stable, v1.16 released May 2026, ~23,000 GitHub stars, Apache 2.0. Single binary in Rust, gRPC + REST API, clients in Python / Rust / Go / TypeScript / Java / C# / Elixir. Disk-based HNSW, scalar (int8) / binary quantization, named sparse + dense vectors with RRF fusion, payload indexing, snapshots, distributed mode, full-text and geo indexes. Qdrant Cloud: cloud.qdrant.io (managed, BYOC, SOC 2 Type II). Compare: Pinecone (~$70/1M vectors/month at 768 dims), Weaviate (slower disk mode), Milvus (heavier ops), Chroma (single-node only), pgvector (in-memory above 10M). Benchmark: ANN-Benchmarks glove-100 and deep-1B; Qdrant int8 on a single i4i.4xlarge matches Pinecone s1.x4 at ~12% of the cost. Install: docker run -p 6333:6333 -p 6334:6334 qdrant/qdrant or pip install qdrant-client. License: Apache 2.0.*