
Mistral shipped OCR 4 on June 23, 2026. The launch blog led with 170 languages across 10 language groups, the same play every lab uses when it does not want to talk about what matters. Scroll past the marketing and hit the container.
A single Docker image. Runs on your hardware. $4 per 1,000 pages sync, $2 per 1,000 pages Batch. Beats Google Document AI, Azure Document Intelligence, and AWS Textract on Mistral's benchmarks. Ships paragraph-level bounding boxes, confidence scores, and citation-ready structured output for RAG pipelines.
This ends cloud OCR for finance, healthcare, legal, and government. Not because the model beats GPT-5.6 Vision or Claude Sonnet 5, it does not. Because the data never leaves your perimeter, procurement gets a container instead of a third-party DPA, and the audit is an artifact list, not a 12-month vendor review.
The cloud-OCR era just ended. Almost nobody is talking about it.
Data residency is a hard constraint. A European bank cannot route mortgage documents through a US-hosted OCR API, Schrems II made that explicit. A US hospital cannot route PHI through a non-HIPAA endpoint without a six-week HIPAA BAA negotiation that still leaves data on vendor infrastructure. A defense contractor cannot route classified documents through any commercial OCR. These constraints have kept "OCR for enterprise" at $0.5B in annual TAM instead of the $5B the cloud vendors want.
The cloud OCR vendors are not actually that good. They were designed for the 2020 problem, not the 2026 one. Google Document AI is a multi-tenant black box. Azure Document Intelligence is template-driven: beautiful on W-2s, useless on anything bespoke. AWS Textract has the worst handwriting and the best table extraction of the three by equally wide margins. None ship the structured Document AI output, paragraph-level bounding boxes, confidence scores, citation-ready JSON, that an LLM-backed RAG pipeline actually needs.
The cost structure scales badly. Google Document AI runs $1.50 per 1,000 pages at the cheapest tier. AWS Textract DetectDocumentText runs $50 per 1,000 pages at low volume. At 10 million pages a quarter, what a mid-tier bank processes for mortgage and KYC, the cloud-OCR bill is $15K to $500K per quarter before LLM calls, embeddings, vector storage, and compliance overhead. You can buy a 4-GPU H100 node for the high end of that range, run Mistral OCR 4 on it, and amortize the hardware over three years.
The cloud-OCR era was always going to end when a self-hosted alternative caught up on quality. Mistral OCR 4 is the alternative that caught up.
OCR 4 is a vision-language model fine-tuned for document extraction, not a general VLM. It does OCR and structured extraction, and it does them well.
Four API primitives:
1. Raw extraction. Pass a document (PDF, image, PPT, OpenDocument), get back Markdown or a structured Document object with regions, blocks, paragraphs, tables, figures, and a hierarchy. Tuned for invoices, contracts, scientific papers, multi-column layouts. 2. Bounding boxes and confidence. Every block returns its (x, y, width, height) quad plus a confidence score. Cloud OCR vendors have never reliably shipped per-paragraph boxes at scale. OCR 4 ships them as a first-class field. 3. Schema-driven Document AI. Pass a JSON schema in, get back a validated JSON object. OpenAI, Anthropic, and Google ship similar primitives via tool calling on general VLMs. OCR 4 is tuned for documents, it knows what a "purchase order number" looks like across thousands of variants, while a generic VLM is guessing. 4. Citation-ready chunking. Output ingests into a RAG pipeline without an intermediary chunker. Every paragraph retains source coordinates, parent heading, and stable block ID, so the downstream indexer surfaces "paragraph 7 of page 14 of contract_2024_Q3.pdf" without rebuilding the citation graph.
Pricing: $4 per 1,000 pages sync, $2 per 1,000 pages Batch, $5 per 1,000 pages for the broader Document AI service with schema extraction. For the 10-million-pages-a-quarter workload above, that is $10K to $20K per quarter versus $15K to $500K for the cloud vendors, and you own the data.
The cloud API is the marketing surface. The container is the product.
Mistral published mistral-ocr-4 as a single Docker image. Runs on a single A100 or H100, cold-starts in 30 seconds, exposes an OpenAI-compatible inference surface at localhost:8080/v1, with no per-call telemetry, no model-output retention, no forced updates.
The procurement math is lopsided. A cloud OCR vendor means 6 weeks of vendor review, $15K to $500K per quarter at scale, data flowing through vendor infrastructure, and an annual BAA / DPA / SOC 2 cycle. A self-hosted Mistral OCR 4 container means 1 week of procurement review, $0 marginal cost at scale, data stays on your hardware.
That procurement argument is the entire reason this matters. Architectural and quality differences are second-order. The procurement argument is what the enterprise CTO escalates to the CFO and the CISO.
Mistral is not the only lab shipping self-hosted document AI. OpenAI offers gpt-4o-vision on Azure with conditions. Anthropic has no self-hosted Claude vision endpoint. Google Vertex AI endpoints are SaaS. Mistral is the first major lab that ships a production-grade, run-on-your-own-metal document extraction model a regulated enterprise can procure as a container image.
Regulated-industry teams: request the Mistral OCR 4 enterprise evaluation license, deploy the container on a single H100 in a segregated VPC, and benchmark it against your current OCR pipeline. Benchmark takes one week. Procurement cycle for the container is faster than the cloud-vendor DPA review you are already in.
Cloud-native SaaS processing customer documents: evaluate OCR 4 via the API. Pricing is competitive with Google and AWS at the 1M-page tier and well below at higher volumes. The structured Document AI output and citation-ready chunking save engineering time before you self-host.
RAG pipelines over customer documents: stop using pypdf for ingest. Stop using Tesseract for OCR. Document parsing is the slowest, ugliest part of most production RAG pipelines. A tier-1 extraction model fixes more problems than any amount of prompt tuning. OCR 4 plus a structure-respecting chunker is the right default for any system ingesting PDFs.
Enterprise architects: audit your OCR spend. Most enterprises run $200K to $2M per year through Google Document AI, Azure Document Intelligence, and AWS Textract. A 6-month self-host migration to Mistral OCR 4 amortizes hardware in one to two quarters. Under DORA in EU finance or HIPAA enforcement in US healthcare, that audit is already overdue.
Cloud OCR was always going to die for regulated workloads. Mistral OCR 4 is how it dies, not because a regulator forced the issue, but because a self-hosted alternative caught up on quality.
The frontier labs are not shipping self-hosted alternatives. OpenAI's API is the moat. Anthropic's API is the moat. Mistral chose the opposite bet: ship on customer hardware, make the container the moat, and let the procurement math work where the API math does not. That bet reshapes the document-AI layer of the enterprise stack over the next 18 months. The evidence is in the $80B round Mistral was reportedly raising when the model shipped.
The cloud-OCR era is over. Audit your bill. Start the procurement clock.
— Mr. Technology
*Release: June 23, 2026. Vendor: Mistral AI. Architecture: vision-language fine-tune for document extraction (not a general VLM). Output primitives: Markdown, structured Document, per-block bounding boxes, confidence scores, schema-driven JSON extraction, citation-ready block IDs. Languages: 170 across 10 language groups. Input formats: PDF, image, multi-page TIFF, PPT, OpenDocument. Pricing: $4 per 1,000 pages (sync API), $2 per 1,000 pages (Batch API), $5 per 1,000 pages (broader Document AI). Deployment: cloud API + self-hosted Docker container (A100/H100, 30s cold start, OpenAI-compatible localhost:8080/v1, on-prem license, no telemetry, no retention). Sources: Mistral, TNW, VentureBeat, MarkTechPost, TechTimes.*