
The Chan Zuckerberg Biohub released its most ambitious open-science drop yet on May 27, 2026: a "world model" of protein biology that maps 6.8 billion sequences, predicts 1.1 billion structures, and — most importantly — designs new functional proteins in days rather than years. If you only follow this space for the LLM news, the framing will sound familiar. The biology is a different story.
What You Need to Know: Biohub open-sourced ESMC (a language model trained on roughly 2.8 billion protein sequences), ESMFold2 (a structure-prediction engine that beats AlphaFold 3 on antibody-antigen complexes), and ESM Atlas (a navigable map of 6.8B sequences and 1.1B structures). Researchers used the system to design protein binders against five cancer/immunology targets in days; lab-validated hit rates were 36–88% for compact minibinders. PD-L1 designs restored T-cell signaling by blocking the same pathway that approved checkpoint therapies target.
Biohub's release is a coordinated three-part system, not a single model. ESMC is the language model at the core — trained on approximately 2.8 billion protein sequences drawn from across all of life. ESMFold2 takes ESMC's sequence representations and predicts atomically-resolved 3D structures of biomolecular complexes. ESM Atlas is the searchable database: 6.8 billion protein sequences and 1.1 billion predicted structures, organized by relationships the model has learned rather than by traditional sequence-similarity metrics. The whole stack is available on the Biohub Platform. (biohub.org)
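To make "organized by learned relationships rather than sequence similarity" concrete, here is a minimal sketch of looking up neighbors by cosine similarity between embedding vectors. The protein names and three-dimensional vectors are made-up toy values, and none of this is the ESM Atlas API. A real system would presumably use embeddings produced by the language model and an approximate nearest-neighbor index rather than a linear scan.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest_neighbors(query, index, k=2):
    """Return the k entry names whose embeddings are most similar to query."""
    scored = [(cosine_similarity(query, vec), name) for name, vec in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]

# Hypothetical toy embeddings; real ones have hundreds or thousands of dimensions.
toy_index = {
    "protein_A": [0.9, 0.1, 0.0],
    "protein_B": [0.8, 0.2, 0.1],
    "protein_C": [0.0, 0.1, 0.9],
}

query_embedding = [1.0, 0.0, 0.0]
print(nearest_neighbors(query_embedding, toy_index, k=2))
```

Two proteins can sit close together in a space like this even when their raw sequences share little identity, which is the practical difference from an alignment-based search.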
The most concrete validation came from a binder-design study described in a Biohub preprint. Researchers used ESMFold2 to design protein binders against five targets central to cancer and immunology: EGFR and PDGFRβ (tumor growth), PD-L1 and CTLA-4 (immune checkpoints), and CD45 (immune cell signaling). The computational search completed in days. Lab-validated hit rates: 36–88% for compact minibinders, 15–29% for antibody-derived formats, with confirmed binding. For PD-L1 specifically, the designed binders restored T-cell signaling in laboratory tests, blocking the same pathway that approved checkpoint therapies (Keytruda, Opdivo) target. (biohub.org)
Biohub's published benchmark data shows ESMFold2 is more successful than AlphaFold 3 at predicting the true binding pose of antibody-antigen complexes from ESMC representations alone. When given the same evolutionary information (multiple sequence alignments) as AlphaFold 3, ESMFold2 is the strongest predictor on both general protein-protein interaction and antibody-antigen benchmarks. The team also showed that ESMFold2 consistently improves with more inference compute: the model makes multiple predictions, and the candidates are ranked by its own confidence estimates. This is a meaningful shift: AlphaFold 3 is no longer the unambiguous leader on the binding task that matters most for antibody drugs. (biohub.org)
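The "more compute" result describes a familiar pattern: sample several candidate predictions and keep the one the model is most confident in. The sketch below illustrates that selection loop only. `toy_predict` is a hypothetical stand-in that returns a random confidence, not ESMFold2 or any real API, and the sequence is an arbitrary example string.

```python
import random

def toy_predict(sequence, rng):
    """Hypothetical stand-in for one stochastic structure prediction.

    Returns (pose, confidence). A real model would return coordinates
    together with its own confidence estimate.
    """
    confidence = rng.random()
    pose = f"pose-{confidence:.3f}"
    return pose, confidence

def best_of_n(sequence, n_samples, seed=0):
    """Draw n_samples predictions and keep the one with the highest confidence."""
    rng = random.Random(seed)
    candidates = [toy_predict(sequence, rng) for _ in range(n_samples)]
    return max(candidates, key=lambda c: c[1])

# With a fixed seed, a larger budget can only match or raise the best confidence.
for n in (1, 4, 16):
    pose, conf = best_of_n("MKTAYIAKQR", n)
    print(n, pose, round(conf, 3))
```

Selecting by confidence only improves accuracy to the extent that the confidence estimate tracks correctness, which is the property the reported scaling result depends on.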
The model team is led by Alex Rives, head of science at Biohub, the organization Priscilla Chan and Mark Zuckerberg co-founded in 2016. The Biohub has been a 501(c)(3) since its founding and currently funds a mix of resident scientists, affiliate investigators, and infrastructure. Priscilla Chan's framing on the open release: "Biohub was built on the belief that open science accelerates discovery. Making these tools freely available means researchers everywhere can move faster toward personalized cures that work for individual patients, because they target the specific biology driving their disease." The release is structured so that a researcher with no Biohub affiliation can download the model weights, run them on their own hardware, and use them for both fundamental biology and commercial therapeutic development. (biohub.org)
The "world model" framing isn't marketing fluff. The training objective for ESMC was simple: predict the amino acids that evolution selects. Because evolution tends to preserve proteins that are fit for purpose, the patterns preserved across billions of years of evolution implicitly encode the physical rules governing protein function. From that training, a world model emerges, one that has internalized those rules deeply enough to generate functional proteins from scratch. This is the same paradigm shift that made large language models possible: scale plus the right training objective plus enough compute produces emergent capability. The capability here is biology, not text. (biohub.org)
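One common way to turn "predict the amino acids that evolution selects" into a loss is masked-residue prediction: hide some positions and score the model on the probability it assigns to the true residue. The article does not spell out ESMC's exact objective, so treat the sketch below as an illustration under that assumption. `predict_probs` is a hypothetical model interface, and the uniform baseline stands in for a model that has learned nothing.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def masked_cross_entropy(sequence, masked_positions, predict_probs):
    """Average negative log-likelihood of the true residues at masked positions.

    predict_probs(masked_sequence, position) is a hypothetical model call
    returning a dict {residue: probability}.
    """
    masked = list(sequence)
    for pos in masked_positions:
        masked[pos] = "?"  # hide the residue from the model
    masked_sequence = "".join(masked)
    total = 0.0
    for pos in masked_positions:
        probs = predict_probs(masked_sequence, pos)
        total += -math.log(probs[sequence[pos]])
    return total / len(masked_positions)

def uniform_model(masked_sequence, position):
    """Baseline that knows nothing: every residue is equally likely."""
    return {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}

# Arbitrary example sequence with two hidden positions.
loss = masked_cross_entropy("MKTAYIAKQR", [2, 5], uniform_model)
print(round(loss, 4))
```

The uniform baseline scores ln 20, roughly 3.0, per masked residue. Training pushes that number down, and the only way to do so is to capture the regularities that evolution has preserved.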
Most AI-for-biology stories are incremental: better structure prediction here, faster variant screening there. This release is a bigger step, and three things make it different. First, the binder-design results are not only in silico; the proteins were tested in the lab and they bound the targets they were designed for. Second, the open release means any academic lab or biotech can run this on their own infrastructure, which removes the per-query API tax that limits access to commercial biology AI. Third, antibody-antigen binding prediction is arguably the most commercially important problem in therapeutic protein design, and ESMFold2 now leads AlphaFold 3 on Biohub's published benchmark for it. For builders outside biology, the transferable lesson is the architecture: a foundation model trained on raw sequence data, a structure-prediction head built on top of it, and a navigable database of the learned representations. The same three-layer stack could become the template for the next wave of domain-specific foundation models in materials, chemistry, and structural engineering.
Biohub open-sourced a three-part protein-biology stack (ESMC, ESMFold2, and ESM Atlas) that beat AlphaFold 3 on antibody-antigen binding and designed working protein binders against five cancer/immunology targets in days. The 36–88% lab-validated hit rate for compact minibinders and the PD-L1 T-cell signaling restoration are the proof points. Fully open, fully downloadable, fully usable for commercial therapeutic development.