ai · 2026-05-28

Zuckerberg's lab just had a big AI biology moment

Biohub released a fully open 'world model' of protein biology, made up of ESMC, ESMFold2, and ESM Atlas, that covers 6.8 billion sequences and 1.1 billion predicted structures. Lab-validated hit rates for compact minibinders against cancer and immunology targets were 36–88%, and PD-L1 designs are already restoring T-cell signaling in lab tests.

The Chan Zuckerberg Biohub released its most ambitious open-science drop yet on May 27, 2026: a "world model" of protein biology that maps 6.8 billion sequences, predicts 1.1 billion structures, and — most importantly — designs new functional proteins in days rather than years. If you only follow this space for the LLM news, the framing will sound familiar. The biology is a different story.

What You Need to Know: Biohub open-sourced ESMC (a protein language model trained on roughly 2.8 billion sequences), ESMFold2 (a structure-prediction engine that, per Biohub's benchmarks, beats AlphaFold 3 on antibody-antigen complexes), and ESM Atlas (a navigable map of 6.8B sequences and 1.1B structures). Researchers used the system to design protein binders against five cancer/immunology targets in days; lab-validated hit rates were 36–88% for compact minibinders and 15–29% for antibody-derived formats. PD-L1 designs restored T-cell signaling by blocking the same pathway that approved checkpoint therapies target.

Why It Matters

  • This is what an open release looks like at frontier scale. ESMC, ESMFold2, and ESM Atlas are freely available to researchers worldwide — no API gate, no per-token charge, no enterprise tier.
  • Therapeutic binder design just got a steep compression in the preclinical timeline. A typical preclinical binder candidate takes three to four years. The computational search with ESMFold2 finished in days, and the resulting binders work in lab tests.
  • AlphaFold 3 isn't the state of the art on antibody-antigen binding anymore. ESMFold2 beats it from representations alone, and matches or beats it with MSA, according to Biohub's published benchmarks.
  • The structural diversity argument matters. The designed PD-L1, EGFR, PDGFRβ, CTLA-4, and CD45 binders showed minimal similarity to sequences in public databases, which means the model is producing de novo solutions, not retrieving known binders.
  • For AI/ML builders, this is also a case study in how language-model training objectives transfer to non-text domains. ESMC was trained on the same token-prediction principle as an LLM; applied to amino acid sequences, that objective led the model to internalize the physical rules of protein folding.

What Actually Happened

The Three-Model Stack

Biohub's release is a coordinated three-part system, not a single model. ESMC is the language model at the core — trained on approximately 2.8 billion protein sequences drawn from across all of life. ESMFold2 takes ESMC's sequence representations and predicts atomically-resolved 3D structures of biomolecular complexes. ESM Atlas is the searchable database: 6.8 billion protein sequences and 1.1 billion predicted structures, organized by relationships the model has learned rather than by traditional sequence-similarity metrics. The whole stack is available on the Biohub Platform. (biohub.org)
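The release summary doesn't say how ESM Atlas indexes its entries. One common way to organize items "by learned relationships" is nearest-neighbor search over embedding vectors, which can rank proteins differently than a plain sequence-identity score. The sketch below is a toy illustration of that contrast only: the function names, the three-dimensional vectors, and the protein labels are all made up, and nothing here is the Atlas API.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def percent_identity(s: str, t: str) -> float:
    """Traditional-style metric: fraction of matching positions, equal lengths only."""
    if len(s) != len(t):
        raise ValueError("toy identity needs equal-length sequences")
    return sum(a == b for a, b in zip(s, t)) / len(s)


def nearest(
    query: List[float], index: Dict[str, List[float]], k: int = 1
) -> List[Tuple[str, float]]:
    """Return the k index entries whose embeddings are most similar to the query."""
    scored = [(name, cosine(query, vec)) for name, vec in index.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


if __name__ == "__main__":
    # Hypothetical embeddings; real ones would come from a model such as ESMC.
    index = {"protein_a": [0.9, 0.1, 0.0], "protein_b": [0.0, 1.0, 0.2]}
    print(nearest([1.0, 0.2, 0.0], index))
    print(percent_identity("MKTAY", "MKSAY"))
```

The design point is that `nearest` never looks at the sequences themselves, so two proteins with low `percent_identity` can still land next to each other if the model represents them similarly.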

The Therapeutic Results

The most concrete validation came from a binder-design study described in a Biohub preprint. Researchers used ESMFold2 to design protein binders against five targets central to cancer and immunology: EGFR and PDGFRβ (tumor growth), PD-L1 and CTLA-4 (immune checkpoints), and CD45 (immune cell signaling). The computational search completed in days. Lab-validated hit rates: 36–88% for compact minibinders, 15–29% for antibody-derived formats, with confirmed binding. For PD-L1 specifically, the designed binders restored T-cell signaling in laboratory tests, blocking the same pathway that approved checkpoint therapies (Keytruda, Opdivo) target. (biohub.org)

How It Beats AlphaFold 3

Biohub's published benchmark data shows that ESMFold2, working from ESMC representations alone, predicts the true binding pose of antibody-antigen complexes more often than AlphaFold 3. When given the same evolutionary information (multiple sequence alignments) as AlphaFold 3, ESMFold2 is the strongest predictor on both general protein-protein interaction and antibody-antigen benchmarks. The team also showed that ESMFold2 consistently improves with more inference compute: the model makes multiple predictions and scores them by its own confidence estimates. This is a meaningful shift, because AlphaFold 3 is no longer the unambiguous leader on the binding task that matters most for antibody drugs. (biohub.org)
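The "more compute" idea, making several predictions and scoring them by the model's own confidence, is a best-of-N selection loop. Here is a minimal sketch of that loop, assuming a stochastic predictor that returns a pose plus a confidence score. `toy_predictor` is a random stand-in invented for illustration, not ESMFold2, and the sketch says nothing about how confidence relates to real accuracy.

```python
import random
from dataclasses import dataclass
from typing import Callable


@dataclass
class Prediction:
    pose_id: int       # stand-in for a predicted 3D binding pose
    confidence: float  # the model's own confidence estimate, in [0, 1)


def toy_predictor(rng: random.Random) -> Prediction:
    """Hypothetical stand-in for one stochastic structure prediction."""
    return Prediction(pose_id=rng.randrange(10**6), confidence=rng.random())


def best_of_n(
    predict: Callable[[random.Random], Prediction], n: int, seed: int = 0
) -> Prediction:
    """Draw n predictions and keep the one with the highest confidence."""
    if n < 1:
        raise ValueError("n must be at least 1")
    rng = random.Random(seed)
    samples = [predict(rng) for _ in range(n)]
    return max(samples, key=lambda p: p.confidence)


if __name__ == "__main__":
    for n in (1, 4, 16, 64):
        best = best_of_n(toy_predictor, n)
        print(f"n={n:>2}  best confidence={best.confidence:.3f}")
```

Because the seed is fixed, each larger run contains the smaller run's samples as a prefix, so the best confidence can only stay flat or rise as `n` grows. Whether that translates into better structures depends on how well the confidence estimate tracks correctness, which is the part Biohub's benchmarks speak to.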

The People and the Money

The model team is led by Alex Rives, head of science at Biohub. Priscilla Chan and Mark Zuckerberg co-founded the organization in 2016. The Biohub has been a 501(c)(3) since its founding and currently funds a mix of resident scientists, affiliate investigators, and infrastructure. Priscilla Chan's framing on the open release: "Biohub was built on the belief that open science accelerates discovery. Making these tools freely available means researchers everywhere can move faster toward personalized cures that work for individual patients, because they target the specific biology driving their disease." The release is structured so that a researcher with no Biohub affiliation can download the model weights, run them on their own hardware, and use them for both fundamental biology and commercial therapeutic development. (biohub.org)

What the World Model Actually Is

The "world model" framing isn't marketing fluff. The training objective for ESMC was simple: predict the amino acids that evolution selects. Because evolution tends to preserve proteins that are fit for purpose, the patterns that survive in sequence data spanning billions of years of evolution implicitly encode the physical rules governing protein function. From that training, a world model emerges, one that has internalized those rules deeply enough to generate functional proteins from scratch. This is the same paradigm shift that made large language models possible: scale plus the right training objective plus enough compute produces emergent capability. The capability here is biology, not text. (biohub.org)
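To make "predict the amino acids that evolution selects" concrete, here is a toy version of the loss such an objective minimizes: hide one residue, ask a model for a probability over the 20 amino acids, and charge it the negative log-probability of the true one. This is a sketch under heavy assumptions. The "model" is just a per-position frequency table, the sequences are made up, and the exact masking or ordering scheme used for ESMC is not specified here.

```python
import math
from collections import Counter
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def column_model(
    seqs: List[str], pos: int, pseudocount: float = 1.0
) -> Dict[str, float]:
    """Estimate P(residue) at one position from equal-length toy sequences."""
    counts = Counter(s[pos] for s in seqs)
    total = len(seqs) + pseudocount * len(AMINO_ACIDS)
    return {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}


def masked_loss(probs: Dict[str, float], true_residue: str) -> float:
    """Cross-entropy for one hidden position: -log P(true residue)."""
    return -math.log(probs[true_residue])


if __name__ == "__main__":
    # Made-up sequences: position 0 is conserved, position 3 varies.
    family = ["MKTAY", "MKSLY", "MRTVY", "MKTGY"]
    held_out = "MKTCY"
    for pos in (0, 3):
        loss = masked_loss(column_model(family, pos), held_out[pos])
        print(f"position {pos}: loss = {loss:.3f}")
```

The conserved position is cheap to predict and the variable one is expensive, which is the sense in which conservation patterns carry signal. A real protein language model replaces the frequency table with a neural network that conditions on the rest of the sequence.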

The Take

Most AI-for-biology stories are incremental: better structure prediction here, faster variant screening there. This release is structural. Three things make it different. First, the binder-design results are not only in silico; the proteins were tested in the lab and they bound the targets they were designed for. Second, the open release means any academic lab or biotech can run this on their own infrastructure, which removes the per-query API tax that limits access to commercial biology AI. Third, antibody-antigen binding is the single most commercially important problem in therapeutic protein design, and ESMFold2 just took the lead on that benchmark. For builders outside biology, the transferable lesson is the architecture: a foundation model trained on raw sequence data, a structure-prediction head built on top of it, and a navigable database of the learned representations. The same three-layer stack will be the template for the next wave of domain-specific foundation models in materials, chemistry, and structural engineering.

Quick Summary

Biohub open-sourced a three-model protein-biology stack — ESMC, ESMFold2, and ESM Atlas — that beat AlphaFold 3 on antibody-antigen binding and designed working protein binders against five cancer/immunology targets in days. The 36–88% lab-validated hit rate and the PD-L1 T-cell signaling restoration are the proof points. Fully open, fully downloadable, fully usable for commercial therapeutic development.
