ai | 2026-06-02

Pinterest cut its AI bill 90%. Here's the part it ripped out

Pinterest CTO Matt Madrigal cut the company's AI bill 90% and improved accuracy 30% by ripping Qwen3-VL's vision encoder out and replacing it with proprietary embeddings. The 20x latency win is the real story. Inside Uber and Salesforce, the active argument is whether the falling model-cost curve justifies shipping free-tier AI now.

Pinterest serves 620 million monthly users on an AI stack that used to cost frontier-model money. CTO Matt Madrigal cut the bill by 90% and improved accuracy by 30% by doing one specific thing: ripping out the vision encoder from Alibaba's Qwen3-VL and replacing it with proprietary embeddings trained on Pinterest's own data. The lessons are sharper than "use open source" — they're about which part of the model to keep, which to throw away, and when the smart move is to gut the thing you bought. Meanwhile, Uber and Salesforce are arguing loudly about whether model costs are coming down fast enough to keep AI features in production.

What You Need to Know: Pinterest CTO Matt Madrigal told VentureBeat that his team cut Pinterest's AI costs 90% and improved model accuracy 30% by gutting Qwen3-VL's vision encoder and rebuilding it with proprietary multimodal embeddings. The change also cut inference latency by roughly 20x. The encoder swap targeted the consumer shopping assistant "Navigator 1"; the same team also built a separate vision classifier that detects AI-generated images. Inside Uber and Salesforce, the live argument is whether the cost of inference is falling fast enough to keep AI features in production for free or near-free users.

Why It Matters

  • For ML engineers: "Use open source" is a slogan. The Pinterest play is "use open weights, but rip out the layer where your proprietary data has the most leverage." Vision encoders are exactly that layer.
  • For product teams: 20x latency reduction isn't a backend improvement — it's the difference between a feature that feels real-time and one that doesn't ship.
  • For cost-conscious companies: Frontier-model APIs are a starting point, not a steady state. If you have unique data, the right answer is almost always fine-tune the open version, then gut the parts the open model does worst.
  • For AI strategy: Pinterest rebuilt their own vision classifier to detect AI-generated images and now labels 4x more AI content than before. The defensive AI use case is just as important as the offensive one.
  • For platform teams: The Uber/Salesforce debate is the one your CFO will hear about. Be ready with a model-cost glide path, not a single quote.

What Actually Happened

How Pinterest cut AI costs 90% by gutting Qwen3-VL's vision layer

In a VentureBeat exclusive on May 29, 2026, Pinterest CTO Matt Madrigal walked through the architectural change behind the headline 90% cost reduction. The work targeted Pinterest's conversational shopping assistant Navigator 1, originally built on Alibaba's Qwen3-VL. Madrigal's team "ripped out" Qwen3-VL's vision encoder layer and fine-tuned the model on Pinterest's own proprietary multimodal embeddings. Two wins fell out of that: cost went down 90%, and accuracy went up 30%. The model also got faster — Madrigal told VB that without the proprietary embeddings, devs would have to call and encode each image at runtime, "one at a time," producing latency "20 times worse" from an inference perspective.
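The latency point is easier to see in code. The sketch below is a toy illustration, not Pinterest's implementation: `CountingEncoder`, `EmbeddingStore`, and `embed_at_runtime` are hypothetical names, and a hash stands in for the expensive vision-encoder forward pass. It contrasts encoding each image inside the request with looking up vectors that were computed offline.

```python
import hashlib
from typing import Dict, List


class CountingEncoder:
    """Hypothetical stand-in for an expensive vision-encoder forward pass."""

    def __init__(self, dim: int = 4) -> None:
        self.dim = dim
        self.calls = 0  # how many times the "expensive" path ran

    def encode(self, image_bytes: bytes) -> List[float]:
        self.calls += 1
        digest = hashlib.sha256(image_bytes).digest()
        return [b / 255.0 for b in digest[: self.dim]]


class EmbeddingStore:
    """Embeddings computed once, offline, and keyed by image ID."""

    def __init__(self, encoder: CountingEncoder) -> None:
        self._encoder = encoder
        self._vectors: Dict[str, List[float]] = {}

    def precompute(self, images: Dict[str, bytes]) -> None:
        # Batch job: pay the encoder cost once per image, outside the request path.
        for image_id, data in images.items():
            self._vectors[image_id] = self._encoder.encode(data)

    def lookup(self, image_ids: List[str]) -> List[List[float]]:
        # Request path: dictionary reads only, no encoder call.
        return [self._vectors[image_id] for image_id in image_ids]


def embed_at_runtime(
    encoder: CountingEncoder, images: Dict[str, bytes], image_ids: List[str]
) -> List[List[float]]:
    # Request path without a store: one encoder call per image, one at a time.
    return [encoder.encode(images[image_id]) for image_id in image_ids]
```

In this toy version the lookup path never touches the encoder after the offline pass, while the runtime path pays one encoder call for every image in every request. That is the shape of the trade-off Madrigal describes, under the assumption that the embeddings can be computed ahead of time.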

Madrigal's framing for why this worked: "If you've got really unique data that you can then fine-tune an open source model with, data quality will, frankly, outweigh or overcome model size." The change is part of a longer arc at Pinterest: the team has been customizing open-source models "foundationally in-house" since the BERT and CLIP days. Their custom Pin CLIP is a fine-tune of OpenAI's CLIP with proprietary visual embeddings. The Apache-licensed open-weight ecosystem, in Madrigal's words, is "where we've found open source to be so powerful for us." The full account is in VentureBeat's piece; eMarketer's coverage also reports the 30% accuracy figure.

The taste graph, the AI-content detector, and the broader Pinterest AI strategy

The 90% number doesn't exist in isolation. Pinterest also built a "taste graph," a dynamic representation of its users' evolving preferences that combines user embeddings with a graph structure for lateral exploration. Madrigal described it as "not a social graph" but "much more of a preference graph: What's going to inspire you? What are you trying to do next?" The same team built a proprietary vision classifier to detect AI-generated images, which now labels 4x more AI content than Pinterest's previous model. That's a defensive AI feature that also happens to be good PR. The combination (open weights, proprietary data, custom encoders, in-house classifiers) is the playbook, not the cost cut. Madrigal's rule of thumb: "If it's something that's going to be critical for our end users, that's going to drive engagement, that will have to scale to over 600 million monthly active users, we're going to either probably build it or we're going to leverage open source and customize the heck out of it."
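The source gives no implementation detail for the taste graph, so the following is only a guess at what "user embeddings plus a graph for lateral exploration" could look like at toy scale. Every name here (`cosine`, `lateral_suggestions`, the topic labels, the vectors) is invented for illustration: topics are graph nodes, edges link related topics, and a user embedding ranks the neighbors of topics the user already engages with.

```python
import math
from typing import Dict, List, Sequence, Set


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def lateral_suggestions(
    user_vec: Sequence[float],
    seed_topics: List[str],
    topic_vecs: Dict[str, Sequence[float]],
    edges: Dict[str, Set[str]],
    k: int = 2,
) -> List[str]:
    """Rank graph neighbors of the seed topics by similarity to the user."""
    candidates: Set[str] = set()
    for topic in seed_topics:
        candidates |= edges.get(topic, set())
    candidates -= set(seed_topics)  # only suggest topics the user has not seeded
    ranked = sorted(
        candidates, key=lambda t: cosine(user_vec, topic_vecs[t]), reverse=True
    )
    return ranked[:k]


# Toy data, entirely made up.
topic_vecs = {
    "mid-century sofas": [1.0, 0.0],
    "walnut side tables": [0.9, 0.1],
    "floor lamps": [0.7, 0.3],
    "garden gnomes": [0.0, 1.0],
}
edges = {"mid-century sofas": {"walnut side tables", "floor lamps", "garden gnomes"}}
suggestions = lateral_suggestions([1.0, 0.1], ["mid-century sofas"], topic_vecs, edges)
```

The graph supplies the "what's adjacent" candidates and the embedding decides which of them fit this particular user, which is one plausible reading of "preference graph" as opposed to a social graph.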

The Uber and Salesforce argument playing out internally

VentureBeat's broader reporting flags an active internal debate at Uber and Salesforce about whether model costs are falling fast enough to keep AI features in production. The honest version: yes, frontier-model token prices are still falling year-over-year, and yes, the gap between a frontier API call and a fine-tuned open model on owned infrastructure is widening in the open model's favor. But the unit economics of a free-tier AI feature still depend on the use case. Salesforce has been pushing a per-seat Copilot add-on rather than absorbing the inference cost. Uber's consumer AI features are mostly mediated through already-paid channels (driver support, dispatch). The argument inside both companies, per multiple reports, is whether the long-run cost glide path justifies shipping free-tier AI now. The answer keeps being "yes, but only if you build a real evaluation loop so you can swap models without breaking the feature." That's the strategic version of what Pinterest just did with Qwen3-VL.
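The free-tier question comes down to arithmetic that fits in a few lines. The sketch below uses made-up inputs throughout (request volume, tokens per request, token price, per-user budget, and the annual price decline are all assumptions, not figures from Uber, Salesforce, or Pinterest). The only number taken from the article is the 90% cut, applied at the end for comparison.

```python
from typing import Optional


def monthly_inference_cost(
    requests_per_user: int, tokens_per_request: int, usd_per_million_tokens: float
) -> float:
    """Inference cost per user per month, in USD."""
    total_tokens = requests_per_user * tokens_per_request
    return total_tokens * usd_per_million_tokens / 1_000_000


def years_until_affordable(
    cost_now: float, budget: float, annual_decline: float, max_years: int = 10
) -> Optional[int]:
    """First whole year at which cost is at or below budget, else None.

    Assumes prices fall by a constant fraction each year, which is a
    simplification: real price curves are lumpy.
    """
    cost = cost_now
    for year in range(max_years + 1):
        if cost <= budget:
            return year
        cost *= 1 - annual_decline
    return None


# Hypothetical free-tier feature: 40 requests/month, 2,000 tokens each, $5 per 1M tokens.
cost_today = monthly_inference_cost(40, 2_000, 5.0)

# Hypothetical budget of $0.12 per free user per month, prices halving each year.
wait_years = years_until_affordable(cost_today, budget=0.12, annual_decline=0.5)

# The same feature after a Pinterest-style 90% cost reduction.
cost_after_cut = cost_today * (1 - 0.90)
```

With these inputs the feature costs about $0.40 per user per month today, waits two years on the glide path alone, and clears the budget immediately at roughly $0.04 after a 90% cut. The point of running the numbers per feature is that the answer flips depending on which lever is available.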


The Take

The Pinterest play is the most important cost-engineering story of 2026 so far, and most of the "use open source" coverage is missing the actual lesson. The lesson isn't "open source is cheaper." The lesson is: gut the layer where your proprietary data has the most leverage, and keep everything else. For Pinterest, that's the vision encoder. For a code-search company, it's the embedding model. For a financial-data startup, it's the table-aware retriever. The pattern is the same; the layer is different.

The 30% accuracy improvement is the part that should make CFOs uncomfortable with their current AI spend. A frontier-model API call is not the best version of any specific application — it's the best general version. The instant your domain has unique data, the curve inverts. Pinterest's data is uniquely good (a billion labeled "saved" pins, a continuous stream of taste-graph updates), and they used it to beat a frontier model at 10% of the cost.

The Uber/Salesforce argument is the one your finance team will hear about, and the right answer for most companies is boring: build a thin abstraction layer over your model provider, instrument token spend per feature, and have a fallback model and an evaluation harness ready before you ship. The cost curve is going down, but it's not going down uniformly, and the features that survive 2026 will be the ones that can survive a 10x cost spike without breaking the unit economics. Pinterest's Qwen3-VL gut job is exactly that kind of hedge, built in advance.
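A minimal sketch of that boring answer, assuming models can be wrapped as plain callables: `Completion`, `ModelRouter`, and `pass_rate` are hypothetical names, not any vendor's SDK. The router hides the provider, records token spend per feature, and falls back to a second model on failure; `pass_rate` is a deliberately crude evaluation harness that checks for an expected substring.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Completion:
    text: str
    tokens_used: int


# Any provider can be adapted to this shape: prompt in, Completion out.
ModelFn = Callable[[str], Completion]


class ModelRouter:
    """Thin layer over model providers with fallback and per-feature metering."""

    def __init__(self, primary: ModelFn, fallback: ModelFn) -> None:
        self.primary = primary
        self.fallback = fallback
        self.tokens_by_feature: Dict[str, int] = defaultdict(int)

    def complete(self, feature: str, prompt: str) -> Completion:
        try:
            result = self.primary(prompt)
        except Exception:
            # Primary failed (outage, quota, timeout): serve from the fallback.
            result = self.fallback(prompt)
        self.tokens_by_feature[feature] += result.tokens_used
        return result


def pass_rate(model: ModelFn, cases: List[Tuple[str, str]]) -> float:
    """Fraction of (prompt, expected substring) cases the model satisfies."""
    if not cases:
        return 0.0
    hits = sum(1 for prompt, expected in cases if expected in model(prompt).text)
    return hits / len(cases)


# Toy models standing in for real providers.
def flaky_primary(prompt: str) -> Completion:
    raise TimeoutError("primary provider unavailable")


def small_fallback(prompt: str) -> Completion:
    return Completion(text=f"echo: {prompt}", tokens_used=len(prompt.split()))


router = ModelRouter(primary=flaky_primary, fallback=small_fallback)
reply = router.complete("shopping-assistant", "find a walnut side table")
```

Running `pass_rate` against both the primary and the fallback on the same cases before launch is what makes a swap safe: if the fallback scores close to the primary, a price spike becomes a routing change rather than a product decision.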


Quick Summary

Pinterest's CTO Matt Madrigal cut the company's AI bill 90% and improved accuracy 30% by ripping Qwen3-VL's vision encoder out and replacing it with proprietary embeddings — latency dropped 20x in the process. The pattern (open weights, proprietary data, custom encoders) is the playbook, not just the cost cut. Inside Uber and Salesforce, the live argument is whether the model-cost glide path justifies shipping free-tier AI today; the right answer is to build for swap-in-place.


Source: VentureBeat | mr.technology — The Master Skill Index
