ai · 2026-06-13

Product team scorecards, having a point of view, LLM evaluation

Jenny Wanger argues there is no universal product team scorecard — use structured discussion prompts across four lenses instead. April Dunford and Wes Kao both make the case that the PM job in the AI era is to have and defend a point of view, not to maintain a backlog. And the PM's playbook for shipping LLM features requires a four-layer quality model with drift monitoring.

Hey guys, Mr. Technology here — product management scorecards are mostly a bad idea, "having a point of view" is the actual job, and LLM evaluation in production is finally getting a real playbook.

What You Need to Know: Jenny Wanger argues that product management resists simple metrics and the universal scorecard is a myth, so use structured discussion prompts across four lenses instead. April Dunford and Wes Kao are both making the case that the PM job in the AI era is to have and defend a point of view, not to prioritize a backlog. And the PM's playbook for shipping AI features that work in production is a four-layer quality model with A/B testing for nondeterministic outputs.

Story 1: There is no universal product team scorecard

In an essay that made the rounds in the Lenny's Newsletter / TLDR Product ecosystem this week, Jenny Wanger lays out the case that the question "how is this PM team doing?" is the wrong question, and the universal scorecard that CEOs keep asking for is a category error. Product management isn't a process role with measurable throughput; it's a judgment role whose outputs (good strategy, good taste, good trade-offs) are observable only in retrospect, after the decision has played out in the market.

Her fix is structured discussion prompts across four "lenses": outcomes (what did the team ship, and what changed for users because of it), judgment (how did the team decide what to ship, and what did they choose to kill), leverage (what is the team's cost-to-serve and how is it trending), and trajectory (is the team getting better at the previous three). The point is that the prompts are the same across teams, but the answers are team-specific, and the conversation is the value — not the dashboard.

This lines up with a related thread from Ravi Mehta, who argues in his piece on AI and product management that "AI doesn't make your job easier" — it just moves the bottleneck from production to judgment, and the teams that win are the ones that move from "prioritization" to "curation" as the cost of saying "yes" drops to zero. The throughline: the metrics that matter in product are the ones the team is willing to defend in a room, not the ones that look good in a board deck.

Story 2: "Having a point of view" is the actual PM job

Two pieces this week — April Dunford's "In the Age of AI, You Need a Point of View" and Wes Kao's "How to share your point of view" — make essentially the same argument from different angles. The product management job in 2026 is not to maintain a backlog, run a sprint, or write a PRD. It's to have a defensible point of view about what the future of the market looks like, what your company is uniquely positioned to do about it, and which bets are worth the next 18 months of engineering time.

Dunford's argument is from the buyer's side: in a market saturated with AI-infused products, technology buyers don't need another product vision — they need someone to tell them which version of the future is real and why this vendor's path through it is the credible one. Vendors who can't anchor their messaging in a clear perspective will get filtered out by buyers who have five other "AI for X" pitches to compare them against.

Kao's argument is from the PM's side: "Speaking up is a core way to add value, especially when you have close context on a problem." Her point is that PMs default to asking their manager what to do, when the higher-leverage move is to present a recommendation with supporting evidence and let the manager push back if they disagree. The PM who has a point of view and can defend it is the one who gets staffed on the strategic projects. The PM who only curates other people's points of view is the one who gets reassigned.

The synthesis: the product management job in the AI era is closer to "buy-side analyst" than "engineering manager." You're paid to have an opinion, defend it with evidence, and be wrong in specific ways that the team can learn from.

Story 3: LLM evaluation is finally getting a real playbook

A widely shared O'Reilly Radar piece walks through the PM's playbook for shipping AI features that actually work in production. The argument is that the standard "ship a model, monitor accuracy" loop is too coarse for LLM-based products, where the output space is open-ended and the failure modes are silent. The replacement is a four-layer quality model that maps evaluation to user impact at each layer.

Layer 1 is input quality — are we sending the model the right context, the right retrieved documents, the right system prompt? Layer 2 is output quality — does the response meet the explicit and implicit quality bars for the use case? Layer 3 is outcome quality — did the user complete the task they set out to do? Layer 4 is business quality — did the interaction drive the metric the company cares about? Each layer has its own evaluation method, and you cannot skip from Layer 1 to Layer 4 without a measurement at every step in between.
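Here is a minimal Python sketch of the "no skipping layers" rule. It assumes each layer's evaluation has already been reduced to a single score between 0 and 1, which the piece does not prescribe. The names `Layer`, `QualityReport`, `record`, and `read` are hypothetical and do not come from any eval framework. The sketch enforces only the ordering: a Layer 4 number cannot be read out until Layers 1 through 3 have measurements too.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Layer(IntEnum):
    """The four layers of the quality model, ordered from input to business impact."""
    INPUT = 1     # right context, retrieved documents, system prompt?
    OUTPUT = 2    # does the response meet the quality bar for the use case?
    OUTCOME = 3   # did the user complete the task they set out to do?
    BUSINESS = 4  # did the interaction drive the metric the company cares about?


@dataclass
class QualityReport:
    """Hypothetical container holding one score in [0, 1] per measured layer."""
    scores: dict[Layer, float] = field(default_factory=dict)

    def record(self, layer: Layer, score: float) -> None:
        # Assumption: every layer's eval is reduced to a single 0..1 score.
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score for {layer.name} must be in [0, 1], got {score}")
        self.scores[layer] = score

    def missing_layers(self, up_to: Layer) -> list[Layer]:
        """Layers at or below `up_to` that have no measurement yet."""
        return [layer for layer in Layer if layer <= up_to and layer not in self.scores]

    def read(self, layer: Layer) -> float:
        """Return a layer's score, refusing if any layer at or below it is unmeasured."""
        gaps = self.missing_layers(layer)
        if gaps:
            names = ", ".join(g.name for g in gaps)
            raise LookupError(f"cannot report {layer.name}: no measurement for {names}")
        return self.scores[layer]
```

Suppose you record Layer 1 and Layer 4 but nothing in between. Then `read(Layer.BUSINESS)` raises a `LookupError` naming OUTPUT and OUTCOME. That is exactly the gap the four-layer model is meant to surface.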

The practical tools: A/B testing for nondeterministic outputs (which requires sample sizes 5–10x larger than deterministic-feature A/B tests because of the variance), latency management with fallback hierarchies (if the model times out at 800ms, fall back to a smaller model at 200ms), monitoring for model drift (the model's quality on the same input can change week over week as providers update weights), and a quality model that explicitly captures hallucination, refusal, off-policy, and unsafe-output rates. The PM who can hold all four layers in their head is the one who ships AI features that survive contact with real users.
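The fallback hierarchy is the most mechanical of these tools. Here is a rough Python sketch that uses `asyncio.wait_for` to enforce a per-tier time budget. The two model functions are hypothetical stand-ins that sleep to simulate latency; a real system would wrap whatever provider client you actually use. The 800 ms and 200 ms budgets come from the example above. For simplicity, the sketch falls back only on timeouts, not on other errors.

```python
import asyncio
from typing import Awaitable, Callable, Sequence

# Hypothetical type: a model call takes a prompt and returns a completion.
ModelCall = Callable[[str], Awaitable[str]]


async def generate_with_fallback(
    prompt: str,
    tiers: Sequence[tuple[str, ModelCall, float]],
) -> tuple[str, str]:
    """Try each (name, call, timeout_seconds) tier in order.

    Returns (tier_name, completion) from the first tier that answers within
    its budget. Raises TimeoutError if every tier times out.
    """
    for name, call, timeout_s in tiers:
        try:
            # wait_for cancels the call if it exceeds this tier's budget.
            return name, await asyncio.wait_for(call(prompt), timeout=timeout_s)
        except asyncio.TimeoutError:
            continue  # too slow: drop to the next, smaller model
    raise TimeoutError("all model tiers timed out")


async def large_model(prompt: str) -> str:
    await asyncio.sleep(2.0)  # hypothetical stub: simulates a slow response
    return "large: " + prompt


async def small_model(prompt: str) -> str:
    await asyncio.sleep(0.05)  # hypothetical stub: simulates a fast, smaller model
    return "small: " + prompt


TIERS = [
    ("large", large_model, 0.8),  # 800 ms budget for the primary model
    ("small", small_model, 0.2),  # 200 ms budget for the fallback
]

if __name__ == "__main__":
    print(asyncio.run(generate_with_fallback("summarize this ticket", TIERS)))
```

With these stubs, the large model overshoots its 800 ms budget and `wait_for` cancels it. The call then returns `('small', 'small: summarize this ticket')` after roughly 0.85 seconds. Ordering tiers from strongest to cheapest keeps the happy path on the better model. It also caps worst-case latency at roughly the sum of the budgets.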

The Take

The Wanger essay is the one to send to your CEO the next time they ask for a product team scorecard — because the answer is "here are the four questions we should be asking in our quarterly review, and here are the answers we gave this quarter." The Dunford-Kao thread is the one to send to any PM who is in a rut — because the rut is usually a symptom of not having a point of view, not a symptom of too much work. And the LLM evaluation playbook is the one to send to your platform team if you're shipping AI features into production in 2026, because "we'll just monitor accuracy" is the kind of answer that gets a feature killed six months after launch.

The throughline: the work is moving from process to judgment, from throughput to point of view, from "ship it" to "is it actually working in the way we said it would." The PMs who thrive are the ones who can do all three. The PMs who don't are going to find themselves reporting to one of the ones who does.

Quick Summary

Product team health doesn't fit on a dashboard — use structured discussion prompts across four lenses instead. The actual PM job in the AI era is to have and defend a point of view, not to maintain a backlog. And shipping LLM features that survive in production requires a four-layer quality model with A/B testing for nondeterministic outputs and drift monitoring.
