
**TL;DR** - Developers report Claude coding performance has degraded over past six months; examination of evidence and Anthropic response.
1. Run your own longitudinal benchmark on Claude coding tasks - do not rely on Twitter sample size of one
2. If you have seen degradation, document specific task types and provide Anthropic with structured feedback
3. Model that is best for simple coding tasks may not be best for complex ones - use right model for each task
**Example Prompt:**
Design a longitudinal coding benchmark that would detect Claude model degradation over a 6-month period.
| Pros | Cons |
|---|---|
| Anecdotal evidence consistent | Anecdotes are not data |
| Anthropic response transparent | Model updates always have winners and losers |
|---|---|
| Self-benchmarking is the right approach | Most teams do not have infra for proper benchmarks |