Anthropic published a Claude Code quality incident review on 4/23, publicly admitting that three overlapping engineering mistakes over the past nearly two months led to a decline in Claude Code usage quality. The company also said the impact extends to the Claude Agent SDK and Claude Cowork. The company said, “We place great importance on feedback about model degradation and have never intentionally reduced model capability,” and on 4/23 reset usage limits for all subscribers as compensation.
The timeline and technical root causes of the three bugs
Issue Active period Root cause Fix version Reasoning budget downgrade 3/4–4/7 reasoning effort The default was reduced from high to medium, making users feel the model became “dumber” 4/7 Rollback Cache cleaning bug 3/26–4/10 Idle sessions thinking cache cleared more than 1 hour after each iteration, rather than clearing only once v2.1.101 Concise prompt backfire 4/16–4/20 Added a system instruction: “tool-call interstitial text ≤25 characters,” and ablation testing showed an overall intelligence drop of 3% v2.1.116
Reasoning downgrade: the cost paid for reduced latency
On 3/4, Anthropic adjusted Claude Code’s default reasoning effort from high to medium, with the goal of shortening response latency. However, this change made the model feel “dumber” on code reasoning and debugging tasks. After the 4/7 rollback, Opus 4.7 now defaults to xhigh, while other models remain at high. The company acknowledged that before the change, internal evaluations failed to detect this degradation.
Cache cleaning bug: an implicit error across system boundaries
On 3/26, Anthropic introduced prompt caching optimizations for sessions idle for more than one hour. The original design was to “clear the thinking cache once after the session has been idle for an hour,” but in implementation it became “after idle is triggered, clear it every round.” This caused Claude to perform “forgetfully and repetitively” in long sessions, and each cache miss quickly consumed users’ usage. Anthropic pointed out that this bug “exists at the intersection of context management in Claude Code, the Anthropic API, and extended thinking,” involving multiple system boundary crossings, and is a kind of implicit error that is difficult to catch with unit tests. The fix was released on 4/10 as v2.1.101.
25-character concise instruction: only through ablation did they find a 3% intelligence drop
On 4/16, Anthropic added a system instruction: “Keep the text output between tool calls within 25 characters.” The intent was to reduce lengthy model explanations and make the experience cleaner. At the time, internal tests did not detect degradation, but after more rigorous ablation comparison experiments, the company found that this instruction caused an overall intelligence decline of about 3% for both Opus 4.6 and 4.7. On 4/20, it was rolled back in v2.1.116. This incident highlights that even tiny wording changes in a system prompt can have unintended structural effects on model behavior.
Scope of impact
Product layer: Claude Code (all three issues are affected), Claude Agent SDK (①②), Claude Cowork (all)
Model layer: Sonnet 4.6, Opus 4.6, Opus 4.7
API infrastructure: not affected
On the user experience side, it manifested as: reduced response quality and “intelligence,” increased latency, loss midstream in conversation context, and usage burning faster than expected.
Compensation and process improvements
On 4/23, Anthropic reset usage limits for all subscribers as a direct compensation. The process improvements the company also committed to include:
Implement a broader evaluation suite for system prompt changes
Improve Code Review tools to detect regressions earlier
Standardize internal testing into a public build to avoid discrepancies between “internal versions” and “external versions”
Add a soak period and phased rollout for changes that may affect model intelligence
Lessons for users
For users who rely on Claude Code for day-to-day development and research, this postmortem has three key takeaways. First, if you felt that Claude models “became dumber” between mid-March and 4/20, or that Claude Code showed abnormal forgetfulness in long sessions, that is not your imagination and not a prompt mishandling on your part. Second, users whose usage limits were quickly used up during this period can confirm after 4/23 whether Anthropic has automatically reset them. Third, even a “prompt fine-tuning within 25 characters” may produce system-wide effects on model behavior—this is a shared risk across LLM product engineering.
Compared with competitors who mostly respond in silence or with “this is user misuse,” Anthropic’s proactive disclosure and technical transparency set a reference example for AI product incident reviews.
This article: Anthropic self-reveals the three stacked Claude Code bugs—reasoning downgrade, cache forgetting, and 25-character instruction backfire—first appeared on 鏈新聞 ABMedia.
Related Articles
Anthropic’s Claude Mythos undergoes 20 hours of psychiatric assessment: defensive reactions are only 2%, the lowest in recorded history
AI Agents can already independently recreate complex academic papers: Mollick says most errors come from human original text rather than AI
OpenAI Merges Codex Into Main Model Starting with GPT-5.4, Discontinues Separate Coding Line
Salesforce to Hire 1,000 Graduates and Interns for AI Products, Raises FY2026 Revenue Guidance
Alibaba Cloud Launches Qwen-Image-2.0-Pro with Unified Text-to-Image and Editing, Supporting Multilingual Text Rendering
DeepSeek V4-Pro API Gets 75% Discount Until May 5, Output Price Drops to $0.87 Per Million Tokens