Anthropic self-discloses three stacked Claude Code bugs: a reasoning downgrade, cache forgetting, and a 25-character instruction backfire

ChainNewsAbmedia

Anthropic published a Claude Code quality incident review on 4/23, publicly acknowledging that three overlapping engineering mistakes over nearly two months degraded Claude Code output quality, with the impact extending to the Claude Agent SDK and Claude Cowork. The company said, “We place great importance on feedback about model degradation and have never intentionally reduced model capability,” and on 4/23 reset usage limits for all subscribers as compensation.

The timeline and technical root causes of the three bugs

Issue 1 — Reasoning budget downgrade
Active period: 3/4–4/7
Root cause: the default reasoning effort was reduced from high to medium, making users feel the model became “dumber”
Fix: rolled back on 4/7

Issue 2 — Cache clearing bug
Active period: 3/26–4/10
Root cause: for sessions idle more than one hour, the thinking cache was cleared on every iteration instead of only once
Fix: v2.1.101

Issue 3 — Concise-prompt backfire
Active period: 4/16–4/20
Root cause: a new system instruction (“keep tool-call interstitial text within 25 characters”) that ablation testing showed caused an overall intelligence drop of about 3%
Fix: v2.1.116

Reasoning downgrade: the cost paid for reduced latency

On 3/4, Anthropic adjusted Claude Code’s default reasoning effort from high to medium, with the goal of shortening response latency. However, this change made the model feel “dumber” on code reasoning and debugging tasks. After the 4/7 rollback, Opus 4.7 now defaults to xhigh, while other models remain at high. The company acknowledged that before the change, internal evaluations failed to detect this degradation.
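The restored defaults amount to a per-model lookup. A minimal Python sketch, where the model IDs, effort labels, and the `default_effort` helper are all hypothetical illustrations rather than the actual Anthropic API surface:

```python
# Hypothetical mapping of model IDs to default reasoning effort after
# the 4/7 rollback; names are illustrative, not the real API.
DEFAULT_EFFORT = {
    "claude-opus-4-7": "xhigh",  # raised above the pre-incident default
}
FALLBACK_EFFORT = "high"         # restored default for all other models


def default_effort(model_id: str) -> str:
    """Return the default reasoning effort for a given model."""
    return DEFAULT_EFFORT.get(model_id, FALLBACK_EFFORT)
```

The point of the sketch is that the 3/4 change was effectively a one-line edit to a default like this one, which is exactly why it slipped past evaluations tuned to detect larger behavioral shifts.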

Cache cleaning bug: an implicit error across system boundaries

On 3/26, Anthropic introduced prompt caching optimizations for sessions idle for more than one hour. The original design was to “clear the thinking cache once after the session has been idle for an hour,” but the implementation became “after idle is triggered, clear it on every round.” This made Claude forgetful and repetitive in long sessions, and each cache miss burned through users’ usage limits. Anthropic noted that this bug “exists at the intersection of context management in Claude Code, the Anthropic API, and extended thinking”; spanning multiple system boundaries, it is the class of implicit error that unit tests struggle to catch. The fix shipped on 4/10 as v2.1.101.
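The gap between the intended and shipped behavior reduces to where the “already cleared” check lives. A minimal sketch, assuming a hypothetical `Session` object; none of these names are Anthropic’s actual code:

```python
import time

IDLE_THRESHOLD = 3600  # one hour, in seconds


class Session:
    def __init__(self):
        self.last_active = time.monotonic()
        self.cache_cleared = False
        self.clear_count = 0

    def _clear_thinking_cache(self):
        self.clear_count += 1  # stand-in for the real cache eviction

    def on_iteration_buggy(self, now):
        # Bug: the idle check runs every round, so once the session has
        # been idle past the threshold, the cache is cleared on EVERY
        # subsequent iteration -- a fresh cache miss each time.
        if now - self.last_active > IDLE_THRESHOLD:
            self._clear_thinking_cache()

    def on_iteration_fixed(self, now):
        # Fix: clear at most once per idle period.
        if now - self.last_active > IDLE_THRESHOLD and not self.cache_cleared:
            self._clear_thinking_cache()
            self.cache_cleared = True
```

Both versions pass a one-shot test that triggers idleness a single time; only a test that keeps iterating after the threshold would expose the repeated eviction, which is why this counted as an implicit cross-boundary error.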

25-character conciseness instruction: only ablation testing revealed the 3% intelligence drop

On 4/16, Anthropic added a system instruction: “Keep the text output between tool calls within 25 characters.” The intent was to curb lengthy model explanations and make the experience cleaner. Internal tests at the time detected no degradation, but more rigorous ablation experiments later showed the instruction caused an overall intelligence decline of about 3% on both Opus 4.6 and 4.7. It was rolled back on 4/20 in v2.1.116. The incident highlights that even tiny wording changes in a system prompt can have unintended systemic effects on model behavior.
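An ablation comparison of this kind can be sketched as follows. The scoring function is a stub standing in for a real evaluation harness, and every name here is a hypothetical illustration, not Anthropic’s internal tooling:

```python
from statistics import mean

BASE_PROMPT = "You are a coding assistant."
CANDIDATE_LINE = "Keep the text output between tool calls within 25 characters."


def run_eval(system_prompt, tasks, score_fn):
    # Score every task under the given system prompt and average.
    return mean(score_fn(system_prompt, task) for task in tasks)


def ablate(tasks, score_fn, threshold=0.01):
    # Run the same eval suite with and without the candidate line and
    # compare: a relative drop beyond `threshold` flags a regression.
    baseline = run_eval(BASE_PROMPT, tasks, score_fn)
    variant = run_eval(BASE_PROMPT + "\n" + CANDIDATE_LINE, tasks, score_fn)
    delta = (variant - baseline) / baseline
    return delta, delta < -threshold
```

With a real harness, a roughly 3% negative delta would trip the regression flag; the design choice is that the only variable between the two runs is the single candidate prompt line, which is what a plain A/B of two full prompt versions fails to isolate.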

Scope of impact

Product layer: Claude Code (all three issues), Claude Agent SDK (the reasoning downgrade and cache bug only), Claude Cowork (all three)

Model layer: Sonnet 4.6, Opus 4.6, Opus 4.7

API infrastructure: not affected

On the user experience side, the bugs manifested as reduced response quality and “intelligence,” increased latency, mid-conversation loss of context, and usage limits burning down faster than expected.

Compensation and process improvements

On 4/23, Anthropic reset usage limits for all subscribers as direct compensation. The company also committed to the following process improvements:

Implement a broader evaluation suite for system prompt changes

Improve Code Review tools to detect regressions earlier

Standardize internal testing on the public build, avoiding discrepancies between “internal versions” and “external versions”

Add a soak period and phased rollout for changes that may affect model intelligence
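A phased rollout of the kind listed above typically gates a change on a deterministic per-user bucket, so each user stays in the same cohort as the percentage ramps up. A minimal sketch with hypothetical helper names, not any actual Anthropic infrastructure:

```python
import hashlib


def rollout_bucket(user_id: str) -> float:
    # Deterministically map a user ID to [0, 1): the same user always
    # lands in the same bucket, so cohorts are stable across the rollout.
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    return int(digest[:8], 16) / 0x100000000


def feature_enabled(user_id: str, rollout_pct: float) -> bool:
    # Ramp rollout_pct from, say, 0.01 to 1.0 during the soak period,
    # comparing metrics between the enabled and held-back cohorts.
    return rollout_bucket(user_id) < rollout_pct
```

Compared with a random coin flip per request, hashing the user ID keeps a user’s experience consistent, which is what makes cohort-level quality comparisons during the soak period meaningful.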

Lessons for users

For users who rely on Claude Code for day-to-day development and research, this postmortem has three takeaways. First, if Claude models felt “dumber” between mid-March and 4/20, or Claude Code was abnormally forgetful in long sessions, it was not your imagination and not a problem with your prompts. Second, users whose usage limits drained quickly during this period can check after 4/23 whether Anthropic automatically reset them. Third, even a 25-character prompt tweak can produce system-wide effects on model behavior; this is a shared risk across LLM product engineering.

Compared with competitors that mostly respond with silence or “this is user misuse,” Anthropic’s proactive disclosure and technical transparency set a reference example for AI product incident reviews.

This article: Anthropic self-reveals the three stacked Claude Code bugs—reasoning downgrade, cache forgetting, and 25-character instruction backfire—first appeared on 鏈新聞 ABMedia.
