Gate News, April 24: DeepSeek’s V4 technical report reveals that V4-Flash and V4-Pro were pre-trained on 32T and 33T tokens respectively, roughly double the approximately 15T tokens used for V3. The report acknowledges “significant instability challenges” during training: loss spikes recurred because of anomalies in the Mixture-of-Experts (MoE) layers, the routing mechanism itself amplified those anomalies, and a simple rollback did not resolve the issue.
DeepSeek adopted two mitigations, both now used in actual training. Anticipatory Routing decouples routing index computation from backbone network updates and is triggered automatically only when a loss spike is detected, adding approximately 20% overhead. SwiGLU Clamping suppresses the anomalies directly by clamping activation values to a fixed range. The report states that both approaches are effective but admits that “the underlying principles remain insufficiently understood.”
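For readers unfamiliar with the idea, the clamping step can be illustrated with a minimal Python sketch. It is not DeepSeek’s implementation: the function name `swiglu_clamped`, the bound `limit=10.0`, and the choice to clamp both the gate and the up-projection before the activation are assumptions made purely for illustration, since the article does not give the report’s actual range or placement.

```python
import math


def silu(x: float) -> float:
    """Numerically stable SiLU (x * sigmoid(x))."""
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)
    return x * e / (1.0 + e)


def clamp(x: float, lo: float, hi: float) -> float:
    """Restrict x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def swiglu_clamped(gate, up, limit=10.0):
    """Element-wise SwiGLU, silu(gate) * up, with clamped inputs.

    `gate` and `up` stand for the two pre-activation projections of an
    expert's feed-forward layer. `limit` is a hypothetical bound, not a
    value taken from the DeepSeek report.
    """
    out = []
    for g, u in zip(gate, up):
        g = clamp(g, -limit, limit)  # assumption: the gate is clamped
        u = clamp(u, -limit, limit)  # assumption: the up-projection is clamped
        out.append(silu(g) * u)
    return out


# An outlier pre-activation of 1e6 can no longer produce an output of ~1e12.
values = swiglu_clamped([0.0, 1e6, -1e6], [1.0, 1e6, -1e6], limit=10.0)
print(values)
```

Under these assumptions every output is bounded in magnitude by `limit * limit`, so a single anomalous activation cannot blow up the layer’s output, which is the sense in which clamping “directly suppresses” the anomaly rather than removing its cause.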
Susan Zhang, a Google DeepMind researcher who previously worked at Meta AI and OpenAI, commented that the instability triggered by doubling training data “explains the delay.” She described the two solutions as “band-aids” while acknowledging DeepSeek’s technical transparency.