DeepSeek releases the V4 open-source preview, with a Codeforces score of 3206 surpassing GPT-5.4

MarketWhisper

DeepSeek V4 open-source preview

DeepSeek officially released the V4 preview series on April 24. The series is open-sourced under the MIT license, with model weights published simultaneously on Hugging Face and ModelScope. According to the DeepSeek V4 technical report, V4-Pro-Max (the strongest inference mode) scored 3206 on the Codeforces benchmark, surpassing GPT-5.4.

Specifications for Two MoE Model Architectures

According to the DeepSeek V4 technical report, the V4 series includes two mixture-of-experts (MoE) models:

V4-Pro: Total parameters 1.6T, 49B activated per token, supports a 1M token context

V4-Flash: Total parameters 284B, 13B activated per token, also supports a 1M token context

According to the technical report, at a 1M-token context V4-Pro's per-token inference FLOPs are only 27% of V3.2's, and its KV cache shrinks to 10% of V3.2's. The report attributes this mainly to an upgraded hybrid attention architecture that combines compressed sparse attention (CSA) with heavily compressed attention (HCA). The pretraining corpus exceeds 32T tokens, and the training optimizer has been switched to Muon.
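To put these headline figures in perspective, the sketch below does the back-of-the-envelope arithmetic. The activated-parameter fractions follow directly from the numbers above; the layer count, head count, head dimension, and byte width used for the 1M-token KV-cache estimate are purely illustrative assumptions, not figures from the report.

```python
# Back-of-the-envelope arithmetic. Only the parameter counts, activation
# counts, and the "10% of V3.2" cache ratio come from the report; the
# layers/kv_heads/head_dim/bytes_per_val values below are HYPOTHETICAL.

def active_fraction(total_params_b: float, active_params_b: float) -> float:
    """Share of parameters activated per token in a MoE model."""
    return active_params_b / total_params_b

print(f"V4-Pro   active fraction: {active_fraction(1600, 49):.1%}")  # ~3.1%
print(f"V4-Flash active fraction: {active_fraction(284, 13):.1%}")   # ~4.6%

# Illustrative dense KV-cache size at a 1M-token context, then scaled by
# the 10% ratio quoted in the report.
layers, kv_heads, head_dim, bytes_per_val = 61, 8, 128, 2  # assumed dimensions
context = 1_000_000
dense_kv = 2 * layers * kv_heads * head_dim * bytes_per_val * context  # K and V
print(f"Dense KV cache @1M tokens: {dense_kv / 2**30:.1f} GiB")
print(f"Compressed to 10%:         {dense_kv * 0.10 / 2**30:.1f} GiB")
```

Under these assumed dimensions a dense cache at 1M tokens would run to hundreds of GiB, which is why the report's 10% cache ratio is the figure to watch for long-context serving cost.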

Post-Training Methodology: Online Policy Distillation Replaces Mixed Reinforcement Learning

According to the DeepSeek V4 technical report, the core post-training change in V4 is that on-policy distillation (OPD) fully replaces the mixed reinforcement learning (mixed RL) stage used in V3.2. The new pipeline has two steps: first, separate domain experts are trained for areas such as math, code, agents, and instruction following (SFT plus GRPO reinforcement learning); then the capabilities of a dozen or so experts are distilled into a single unified model via multi-teacher OPD, with logit alignment used to avoid the capability conflicts common in traditional approaches.
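The report does not include training code, so the following is only a minimal sketch of the general multi-teacher on-policy distillation recipe it describes: the student samples its own responses, the domain teacher responsible for each prompt scores those tokens, and the student is trained to match the teacher's token distributions on its own trajectories (a reverse-KL objective). The `generate` call, the prompt-to-teacher router, and the models themselves are placeholders, and the report's logit-alignment step is not reproduced here.

```python
import torch
import torch.nn.functional as F

def opd_step(student, teachers, router, prompts, optimizer, temperature=1.0):
    """One illustrative multi-teacher on-policy distillation step.

    student / teachers[...] : callables mapping token ids [B, T] -> logits [B, T, V]
    router(prompt) -> key   : picks the domain teacher for a prompt (assumed)
    student.generate(...)   : samples continuations on-policy (assumed API)
    """
    # 1) The student samples its own continuations (on-policy data).
    with torch.no_grad():
        samples = student.generate(prompts)              # [B, T] token ids

    # 2) Score the sampled tokens with the student and the matching teacher.
    student_logits = student(samples)                    # [B, T, V]
    losses = []
    for i, seq in enumerate(samples):
        teacher = teachers[router(prompts[i])]
        with torch.no_grad():
            teacher_logits = teacher(seq.unsqueeze(0))[0]  # [T, V]
        log_p_student = F.log_softmax(student_logits[i] / temperature, dim=-1)
        log_p_teacher = F.log_softmax(teacher_logits / temperature, dim=-1)
        # 3) Reverse KL(student || teacher) on tokens the student produced:
        #    kl_div(input=log q, target=log p, log_target=True) = KL(p || q).
        losses.append(F.kl_div(log_p_teacher, log_p_student,
                               log_target=True, reduction="batchmean"))

    loss = torch.stack(losses).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In a real pipeline the teacher and student vocabularies and logits would also need to be aligned before the KL term is computed, which is presumably where the report's "logit alignment" fits in.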

The report also introduces a generative reward model (GRM). For tasks that are hard to verify with rules, the GRM is trained on a small amount of diverse human-annotated data, allowing a single model to handle both generation and evaluation.
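The report does not specify the GRM's interface. The usual pattern for a generative reward model, sketched below, is to have the model write a critique and a score as text and then parse the score out, rather than regressing a scalar head. The `generate` callable and the prompt template are invented for illustration; nothing here is DeepSeek's actual API.

```python
import re

# Minimal sketch of querying a generative reward model: the judge model
# writes a critique ending in "Score: X/10", and we parse the score.
JUDGE_TEMPLATE = """You are a strict evaluator.
Question: {question}
Candidate answer: {answer}
Write a short critique, then end with a line "Score: X/10"."""

def grm_score(generate, question: str, answer: str) -> float:
    """`generate` is any callable mapping a prompt string to generated text
    (a stand-in for the actual GRM, not a real API)."""
    critique = generate(JUDGE_TEMPLATE.format(question=question, answer=answer))
    match = re.search(r"Score:\s*(\d+(?:\.\d+)?)\s*/\s*10", critique)
    return float(match.group(1)) / 10.0 if match else 0.0

if __name__ == "__main__":
    # Trivial stub judge so the sketch runs end to end.
    stub = lambda prompt: "Mostly correct but lacks sourcing.\nScore: 7/10"
    print(grm_score(stub, "What is 2+2?", "4"))  # -> 0.7
```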

Benchmark Results: Leading on Coding, Still a Gap in Knowledge Reasoning

According to the DeepSeek V4 technical report, V4-Pro-Max compares with Opus 4.6 Max, GPT-5.4 xHigh, and Gemini 3.1 Pro High as follows (the recently released GPT-5.5 and Opus 4.7 are excluded):

Codeforces: 3206 (GPT-5.4: 3168 / Gemini 3.1 Pro: 3052) → Highest across the board

LiveCodeBench: 93.5 → Highest across the board

SWE Verified: 80.6, behind Opus 4.6’s 80.8 by 0.2 percentage points

GPQA Diamond: 90.1, behind Gemini 3.1 Pro’s 94.3

SimpleQA-Verified: 57.9, behind Gemini 3.1 Pro’s 75.6

HLE: 37.7, behind Gemini 3.1 Pro’s 44.4

The technical report notes that, because the comparisons exclude the recently released GPT-5.5 and Opus 4.7, the gap between V4 and the latest generation of closed-source models remains to be verified by third-party evaluations.

Frequently Asked Questions

What are the open-source license terms for the DeepSeek V4 preview, and where can the weights be obtained?

According to DeepSeek’s official announcement on April 24, the V4 series is open-sourced under the MIT license. Model weights are available on Hugging Face and ModelScope, and the license permits both commercial and academic use.

What are the differences in parameter scale between DeepSeek V4-Pro and V4-Flash?

According to the DeepSeek V4 technical report, V4-Pro has total parameters of 1.6T, with 49B activated per token; V4-Flash has total parameters of 284B, with 13B activated per token. Both models support a 1M token context.

What are the benchmark comparison results for DeepSeek V4-Pro-Max versus GPT-5.4 and Gemini 3.1 Pro?

According to the DeepSeek V4 technical report, V4-Pro-Max surpasses GPT-5.4 and Gemini 3.1 Pro on the Codeforces (3206) and LiveCodeBench (93.5) benchmarks, but still lags behind Gemini 3.1 Pro on knowledge-intensive benchmarks (GPQA Diamond, SimpleQA-Verified, HLE). The comparison set excludes GPT-5.5 and Opus 4.7.

