Tencent open-sources Hy3 preview version; coding benchmark score rises 40% over the previous generation

MarketWhisper

Tencent officially open-sourced the preview version of its Hy3 large language model on April 23 on GitHub, Hugging Face, and ModelScope, and also offers paid API access through Tencent Cloud. According to an April 24 report by Decrypt, training of the Hy3 preview version began in late January, so less than three months passed between the start of training and the release.

Hy3 Model Architecture and Development Background

According to Tencent’s official announcement, the Hy3 preview version uses a mixture-of-experts (MoE) architecture, which routes each query to a selected subset of expert subnetworks rather than activating all parameters at once, in order to reduce computational requirements.
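To illustrate the general idea of routing a query to a subset of experts, here is a minimal sketch of top-k mixture-of-experts routing. It is not Tencent's implementation: the number of experts, the value of k, the random gate weights, and the toy "experts" that merely scale their input are all assumptions made for illustration.

```python
import math
import random

def top_k_route(gate_logits, k):
    """Pick the k highest-scoring experts and softmax-normalize their weights."""
    top = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)[:k]
    m = max(gate_logits[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(gate_logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

def moe_forward(x, gate_weights, experts, k=2):
    """Run only the k selected experts on input vector x and mix their outputs."""
    # One gate logit per expert: dot product of that expert's gate row with x
    gate_logits = [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]
    routes = top_k_route(gate_logits, k)
    out = [0.0] * len(x)
    for idx, weight in routes:
        y = experts[idx](x)  # experts that were not selected are never evaluated
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out, [idx for idx, _ in routes]

# Toy setup (hypothetical): 8 experts, each of which just scales its input
NUM_EXPERTS, DIM = 8, 4
rng = random.Random(0)
gate_weights = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]
experts = [(lambda s: (lambda x: [s * xi for xi in x]))(i + 1) for i in range(NUM_EXPERTS)]

output, used = moe_forward([0.5, -1.0, 0.25, 2.0], gate_weights, experts, k=2)
print(f"experts used: {used} of {NUM_EXPERTS}")
```

With k=2 and 8 experts, only a quarter of the expert computation runs for each input, which is the source of the compute savings described above.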

The previous flagship model, Hy2, has more than 400 billion parameters. Tencent’s official statement indicates that Hy3’s 295 billion parameters is the configuration optimized for inference efficiency; beyond this scale, the marginal benefit of adding more parameters no longer justifies the cost.

According to Decrypt’s report, Hy3’s training was led by Yao Shunyu, Tencent’s chief artificial intelligence scientist. After he completed a fundamental rebuild of the infrastructure for Hy3’s pre-training and reinforcement learning stacks in February 2026, Hy3 training officially began.

Key Benchmark Test Data

Based on the benchmark test results disclosed in Tencent’s official announcement:

SWE-bench Verified (fixing real GitHub issues): Hy3 preview version 74.4%, Hy2 53.0%; contemporary models for comparison: GLM-5 77.8%, Kimi-K2.5 76.8%, Claude Opus 4.6 80.8%

Terminal-Bench 2.0 (command-line autonomous task execution): Hy3 preview version 54.4%, Hy2 23.2%

BrowseComp (complex web search tasks): Hy3 preview version 67.1%, Hy2 28.7%

WideSearch: Hy3 preview version 70.2%, higher than GLM-5 and Kimi-K2.5, lower than Claude Opus 4.6’s 77.2%

Tsinghua University mathematics PhD qualifying exam (Spring 2026): average score over three runs (avg@3) of 88.4, the highest score among Chinese models

2025 Chinese High School Biology Olympiad (CHSBO 2025): 87.8, the highest score among similar Chinese models
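The Hy2 and Hy3 scores above can be compared either in percentage points or as a relative gain, and the two readings differ considerably. The short sketch below computes both from the figures listed. The article does not say which benchmark the headline's 40% refers to; treating it as the relative gain on SWE-bench Verified is an assumption, although the arithmetic is consistent with that reading.

```python
# (Hy2 score, Hy3 preview score) in percent, taken from the list above
scores = {
    "SWE-bench Verified": (53.0, 74.4),
    "Terminal-Bench 2.0": (23.2, 54.4),
    "BrowseComp": (28.7, 67.1),
}

def gains(old, new):
    """Return (absolute gain in percentage points, relative gain in percent)."""
    return new - old, (new - old) / old * 100

for name, (hy2, hy3) in scores.items():
    points, relative = gains(hy2, hy3)
    print(f"{name}: +{points:.1f} points, +{relative:.1f}% relative")
```

On SWE-bench Verified the gain is 21.4 percentage points, or roughly 40% relative to Hy2's score; on the other two benchmarks the scores more than double.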

Deployment Platforms and API Pricing

According to Tencent’s official announcement, the Hy3 preview version has been deployed on the following platforms: Yuanbao, QQ, Tencent Docs, CodeBuddy, WorkBuddy, and OpenClaw.

Tencent Cloud’s API is priced at $0.18 per million input tokens and $0.59 per million output tokens; the personal token plan starts at about $4.10 per month. Tencent’s announcement also states that on CodeBuddy and WorkBuddy, Hy3’s first-token latency is 54% lower than the previous generation’s, end-to-end generation time is 47% shorter, and the model successfully completed a 495-step agent workflow.
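As a rough illustration of what the per-token prices mean in practice, the following sketch estimates the cost of a workload from its token counts. The function name and the example workload are hypothetical, and the calculation assumes simple linear pricing with no discounts, caching tiers, or minimum charges, none of which the article describes.

```python
# Prices quoted in the article, in USD per one million tokens
INPUT_PRICE_PER_M = 0.18
OUTPUT_PRICE_PER_M = 0.59

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate API cost in USD, assuming flat per-token pricing."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000

# Hypothetical workload: 2 million input tokens and 500,000 output tokens
cost = estimate_cost(2_000_000, 500_000)
print(f"${cost:.3f}")  # $0.36 for input plus $0.295 for output
```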

Frequently Asked Questions

When was the Tencent Hy3 preview version released, and on which platforms can it be obtained?

According to Tencent’s official announcement and Decrypt’s April 24, 2026 report, the Hy3 preview version was open-sourced on April 23, 2026 (Thursday) on GitHub, Hugging Face, and ModelScope, and Tencent Cloud began offering paid API access at the same time.

Compared with the previous model Hy2, what are the main differences in Hy3 preview version’s benchmark test results?

According to Tencent’s official announcement, the SWE-bench Verified score rose from Hy2’s 53.0% to 74.4%; BrowseComp rose from 28.7% to 67.1%; and Terminal-Bench 2.0 rose from 23.2% to 54.4%.

What is the API pricing for the Hy3 preview version?

According to Tencent Cloud’s official pricing, the Hy3 preview version API starts at $0.18 per million input tokens and $0.59 per million output tokens; the monthly fee for the personal token plan starts at about $4.10.
