DeepSeek V4-Flash goes live on Ollama Cloud, US-hosted: Claude Code, OpenClaw one-click integration

Ollama, a tool for running AI models locally, announced on X on April 24 that it is adding the V4-Flash model, released the day before by Chinese AI startup DeepSeek, to its Ollama Cloud service. Inference is hosted in the United States, and Ollama provides three one-click commands so developers can plug V4-Flash directly into mainstream AI software development workflows such as Claude Code, OpenClaw, and Hermes.

deepseek-v4-flash is now available on Ollama’s cloud! Hosted in the US.

Try it with Claude Code: ollama launch claude --model deepseek-v4-flash:cloud

Try it with OpenClaw: ollama launch openclaw --model deepseek-v4-flash:cloud

Try it with Hermes: ollama launch hermes…

— ollama (@ollama) April 24, 2026

DeepSeek V4 Preview: two sizes, 1M context

According to a release announcement in DeepSeek’s official API documentation on April 24, the DeepSeek-V4 Preview is being open-sourced in two sizes simultaneously:

Model | Total parameters | Active parameters | Positioning
DeepSeek-V4-Pro | 1.6 trillion | 49 billion | Targeting a closed-source flagship
DeepSeek-V4-Flash | 1M | 130 billion | Fast, efficient, and low-cost

Both use a Mixture-of-Experts (MoE) architecture and natively support a 1-million-token context. In its announcement, DeepSeek stated: “1M context is now the default value for all DeepSeek official services.”

Architecture innovation: DSA sparse attention + token-wise compression

The core architectural improvements in the V4 series include:

Token-wise compression combined with DSA (DeepSeek Sparse Attention), significantly reducing inference compute and KV-cache memory costs at ultra-long context

Compared with V3.2 at a 1-million-token context, V4-Pro needs only 27% of the per-token inference FLOPs and only 10% of the KV cache
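A quick arithmetic sketch of what those ratios imply. The baseline figures below are hypothetical placeholders; only the 27% (FLOPs) and 10% (KV cache) ratios come from the announcement:

```python
# Illustrative savings for V4-Pro vs. V3.2 at a 1M-token context.
# Baselines are made-up placeholders; the two ratios are from DeepSeek.
FLOPS_RATIO = 0.27
KV_RATIO = 0.10

baseline_kv_gib = 160.0          # hypothetical V3.2 KV cache at 1M tokens
baseline_tflops_per_token = 2.0  # hypothetical V3.2 per-token inference cost

v4_kv_gib = baseline_kv_gib * KV_RATIO
v4_tflops = baseline_tflops_per_token * FLOPS_RATIO

print(f"KV cache: {baseline_kv_gib:.0f} GiB -> {v4_kv_gib:.0f} GiB "
      f"({1 / KV_RATIO:.0f}x smaller)")
print(f"Per-token FLOPs: {1 / FLOPS_RATIO:.1f}x fewer")
```

Whatever the absolute baseline, the ratios translate to roughly a 3.7x FLOPs reduction and a 10x KV-cache reduction per token at that context length.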

Switchable Thinking / Non-Thinking dual modes, matching different depths of task reasoning

At the API level, it is compatible with both the OpenAI Chat Completions and Anthropic API specifications, lowering migration costs for existing Claude/GPT clients.
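What dual compatibility means in practice is that the same model name can sit behind either request shape. A minimal sketch, where the exact routes DeepSeek mounts (and the model string) are assumptions based on the public OpenAI and Anthropic specs, not confirmed DeepSeek values:

```python
import json

MODEL = "deepseek-v4-flash"  # illustrative model identifier

def build_request(style: str, prompt: str) -> tuple[str, dict]:
    """Return (path, JSON body) for the two API shapes the article
    says V4 accepts. Paths follow the public OpenAI / Anthropic
    specs; whether DeepSeek uses these exact routes is an assumption."""
    if style == "openai":
        # OpenAI Chat Completions-style body
        return "/v1/chat/completions", {
            "model": MODEL,
            "messages": [{"role": "user", "content": prompt}],
        }
    if style == "anthropic":
        # Anthropic Messages-style body (max_tokens is required there)
        return "/v1/messages", {
            "model": MODEL,
            "max_tokens": 1024,
            "messages": [{"role": "user", "content": prompt}],
        }
    raise ValueError(f"unknown style: {style}")

path, body = build_request("openai", "Summarize this diff.")
print(path, json.dumps(body)[:60])
```

An existing Claude or GPT client would only need to repoint its base URL and model name, not restructure its request bodies.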

Ollama Cloud’s three sets of one-click startup commands

Ollama’s official model page lists the cloud inference service under the model identifier deepseek-v4-flash:cloud. Developers can use the following three commands to connect V4-Flash to existing AI software development workflows:

Workflow | Command
Claude Code | ollama launch claude --model deepseek-v4-flash:cloud
OpenClaw | ollama launch openclaw --model deepseek-v4-flash:cloud
Hermes | ollama launch hermes
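Beyond the one-click launch commands, a local Ollama daemon also exposes an OpenAI-compatible HTTP endpoint on port 11434, so the same model can be called programmatically. A sketch that only builds the request; whether the :cloud variant is reachable through your local daemon depends on your Ollama setup and account:

```python
import json
import urllib.request

# Ollama's local OpenAI-compatible endpoint; whether the ":cloud"
# model proxies through it in your setup is something to verify.
BASE = "http://localhost:11434/v1"

body = json.dumps({
    "model": "deepseek-v4-flash:cloud",
    "messages": [{"role": "user", "content": "Write a haiku about MoE."}],
}).encode()

req = urllib.request.Request(
    f"{BASE}/chat/completions",
    data=body,
    headers={"Content-Type": "application/json"},
)
# urllib.request.urlopen(req)  # uncomment to send once Ollama is running
print(req.get_method(), req.full_url)
```

This is the same request shape an OpenAI-client SDK would produce after switching its base URL to the local daemon.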

The “US-hosted” signal is worth noting. For enterprises and Western developers, the biggest concern with using Chinese open-source models is data being sent back to China. By placing V4-Flash’s inference layer in the United States, Ollama keeps prompt and code content within US legal jurisdiction, reducing friction over compliance and data sovereignty.

Why this matters to the AI industry

By connecting three previously independent ecosystems (DeepSeek V4-Flash, Ollama Cloud, and Claude Code), this integration matters on three levels:

Cost pathway: with an active-parameter count far smaller than flagship models like GPT-5.5 (input $5, output $30 per million tokens) and Claude Opus 4.7, V4-Flash should significantly lower unit costs for use cases such as small- and medium-sized agent tasks, batch summarization, and test automation

A geopolitical-risk buffer: with Ollama, a US-registered company, acting as the intermediary inference layer, enterprise users of a Chinese-developed model avoid the concern of sending data directly to DeepSeek’s Beijing servers, a practical template for the international spread of open-source models

Instant developer switching: Claude Code and OpenClaw users can switch models with a single command line, without changing prompt structure or IDE settings. For multi-model regression testing and cost-sensitive batch tasks, this is a genuine productivity gain
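The cost point above can be made concrete with the GPT-5.5 list prices quoted in this article. Any V4-Flash price below is a hypothetical placeholder, since the article does not state one:

```python
# Unit-cost arithmetic for a batch-summarization job, using the GPT-5.5
# list prices quoted above ($5 input / $30 output per million tokens).
# The V4-Flash prices are hypothetical placeholders, not published rates.
def job_cost(in_tok: int, out_tok: int, in_price: float, out_price: float) -> float:
    """Cost in USD; prices are per million tokens."""
    return (in_tok * in_price + out_tok * out_price) / 1_000_000

# 10,000 documents, ~2,000 input and ~300 output tokens each
in_tok, out_tok = 10_000 * 2_000, 10_000 * 300

gpt55 = job_cost(in_tok, out_tok, 5.0, 30.0)
flash = job_cost(in_tok, out_tok, 0.30, 1.20)  # hypothetical V4-Flash rates

print(f"GPT-5.5:  ${gpt55:.2f}")
print(f"V4-Flash: ${flash:.2f} (illustrative)")
```

At these illustrative rates the same batch job drops from $190 to under $10, which is the kind of unit-cost gap the article is pointing at.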

Tied in with earlier DeepSeek news

This V4 release and its rapid integration with Ollama Cloud come as DeepSeek negotiates its first round of external financing at a $20 billion valuation. V4 is a key product proof point in DeepSeek’s capitalization process; pairing an open-source strategy with fast diffusion through international hosting partners is its play for speed while it works to establish an overwhelming developer ecosystem. For OpenAI and Anthropic, an open-source alternative that can be swapped in with one line inside Claude Code is a new variable in the race for control of agent workflows.

This article DeepSeek V4-Flash lands on Ollama Cloud, US-hosted: Claude Code, OpenClaw one-click integration first appeared on 链新闻 ABMedia.

