OpenAI officially released GPT-5.5 on 4/23, positioning it as a flagship model for agentic work and enterprise knowledge processing and rolling it out simultaneously to ChatGPT and Codex. The official pitch is "our smartest model and the most intuitive to use," and it leads the AA Intelligence Index at 60 points, three points ahead of both Claude Opus 4.7 and Gemini 3.1 Pro Preview.
Key data at a glance
| Indicator | GPT-5.5 | Versus (GPT-5.4 or same-tier competitors) |
|---|---|---|
| AA Intelligence Index | 60 | Claude Opus 4.7: 57; Gemini 3.1 Pro Preview: 57 |
| Terminal-Bench 2.0 (command-line workflow) | 82.7% | GPT-5.4: 75.1% |
| Expert-SWE (OpenAI internal programming evaluation) | 73.1% | GPT-5.4: 68.5% |
| Context window | 12 million tokens | Dramatically increased; can hold an entire enterprise codebase or several hours of video |
| Price (per million tokens) | Input $5, output $30 | Double GPT-5.4's unit price; but output token usage drops by about 40%, so net cost rises by about 20% |
Positioning: Built for the “Agent Era”
OpenAI describes GPT-5.5 as a foundational model for agentic computing: able to understand complex goals, use tools, verify its own results, and complete multi-step tasks without a human intervening at every step. In a TechCrunch interview, President Greg Brockman characterized this version as "a big step toward future computing, but only a step," and emphasized that it is "a faster, sharper reasoner than 5.4, using fewer tokens."
Chief Scientist Jakub Pachocki noted, “We’re seeing very significant improvements in the short term”; Research Lead Mark Chen, meanwhile, emphasized that this release delivers “meaningful breakthroughs” in scientific and technical research workflows.
Availability and version tiering
- GPT-5.5: available in ChatGPT and Codex for Plus, Pro, Business, and Enterprise users
- GPT-5.5 Pro: a more advanced reasoning version, available in ChatGPT for Pro, Business, and Enterprise users
- Codex integration: also available in OpenAI's coding-agent tool, strengthening multi-file editing, command-line operation, and test loops
Cybersecurity and defense rhetoric rises in parallel
Asked about it in a TechCrunch interview, Mia Glaese, a member of the technical staff, said that GPT-5.5's cybersecurity capabilities will "have a major impact on how OpenAI deploys models into digital defense." The rhetoric directly answers the recent controversy around Anthropic's Claude Mythos, a weapons-grade cybersecurity model; Altman had previously criticized Anthropic's "fear marketing" strategy on the "Core Memory" show. With GPT-5.5, OpenAI leans even harder into a narrative of "offense and defense in one, and deployable," drawing a sharper contrast with Anthropic's stance of restricting access.
Pricing strategy changes
GPT-5.5's price per million tokens doubles to $5 for input and $30 for output, the first significant unit-price increase in the GPT-5 series. OpenAI's explanation is that improved reasoning efficiency cuts output token usage by about 40%, so the typical bill for real tasks comes out roughly 20% above GPT-5.4 rather than a flat 2x. For enterprises, the decision therefore shifts from "is the unit price worth it?" to "given the same prompt, can GPT-5.5 complete more complex tasks with a smaller total token count?"
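The arithmetic behind that claim can be sketched in a few lines. This is an illustrative calculation only: the GPT-5.4 prices ($2.50/$15) are inferred from the article's statement that GPT-5.5 doubles them, the task's token counts are hypothetical, and the 40% output-token reduction is OpenAI's claimed average, not a guarantee. Note the "~20% higher" figure only holds for output-dominated workloads, since input token usage does not shrink.

```python
def task_cost(input_tokens, output_tokens, input_price, output_price):
    """Dollar cost of one task; prices are per 1 million tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# Hypothetical output-heavy agent task: 5k input tokens, 50k output tokens.
cost_54 = task_cost(5_000, 50_000, input_price=2.50, output_price=15.00)

# GPT-5.5: unit prices double, but output usage drops ~40% (x0.6).
cost_55 = task_cost(5_000, 50_000 * 0.6, input_price=5.00, output_price=30.00)

print(f"GPT-5.4: ${cost_54:.4f}  GPT-5.5: ${cost_55:.4f}  "
      f"ratio: {cost_55 / cost_54:.2f}x")
# ratio comes out near 1.2x, matching the article's "about 20% higher"
```

As the output share of the bill grows, the ratio converges to 2 x 0.6 = 1.2 exactly; input-heavy workloads would see a ratio closer to the full 2x.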
Signals for the industry
GPT-5.5 widens OpenAI's lead on Terminal-Bench and internal SWE evaluations. These two benchmarks test command-line agent execution and real software engineering tasks respectively, making the scores a direct battleground between Codex and Claude Code. Combined with the simultaneous opening of a 12 million token context window, OpenAI is applying pressure on both the "full-scale enterprise knowledge base processing" and "long-task agent" tracks at once. For Anthropic, Claude Opus 4.7 now trails by 3 points at 57 on the AA index, and Claude Code users have one more reason to watch the next generation (Opus 4.8 or a new Claude).
This article, "OpenAI pushes GPT-5.5: 12M context, tops the AA index, rewrites the agent benchmark with Terminal-Bench 82.7%," first appeared on Chain News ABMedia.