GPT-5.5 Is Here. It Scores 84.9% Across 44 Professions and 98% on Customer Service.

OpenAI dropped GPT-5.5 today. The model launched just seven weeks after its predecessor. And the benchmarks suggest it is no longer a chatbot. It is a worker.

[IMAGE: Screenshot of the GPT-5.5 announcement page on OpenAI's website | Source: OpenAI blog]

What OpenAI Just Shipped

The new model is rolling out to Plus, Pro, Business, and Enterprise subscribers in ChatGPT and in Codex, OpenAI's coding assistant. A separate version called GPT-5.5 Pro is going to Pro, Business, and Enterprise users for higher-accuracy tasks. API access is "coming very soon," according to the company.

The pace is the first thing worth noticing. GPT-5.4 launched on March 5. GPT-5.5 arrived April 23. That is a seven-week turnaround for a major model upgrade, and it signals how fast the top AI labs are shipping in 2026.
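The gap between the two launch dates can be checked with a few lines of Python. This is only a sketch: the dates are the ones quoted in this article, and the projected date simply assumes, hypothetically, that the same gap repeats. It is not an announced schedule.

```python
from datetime import date

# Launch dates as quoted in the article (year 2026).
gpt_5_4_launch = date(2026, 3, 5)
gpt_5_5_launch = date(2026, 4, 23)

# Time between the two releases.
gap = gpt_5_5_launch - gpt_5_4_launch
gap_weeks = gap.days / 7

# Hypothetical projection: the same gap repeated once more.
# This is an illustration, not anything OpenAI has announced.
projected_next = gpt_5_5_launch + gap

print(f"Gap between releases: {gap.days} days ({gap_weeks:.1f} weeks)")
print(f"Same gap again would land on {projected_next.isoformat()}")
```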

The second thing is the price. When API access arrives, GPT-5.5 will cost $5 per million input tokens and $30 per million output tokens, which is twice the price of GPT-5.4. GPT-5.5 Pro is steeper still at $30 per million input tokens and $180 per million output tokens. OpenAI's pitch: the model is more intelligent and more token-efficient, so the total bill should come out similar for many tasks.
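To see what "similar total bill" would require, here is a rough cost sketch in Python. The GPT-5.5 prices are the ones quoted above. The GPT-5.4 prices are not quoted anywhere in this article and are only inferred from the "twice the price" claim. The workload sizes are made up for illustration.

```python
# Prices in USD per million tokens.
# GPT-5.5 figures come from the article.
GPT_5_5 = {"input": 5.00, "output": 30.00}
# ASSUMPTION: inferred by halving the GPT-5.5 prices ("twice the price of GPT-5.4").
GPT_5_4_INFERRED = {"input": 2.50, "output": 15.00}


def request_cost(prices, input_tokens, output_tokens):
    """Return the cost in USD for a given number of input and output tokens."""
    return (
        input_tokens * prices["input"] + output_tokens * prices["output"]
    ) / 1_000_000


# Hypothetical workload: 200k input tokens and 40k output tokens on the older model.
old_bill = request_cost(GPT_5_4_INFERRED, 200_000, 40_000)

# Same token counts on GPT-5.5: the bill doubles.
same_tokens_bill = request_cost(GPT_5_5, 200_000, 40_000)

# Half the tokens on GPT-5.5: the bill matches the older model.
half_tokens_bill = request_cost(GPT_5_5, 100_000, 20_000)

print(f"GPT-5.4 (inferred prices): ${old_bill:.2f}")
print(f"GPT-5.5, same tokens:      ${same_tokens_bill:.2f}")
print(f"GPT-5.5, half the tokens:  ${half_tokens_bill:.2f}")
```

Under these assumptions, the bill only comes out even if GPT-5.5 gets the same job done with about half the tokens. Whether it does will depend on the task.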

Introducing GPT-5.5

A new class of intelligence for real work and powering agents, built to understand complex goals, use tools, check its work, and carry more tasks through to completion. It marks a new way of getting computer work done.

Now available in ChatGPT and Codex. pic.twitter.com/rPLTk99ZH5
— OpenAI (@OpenAI) April 23, 2026

OpenAI President Greg Brockman, speaking on a press call, called GPT-5.5 "a new class of intelligence" and "a big step towards more agentic and intuitive computing." Translation for the rest of us: the model can do more of a task by itself, without constant steering.

The Benchmark Numbers Built for Your Job

This is where it gets interesting for working professionals. OpenAI released a suite of benchmarks measuring GPT-5.5 on actual job tasks, not trivia or math puzzles.

On GDPval, a benchmark that tests knowledge work across 44 different occupations, GPT-5.5 scored 84.9%. These are the kinds of tasks a consultant, analyst, or project manager would recognize: drafting reports, producing deliverables, following specifications.

On Tau2-bench Telecom, which simulates complex customer-service workflows, the model reached 98.0% accuracy without any prompt tuning. That number is not a typo. In controlled tests of call-center style work, GPT-5.5 now solves nearly every task.

On OSWorld-Verified, which measures whether an AI can operate a real computer on its own, GPT-5.5 reached 78.7%. This matters because it means the model can navigate a desktop, open files, fill forms, and click through multi-step software flows on behalf of a user.

The specialized numbers hit hard too. 88.5% on internal investment-banking modeling tasks. 60.0% on FinanceAgent. 54.1% on OfficeQA Pro, which evaluates everyday office work. On coding, GPT-5.5 posted 82.7% on Terminal-Bench 2.0 and 58.6% on SWE-Bench Pro for resolving real GitHub issues.

A pattern emerges: GPT-5.5 was not tuned to impress researchers. It was tuned to impress employers.

If you run a customer service team, the 98% number deserves attention. It was measured on simulated telecom workflows, but it suggests that the tasks your team handles, including returns, billing questions, and account changes, are moving inside the performance envelope of a single model. Companies piloting AI agents for support can now point to a benchmark that justifies deeper deployment.

If you work in finance, the investment-banking score and the FinanceAgent number point in the same direction: modeling tasks that junior analysts spend hours on are increasingly within reach of an API call. That does not replace the analyst. It changes what the analyst spends their day doing.

And if you are a knowledge worker in any of the 44 professions GDPval covers, you are now measured against a model that scores 84.9% on your type of work. The right response is not panic. The right response is to figure out which parts of your job the model is actually good at, and which parts it still flubs.

OpenAI also said GPT-5.5 is better at multi-part tasks that require planning, using tools, and checking its own work. That is the crucial shift. Earlier ChatGPT versions needed you to break a task into small steps. This model is designed to take a vague goal and carry it through to completion. That is what OpenAI means by "agentic."

The Bigger Picture: An Enterprise Arms Race

Brockman used the press call to tease something larger. He described GPT-5.5 as a step toward OpenAI's long-rumored "superapp," a single multi-purpose tool that combines ChatGPT, Codex, and other products into one interface. Sam Altman has floated the same idea in recent months.

The release also lands in the middle of a fierce enterprise fight. OpenAI now reports 4 million active Codex users and 9 million paying business users on ChatGPT. Weekly active users crossed 900 million. Subscribers passed 50 million. Those are the numbers OpenAI wanted public today, because a narrative had been building that the company is losing enterprise ground to Anthropic.

Anthropic released a cybersecurity-focused model called Mythos earlier this month, which triggered its own news cycle. OpenAI pointedly classified GPT-5.5's cybersecurity capabilities as "High" under its Preparedness Framework, while noting it did not reach the "Critical" threshold. The message to enterprise buyers: we are taking safety seriously, and we are competitive on the same ground.

The rhythm of 2026 is becoming clear. Top labs are shipping every six to eight weeks. Benchmarks are getting measured on real work, not toy problems. Prices are creeping up at the top tier and creeping down at the bottom. Enterprises are being asked to pick a side.


Bottom line: GPT-5.5's benchmarks are no longer about whether AI can do pieces of your job. They're about which tool your company will deploy, and how fast you will need to be fluent in it.

For working professionals, the practical question is no longer whether AI can do parts of your job. GPT-5.5's benchmarks make that obvious. The real question is which tool your company will standardize on, and how quickly you become fluent in it before someone else does.

OpenAI plans to bring GPT-5.5 to its public API "very soon." Expect enterprise integrations to follow within weeks. And at this pace, the next model release could land by mid-June.
