GPT-5 and the Frontier-Model Plateau Debate

Archive note: this is a backdated post, written years later while rebuilding this site. It’s dated to the moment it covers, but the hindsight is real.

GPT-5 shipped August 7, carrying three years of accumulated mythology — the version number that meant the leap — and the discourse since has been the most instructive in this column’s history: not because the model is bad, but because the reaction revealed what everyone actually believes about the curve.

What shipped, and what happened

The release is less one model than a system: a router dispatching queries between fast paths and deeper reasoning paths, replacing the model-picker zoo. Capable across the board, state-of-the-art in places, cheaper at scale — and, in the launch’s defining stumble, OpenAI initially removed the older models people had built workflows (and, more strikingly, attachments) around, then restored 4o within days under genuine user revolt. There’s a whole sociology paper in users grieving a deprecated model’s personality; the parasocial prediction from the voice era cashes early.

But the headline discourse is the plateau debate: set against the GPT-3→4 leap, 5 reads as consolidation — and the “scaling is over” takes arrived within hours.

The careful version of the argument

Having tracked every release since the library learned to talk, my reading is that both camps are arguing past the actual shape:

The wow curve plateaued; the capability curve forked. What stalled is the universal, demo-able jump everyone could feel. What didn’t stall: verifiable-domain performance — math, code, agentic reliability — which keeps climbing exactly where the September prediction said it would, because checkable answers feed RL. The frontier didn’t stop; it specialized, away from spectacle and toward the domains where progress can be measured. Mine, as noted then, uncomfortably included.
The router is the tell. When a flagship’s centerpiece is cost-quality dispatch rather than raw new capability, the message is economic: frontier inference is a margin business now, and the four-month floor chasing it commoditizes everything but the newest tier. Product maturity, arriving on schedule — Laravel 12 made the same announcement in framework dialect: the revolution phase ends, the stability contract begins.
“Plateau” is doing motivated work for everyone. The doomer-camp reads it as reprieve, the skeptic camp as vindication, the labs as a pricing problem. Meanwhile the working-dev ledger — the only one this blog trusts — records that the agents built on these “plateaued” models did more of my actual job this month than any month prior. Capability-in-harness keeps compounding even when capability-in-benchmark consolidates. The leverage was never in the wow.

What it means from here

For builders, the practical translation is calming: model choice is becoming boring, switching costs are protocol-mediated, and the differentiation moved up the stack — to data, integration, workflow, and judgment, which is to say, back to software engineering. The era when a product could be “GPT-wrapper plus vibes” ended somewhere between DevDay and this launch; the era where AI is an assumed ingredient, like the database, is fully here.

Prediction, banked for December’s audit: the plateau debate itself becomes the plateau’s victim — superseded within a year by progress in the harness layer (agents, tools, verification) that makes the base-model question feel as quaint as arguing CPU clock speeds after the multicore turn. The number on the box stopped mattering for laptops in 2005. August 2025 is when it stopped mattering for models.