Building with AI Agents in the Terminal

Archive note: this is a backdated post, written years later while rebuilding this site. It’s dated to the moment it covers, but the hindsight is real.

Seven years ago I wrote a short post about living in the terminal. It was about tmux and keeping abstractions in my head. I could not have guessed the terminal’s next act: this spring, the defining AI tools are CLI coding agents — Claude Code and its fast-multiplying peers — that sit in your shell, read your codebase, run your commands, and execute multi-step work while you watch the diffs scroll by.

December’s prediction — agents, scoped and tool-equipped, landing in the terminal — is now my working day. A month of field notes:

Why this works when AutoGPT didn’t

Round one, 2023 failed structurally: chat models in while-loops, no memory, no judgment about doneness, walking into walls at API prices. Round two changed every component at once:

Reasoning models underneath. The o1/R1 generation plans multi-step work the 2023 models could only narrate. The difference shows exactly where the toys broke: the agent decomposes a task, hits an error, diagnoses it, and adjusts — instead of looping into confident mush.
A real environment instead of a fantasy of one. The 2023 agents hallucinated their world. A CLI agent’s world is your actual repo: it greps, reads files, runs your test suite, and gets ground truth feedback every step. Verifiable surroundings discipline a model the way checkable domains discipline training.
Scope and a leash. The winning form is precisely the unglamorous one predicted: narrow task, explicit tool permissions, human approval on the risky steps, diffs you review before they land. Less employee-cosplay, more very fast junior with the commit bit withheld.

How the work actually feels

The honest texture: I describe an outcome (“add soft-deletes to these models, update the queries, fix the tests”), the agent works the checklist, and my role compresses into specification up front and hostile review at the end. Tasks that were an afternoon are a coffee. The skills that matter are the ones this column has watched appreciate for three years: writing precise requirements, reading diffs ruthlessly, knowing the codebase well enough to smell a wrong turn — and now, a new one: calibrating delegation. Too small a task wastes the leverage; too large and you’re reviewing a stranger’s rewrite. The sweet spot is a well-named branch’s worth of work.

The terminal-native part isn’t nostalgia; it’s correctness. The shell is where the project’s truth lives — git, tests, linters, deploys — and an agent that operates the same tools I do is auditable in a way IDE magic never was. My 2018 reasons for loving the CLI (everything scriptable, everything inspectable) turned out to be the requirements for trusting an AI with it.

The caveats, on the record

It burns tokens like a hobby; cost-awareness is a skill again. It’s overconfident in unfamiliar codebases and needs guardrails (CLAUDE.md-style project notes are load-bearing). And the vibe-coding boundary question intensifies: an agent producing 500-line diffs makes “review every line” expensive precisely when it matters most. The discipline holds in my shop. I watch the industry’s discipline with professional concern.

Prediction, filed: by year’s end, “the agent did the first draft” is the default for routine backend work in AI-forward teams, and the differentiator between developers becomes — fully, finally — what it always secretly was: knowing what to build and how to check it. The terminal won again. It usually does.