I Tried to Stop Prompting My Agents. Here Is What Loops Are Actually Good At.

The new thing in AI coding is agent loops. The pitch is that you stop prompting the AI. You build a loop instead. Something other than you trips a trigger, the agent figures out what needs doing, builds it, reviews it until it is ready, then goes quiet until the trigger fires again. This took off after Boris, the creator of Claude Code, and Peter Steinberger, the creator of OpenClaw, both said roughly the same thing: they are not prompting agents anymore, they are designing loops that prompt the agents for them.

I sat on this one for a while, because honestly it sounded like a repackaging of the workflow I already run. Draft a plan, tell the agent to build it, check back, iterate. I was mostly wrong, but not in the way the hype wants me to be. Some loops are genuinely great. The "I never prompt anything ever again" version is not. The gap between those two is the whole story, so let me walk through what I found.

The PR loop, the one that just works

The first kind is a loop wired to your pull requests. A new PR comes in and the loop wakes up, or you run it on a cron that goes back through old PRs nobody had time to touch. You can even fold the opening of the PR into the loop itself, so every time you ask for a feature the agent opens a PR, and opening that PR is what kicks the loop off.
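To make the cron flavor concrete, here is a minimal sketch of the "go back through old PRs" selection step. Everything in it is an assumption for illustration: the PullRequest record, its fields, and the 30-day idle threshold are hypothetical, and a real version would pull this data from your Git host instead of a hardcoded list.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class PullRequest:
    # Hypothetical record; a real loop would fill this from the Git host's API.
    number: int
    updated_at: datetime
    is_open: bool = True


def stale_prs(prs, now, idle_days=30):
    """Return open PRs untouched for more than idle_days, oldest first."""
    cutoff = now - timedelta(days=idle_days)
    stale = [pr for pr in prs if pr.is_open and pr.updated_at < cutoff]
    return sorted(stale, key=lambda pr: pr.updated_at)


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
backlog = [
    PullRequest(1, datetime(2023, 1, 1, tzinfo=timezone.utc)),
    PullRequest(2, datetime(2024, 12, 25, tzinfo=timezone.utc)),
    PullRequest(3, datetime(2024, 6, 1, tzinfo=timezone.utc), is_open=False),
    PullRequest(4, datetime(2024, 3, 1, tzinfo=timezone.utc)),
]

# Each selected PR would be handed to the loop as its next trigger.
for pr in stale_prs(backlog, now):
    print(f"queue PR #{pr.number}")
```

Sorting oldest first is a design choice, not a requirement. It just means the two-year-old PRs get attention before the two-month-old ones.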

The shape I like best is two agents talking to each other. Agent one analyzes the issue, reproduces it, writes the fix. When it pushes, that triggers agent two, the reviewer, which picks it up, checks it against what it was supposed to do, makes sure the fix actually works, and hands feedback back to agent one. They go back and forth until the reviewer agrees it is ready, and then the loop closes and merges. With computer use you can push this further. I had the reviewer spin up a dev server, drive the actual feature, and record a video of it working as a hard requirement before the PR could close.
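The back-and-forth can be sketched in a few lines. This is not how Claude Code or Codex wire it up internally; it is a plain-Python illustration of the control flow, with the two agents passed in as ordinary functions. The names run_pr_loop, Review, and the fake agents are all hypothetical, and the max_rounds cap is my own addition so a stubborn reviewer cannot spin forever.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Review:
    approved: bool
    feedback: str = ""


def run_pr_loop(
    issue: str,
    implement: Callable[[str, Optional[str]], str],
    review: Callable[[str, str], Review],
    max_rounds: int = 5,
) -> tuple[bool, str, int]:
    """Alternate implementer and reviewer until approval or the round cap.

    Returns (merged, final_patch, rounds_used).
    """
    feedback: Optional[str] = None
    patch = ""
    for round_no in range(1, max_rounds + 1):
        patch = implement(issue, feedback)  # agent one: write or revise the fix
        verdict = review(issue, patch)      # agent two: check it against the issue
        if verdict.approved:
            return True, patch, round_no    # loop closes; a merge would happen here
        feedback = verdict.feedback         # hand the notes back to agent one
    return False, patch, max_rounds         # cap hit: leave the PR for a human


# Stand-in agents so the sketch runs without any model calls.
def fake_implement(issue: str, feedback: Optional[str]) -> str:
    return f"fix for {issue}" + (" + tests" if feedback else "")


def fake_review(issue: str, patch: str) -> Review:
    if "tests" in patch:
        return Review(approved=True)
    return Review(approved=False, feedback="add a regression test")


merged, patch, rounds = run_pr_loop("issue-1", fake_implement, fake_review)
print(merged, patch, rounds)
```

In a real setup the review function is where the hard requirements would live, for example refusing to approve until a recording of the working feature exists.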

I rate this so highly because it lands exactly where coding agents are strong today. What AI still lacks is taste and a sense of what to build next. It does not lack the ability to grind through boring, well-defined work. So you point it at the pile of minor bugs and forgotten PRs that have been sitting in the backlog for two years because there was always something more important, and it just clears them. That is the dream we actually wanted from AI: get the boring work off my plate.

And the nice part is you no longer hardcode any of this. No bash scripts, none of the old Ralph Wiggum loop scaffolding. I literally opened Claude Code and asked: if I wanted a continuous loop where instead of me prompting you, you find the next high-leverage task, delegate it, review it, iterate, merge, and move on, how would you build that? It designed its own shape and then edited the /loop command that already ships inside Claude Code. It iterated on its own loop. You can do the same in Codex, arguably more cleanly, because Codex can run each part of the loop in its own isolated thread. The reviewer and the implementer live in separate threads and talk to each other, which makes it much easier to see what is going on.

Quick aside, since someone always asks: yes, I am token-maxing my 200-dollar Claude sub for the rest of the month. I paid for it because I wanted Fable, there is no Fable right now, so here we are. We might be back on Codex when it renews.

The spec loop, the one I did not expect to like

The second kind is the most interesting to me. You start with a barely defined idea of what you want to build. You do not hand that straight to an engineer agent. You hand it to a team that argues about it first.

This is the thing I dug into in my last post on one agent versus a team of agents. A team makes the spec materially better, and a better spec carries straight through into the quality of what gets built. The setup I have been running is a team leader, a tech lead, a designer, and two assistants whose entire job is to argue against the spec and find its flaws. I run those two on Gemini 3.1 and Kimi K2.6. They debate, land on a rough outline, then go hunting for execution pitfalls and holes, two rounds of it, and only then write the final spec. You can watch it grow: V1 is short, V2 is longer, V3 is far more detailed and far more thought-through than where it started.
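Stripped of the models, the debate is a draft followed by a fixed number of critique-and-revise rounds. Here is a rough sketch of that shape, under heavy assumptions: refine_spec and every helper below are hypothetical names, the critics are toy functions standing in for the two adversarial assistants, and the real team roles (leader, tech lead, designer) are collapsed into a single draft step and a single revise step.

```python
from typing import Callable, Sequence

Critic = Callable[[str], list[str]]


def refine_spec(
    idea: str,
    draft: Callable[[str], str],
    revise: Callable[[str, list[str]], str],
    critics: Sequence[Critic],
    rounds: int = 2,
) -> list[str]:
    """Return every spec version, oldest first (V1, V2, ...)."""
    versions = [draft(idea)]               # V1: the rough outline
    for _ in range(rounds):
        current = versions[-1]
        objections: list[str] = []
        for critic in critics:             # each adversary attacks the same version
            objections.extend(critic(current))
        if not objections:                 # nothing left to fix, stop early
            break
        versions.append(revise(current, objections))
    return versions


# Toy stand-ins so the sketch runs without any model calls.
def fake_draft(idea: str) -> str:
    return f"Spec: {idea}"


def fake_revise(spec: str, objections: list[str]) -> str:
    return spec + "".join(f"\n- handle: {o}" for o in objections)


def critic_a(spec: str) -> list[str]:
    return [] if "offline mode" in spec else ["offline mode"]


def critic_b(spec: str) -> list[str]:
    for gap in ("rate limits", "retry policy"):
        if gap not in spec:
            return [gap]                   # raise one new hole per round
    return []


versions = refine_spec("shared notes app", fake_draft, fake_revise, [critic_a, critic_b])
for i, v in enumerate(versions, start=1):
    print(f"V{i}: {len(v)} chars")
```

Keeping every version around, rather than overwriting the spec in place, is what lets you watch it grow from one round to the next.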

The reason this matters is that LLMs are painfully literal. They treat the spec as religion. If the spec is bad, the output is bad. If you failed to foresee something, it will bite you later. Fable was the exception here. You could half-ass a prompt and it would still get what you meant. In general you do not get that grace. So the loop is not just building. It is using the loop mechanism to figure out what you actually need before a single line gets written, and then an engineer agent takes the spec items one by one, exactly like the PR loop, and implements them.

This is the loop I would reach for to build something from scratch. It is also the one that changed my mind, because it does the thing agents are genuinely good at: taking your sloppy creativity and rendering it into a decently engineered product. I am building a coding-agent GUI right now with this adversarial spec debate baked in, so I clearly bought my own pitch on this one.

The idea loop, the one the hype is really selling

Then there is the version the loud takes are actually about. You say almost nothing, half a prompt at most, and let it run to its heart's content. No human in the loop. This is the "I don't prompt AIs anymore" dream in its purest form, so this is the one I wanted to push until it broke.

I tested it on a small app I built called Future OS. It came out of my one-agent-versus-a-team experiment from the last post, and the original ask was a merger of Notion, Slack, and Dropbox, so an all-in-one workspace with tasks, files, and channels. I modified the /loop skill in Claude Code so it would prompt itself, gave it a broad goal like "make this app production ready," and let its first job be figuring out what that even meant. It had to define the work, write a spec, and iterate until it decided the goal was met.

It ran for about two hours the first time and then the session vanished from my Claude Code history, which was annoying but did not really matter, because the flaw was already obvious. No taste. No direction. No sense of what it actually takes to make a product good.

So I ran it again and steered harder. I told it to create a spec for making this a great product ready to deliver to users and keep iterating until the spec was fully implemented. I was not asking for polish. I wanted to see if it could understand what people want, envision features, do something beyond the obvious. It is genuinely excellent at the front half. It remembered the project, launched a research and spec phase, spawned sub-agents to understand the product deeply. And then it landed on a list that was, top to bottom, polish. I pushed once more and told it explicitly to make the product stand out, be innovative, find ways to be better. It came back with "work stream lens," "decision ledger," "conversation page promotion with provenance object timeline," and a stack of other things that are all just fine-tuning of features that already existed.

It cannot invent a new feature. It cannot point. On a very unfinished product it even logged onboarding as a nice-to-have, slid it onto the list, and made no visible change to the app. It polished the edges of something that was not finished.

And that, weirdly, is the most useful thing the experiment gave me.

What the loop accidentally showed me

Here is what I walked away with.

One, loops need clear direction. Broad prompting does not do the job.

Two, loops are perfect for the routine work nobody enjoys anyway, the bug fixes and the forgotten PR backlogs, and bad at innovation or anything that requires forward thinking.

Three, any task meant to push the product meaningfully forward still needs a human pointing the way. Agents will tirelessly execute whatever you tell them. They are very bad at deciding what is worth telling them.

But notice the flip side of point three, because it is the optimistic part. They are bad at pointing, and very good at taking your half-formed creative direction and turning it into well-engineered reality. That is the whole reason the spec loop won me over.

So when Boris and Peter say they are not prompting agents anymore, I do not think it means what tech Twitter wants it to mean. If you actually watch Boris talk through it, he describes a feedback channel on Slack that the agents monitor, and that feedback is what trips the loops. To me that does not sound like agents driving the product forward. It sounds like a backlog. It sounds like the old pile of PRs nobody had time for, finally getting worked.

And that backlog is enormous in most companies. Matt Burman put out a loop library that makes this concrete: a doc sweep that reviews the whole codebase, updates stale docs, and opens a PR; a refactor-until-you-are-happy loop; a page-load loop that just keeps optimizing for speed; and a production error sweep that reads your production logs, finds the quiet errors, and fixes them. That last one is the one that hit me. When I was running my startup I rarely had time to even look at production logs. I fixed what was outright on fire and the silent stuff just sat there forever. Point a loop at that and you reclaim a giant slice of work that real teams genuinely never get to.
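I have not looked at how that library implements the error sweep, so treat this as my own guess at the first step: turn a noisy log into a short ranked list of recurring errors an agent could be pointed at. The log format (timestamp, level, message), the sample lines, and the function names are all assumptions for illustration.

```python
import re
from collections import Counter

# Assumed line shape: "<timestamp> <LEVEL> <message>"
LINE = re.compile(r"^\S+ (?P<level>[A-Z]+) (?P<msg>.*)$")


def signature(message: str) -> str:
    """Collapse ids and numbers so repeats of one bug share a key."""
    return re.sub(r"\d+", "<n>", message)


def quiet_errors(lines, min_count=2):
    """Return (signature, count) for recurring ERROR lines, most frequent first."""
    counts = Counter()
    for line in lines:
        m = LINE.match(line)
        if m and m.group("level") == "ERROR":
            counts[signature(m.group("msg"))] += 1
    return [(sig, n) for sig, n in counts.most_common() if n >= min_count]


sample_log = [
    "2024-01-01T10:00:00Z ERROR timeout fetching user 42",
    "2024-01-01T10:00:05Z INFO request ok",
    "2024-01-01T10:01:00Z ERROR timeout fetching user 77",
    "2024-01-01T10:02:00Z ERROR disk quota exceeded",
    "2024-01-01T10:03:00Z ERROR timeout fetching user 9",
]

# Each surviving signature would become one task for the fix-and-review loop.
for sig, n in quiet_errors(sample_log):
    print(f"{n}x {sig}")
```

The min_count filter is the part worth tuning. Set it too high and you only see what is already on fire; set it to 1 and every one-off hiccup becomes a task.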

The MCP feeling

If this all sounds a little familiar, it should. This is the early MCP moment again. The instant MCP showed up, everyone was breathless about using it for absolutely everything, even when it was already pretty clear that CLIs were the better way to let agents talk to most third-party tools. MCP eventually settled into its real, valid niche. I had that same reflex about loops when the noise started, and I was mostly wrong. There is real value here. Just not in the "I never touch a prompt again" framing it gets sold in.

Which brings me back to the take I wrote about earlier this month, that software engineering as we knew it is changing fast and might not be the disaster it sounds like. I will speak for myself: I love programming, but who actually enjoyed the boring bug fixes and the PR backlog? Loops take that off your plate and you still get to drive the product. You just do it in English now, at something like ten times the output. Your skills still matter. They matter differently. Less memorizing syntax, more drawing the experience you want in your head and working backward to the technology with an agent.

Do you need to chase every new fever dream on X? No. You can live without Boris's loops or an OpenClaw agent reading your email. Plenty of this is not essential. What is essential is understanding where we actually are, how fast this is moving, and roughly which direction the curve is pointing, because it is not slowing down, politics and the US government suspending Fable aside.

I know it is unsettling to think the skills you built over twenty years are aging out. I think that is the wrong read. Those skills make you the front-runner to become fluent fastest in whatever this turns into. Pieter Levels has a darker version of it, that software is getting commoditized because anyone can vibe-code a replacement for their SaaS subscriptions for free, and the smart money is fleeing into hardware where it is still hard to compete. Maybe. I am not sure cheaper-to-make means everyone suddenly makes it. Nobody really knows yet, me included.

What I do know is that you are holding a priority ticket for this train. Do not refuse to board it.

Alright. Time to go write some more loops.