Your Software Factory Should Build Itself
There is a live debate about how much code to wrap around a model. The thin camp says almost none: give the model a goal and a stop condition, put the intelligence in skills in plain English, and let it run. The loudest version is Garry Tan's "Stop building Foxconn factories for your agents", which looks back at the half-million lines he once wrapped around an LLM and calls them a cage to tear down. The thick camp builds software factories: deterministic code that walks every change through fixed stations.
We do not have a horse in this race. Actual AI builds an architecture agent that manages architectural context for coding agents and software factories, and it works the same with a thin harness or a fat one. But watching from the middle, we think both camps are arguing about the wrong thing, and we will make a claim that annoys each of them equally: the factory should build itself. Here is the argument, one step at a time.
One machine, two settings
The loop is the thin answer. Write a goal and a stop condition in plain English, "ship the feature, stop when the auth tests pass and lint is clean," hand them to a small driver, and let it call the model until the condition is true. Boris Cherny, who leads Claude Code, put it bluntly: "I don't prompt Claude anymore. I have loops running that prompt Claude and figuring out what to do. My job is to write loops." Addy Osmani calls the discipline loop engineering: you stop being the person who prompts the agent and design the system that does it instead.
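A minimal sketch of such a driver, assuming a hypothetical `call_model` function and a stop condition expressed as a plain Python predicate. The names `run_loop`, `fake_model`, and `done` are illustrative and do not come from any real tool; the fake model stands in for an actual LLM call so the sketch runs on its own.

```python
from typing import Callable


def run_loop(
    goal: str,
    call_model: Callable[[str, list[str]], str],
    stop_condition: Callable[[list[str]], bool],
    max_iterations: int = 20,
) -> list[str]:
    """Call the model until the stop condition holds or the budget runs out."""
    history: list[str] = []
    for _ in range(max_iterations):
        if stop_condition(history):
            break
        # The driver makes no decisions; it only passes the goal and history along.
        history.append(call_model(goal, history))
    return history


def fake_model(goal: str, history: list[str]) -> str:
    """Hypothetical stand-in for a model call: reports one step per turn."""
    steps = ["wrote code", "auth tests pass", "lint clean"]
    return steps[min(len(history), len(steps) - 1)]


def done(history: list[str]) -> bool:
    """Stop condition: auth tests pass and lint is clean."""
    return "auth tests pass" in history and "lint clean" in history


result = run_loop("ship the feature", fake_model, done)
print(result)
```

The iteration cap is an assumption added here so that a stop condition that never becomes true cannot run forever.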
The factory is the thick answer. Deterministic code runs the stations and calls the model at each one; the control flow is a program, not a suggestion. But the stations are not dumb. At each one the factory hands the work to an agentic loop that decides in the moment how to hit that station's goal. Loops on the inside, rails on the outside. That is actually how most organizations run the human-centered SDLC today.
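One way to picture "loops on the inside, rails on the outside" is the sketch below. It assumes each station is a hypothetical callable that stands in for an agentic loop and reports whether it met its goal. The station names, the `route_on_fail` table, and `run_factory` itself are illustrative, not a description of any particular factory.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Change:
    description: str
    log: list[str] = field(default_factory=list)


# A station agent stands in for an agentic loop; it returns True if its goal was met.
Agent = Callable[[Change], bool]


def run_factory(
    change: Change,
    stations: list[tuple[str, Agent]],
    route_on_fail: dict[str, str],
    max_steps: int = 20,
) -> bool:
    """Walk the change through the stations in a fixed order (the rails)."""
    names = [name for name, _ in stations]
    i = 0
    for _ in range(max_steps):
        if i == len(stations):
            break
        name, agent = stations[i]
        ok = agent(change)  # the loop on the inside decides how to hit the goal
        change.log.append(f"{name}:{'ok' if ok else 'fail'}")
        if ok:
            i += 1
        elif name in route_on_fail:
            i = names.index(route_on_fail[name])  # send the work back
        else:
            return False
    return i == len(stations)


review_calls = {"count": 0}


def flaky_review(change: Change) -> bool:
    """Hypothetical reviewer that rejects the first attempt and accepts the second."""
    review_calls["count"] += 1
    return review_calls["count"] > 1


stations = [
    ("plan", lambda c: True),
    ("code", lambda c: True),
    ("review", flaky_review),
]
change = Change("add login")
shipped = run_factory(change, stations, route_on_fail={"review": "code"})
print(shipped, change.log)
```

The order of stations and the routing on failure live in ordinary code, so they run the same way on every change; only what happens inside each station is left to the model.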
Push a loop far enough and the two ends converge. A serious loop does not call the model once and stop; it plans, implements, reviews, and ships. Garry Tan's gstack proves it from the thin side: a kit of skills written in English, running from office hours through planning, review, QA, shipping, and a retro, which is to say stations, with a human as the conveyor belt. The software factory we run internally proves it from the thick side, with stations for planning, design, architecture, assertions, code, review, deployment, and continual improvement, the assertions written before the code they will judge. Same machine. The only setting that differs is who moves the work between stations: a human, a thin driver, or code.
Half the harness was a cage. The other half is the point.
The thin camp calls every line of the harness "an inch of cage bolted onto a worker who can already do the job." That is half right, because two kinds of code hide under that one word. The first is distrust of the model's intelligence: sanitizers for inputs it would have handled, validators for outputs it would have caught, retries around calls it recovers from on its own. The models got good and that code went net negative. Cut it all.
The second kind is determinism: the gate every change passes in the same order, verification run by something other than the agent that wrote the code, the router that sends a failed test back to the right station, the rollback, the audit trail. That code is not a bet that the worker will fail. A model that is right 99% of the time, called ten thousand times a day, is wrong a hundred times a day, and the harness is what turns usually into always for the things that have to be always.
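The arithmetic above can be checked in a few lines. This is a sketch that treats every call as independent and uses exact fractions to avoid floating-point noise. The `gate_catch_rate` parameter is an assumption introduced here to show what an independent check changes; it is not a measured figure.

```python
from fractions import Fraction


def expected_failures(calls_per_day: int, accuracy: Fraction) -> Fraction:
    """Expected number of wrong results per day at a given per-call accuracy."""
    return calls_per_day * (1 - accuracy)


def escaped_failures(
    calls_per_day: int, accuracy: Fraction, gate_catch_rate: Fraction
) -> Fraction:
    """Wrong results that get past a gate catching the given share of them."""
    return expected_failures(calls_per_day, accuracy) * (1 - gate_catch_rate)


wrong_per_day = expected_failures(10_000, Fraction(99, 100))
print(wrong_per_day)  # 100

# Assumption: a deterministic gate that always catches the class of error it checks.
print(escaped_failures(10_000, Fraction(99, 100), Fraction(1)))  # 0
```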
Code got cheap. The argument did not survive.
Now notice what both camps are doing: measuring the harness in lines of code. That was the right ruler when code was expensive to write and expensive to keep. Both camps agree it no longer is, because the model writes it. And once code is cheap, the cost of a harness is not the lines; it is keeping them true while the models change every quarter and your idea of good changes every week. The cage was never code. It was frozen code.
So the question moves up a level, from how much code to who maintains it. And the answer is not you. Humans know they should use calculators. Models know they should be coding deterministic processes. We should let them.
The factory that builds the factory
Put the three steps together (same machine, determinism worth keeping, models writing the deterministic parts) and you get the claim we started with. Start with any harness you can run end to end; thin is fine. Every run produces evidence about the process itself: where the model wandered, which gate caught a real mistake, which skill read two ways, which step burned tokens re-deriving something that never changes. The last station on the line, continual improvement, reads that evidence and edits the factory: rewrites the ambiguous skill, hardens the never-changing step into deterministic code, retires gates that have stopped earning their keep, and re-runs its evals when a new model arrives to shift work between English and code. Every one of those edits ships through the factory's own gates, because the factory that ships your software is itself software the factory ships.
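As a rough illustration of what a continual-improvement station might do with run evidence, the sketch below turns simple per-component counters into proposed edits. The `Evidence` fields, the `min_runs` threshold, and the rules themselves are assumptions made for this example; a real station would need richer signals and would submit each proposal through the factory's own gates rather than apply it directly.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    name: str
    kind: str                  # "gate", "step", or "skill"
    runs: int                  # how many runs this component took part in
    catches: int = 0           # gates: real mistakes caught
    distinct_outputs: int = 0  # steps: number of different results across runs
    misreadings: int = 0       # skills: runs where the skill was read two ways


def propose_edits(
    evidence: list[Evidence], min_runs: int = 50
) -> list[tuple[str, str]]:
    """Return (component, proposed edit) pairs; proposals only, nothing is applied."""
    proposals: list[tuple[str, str]] = []
    for e in evidence:
        if e.runs < min_runs:
            continue  # not enough evidence to justify changing the factory
        if e.kind == "gate" and e.catches == 0:
            proposals.append((e.name, "retire gate"))
        elif e.kind == "step" and e.distinct_outputs == 1:
            proposals.append((e.name, "harden into deterministic code"))
        elif e.kind == "skill" and e.misreadings > 0:
            proposals.append((e.name, "rewrite ambiguous skill"))
    return proposals


# Hypothetical evidence from a batch of runs.
evidence = [
    Evidence("license-check", "gate", runs=200, catches=0),
    Evidence("security-review", "gate", runs=200, catches=7),
    Evidence("derive-build-matrix", "step", runs=200, distinct_outputs=1),
    Evidence("release-notes", "skill", runs=120, misreadings=9),
    Evidence("new-gate", "gate", runs=3, catches=0),
]
proposals = propose_edits(evidence)
print(proposals)
```

Keeping the output as proposals rather than direct changes matches the point that every edit to the factory ships through the same gates as any other change.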
That is the factory that builds the factories, and everyone building with agents will eventually need one. How you bootstrap it is a story of its own. Marc Andreessen can aim for zero introspection in his founders. We believe coding agents should be introspecting all the time.
Actual AI builds an architecture agent that manages architectural context for coding agents and software factories, so your harness keeps pace with your codebase instead of drifting out of date.