How a three-sentence bug report became a tested, documented, deployed release. The human was in the loop for one report and two decisions. The studio did the rest.
The person did the irreplaceable part: noticing something was off, and making two taste calls. Encoded workflows turned that judgment into a correct, shipped fix.
"Augusta is also in Georgia (pop >200k), and Charleston is in South Carolina (>100k). Given these two gigantic misses, we need a thorough check, because I doubt the completeness of our list."
"Keep capitals strict, so relabel it 'Augusta (Capital City).' Good find on the city adds. Make the change, land it on main, bump the version, get it on prod. It is a game fix, not a full release."
That is the whole input. No file paths, no commands, no test runs, no release steps. The eleven steps below ran on encoded studio knowledge.
City populations are exactly where a raw model hallucinates. So instead of guessing, it pulled the authoritative source and computed the answer.
The same source the dataset already cited. 81,375 incorporated-place rows, real populations, zero guesswork.
All 81k rows stayed in the context-mode plugin. Only the 28-row diff entered the model's window. The audit cost almost no context.
Normalized consolidated city-counties (Augusta-Richmond, Macon-Bibb) and caught the New England SUMLEV-061 town quirk the original audit had missed.
The human named the two smallest fish. The audit caught the bigger ones: Columbus GA (202k), Springfield MO & MA (170k / 154k), Concord CA (122k).
The audit produced one genuine judgment call: should a capital that shares a name with a bigger city accept both states, or stay strict? That is a taste question, not a fact. So it stopped and asked.
A complete findings table, a recommended fix, and the one fork that needed a human: strict capitals with a clarifying label, or multi-state accept.
Strict capitals, add the "(Capital City)" label, take the five city adds, ship as a game fix. Roughly ninety seconds of human attention.
Everything before and after this slide was autonomous. The human was consulted once, on the one thing only they should own.
Each one is a piece of studio knowledge, encoded once and reused on every task.
The person noticed something was wrong and made two calls only a human should make. The studio's encoded workflows did the other ninety-five percent: the audit, the correctness, the tests, the release. That is what operational leverage looks like.