More Turns, Bigger Bill: Why Agentic Architecture Is a Token Multiplier Independent of Price
Published: June 2026 | Author: David Daniel
Companion to the paper "When the Loop Never Stops" (Paper 2). This article isolates one argument from the paper's token-multiplier section and sharpens it to a single empirical point: agentic workflows multiply token consumption through turns, not just price. The paper situates this inside the larger always-on-agent cost reckoning; this piece exists to make the architecture-as-cost mechanism impossible to wave away.
A note on method before the argument: this piece mixes three kinds of claims, and I've tried to label which is which as they appear. Some things are directly reported by a source. Some things are arithmetic derived from source numbers. And some things are my analysis: inferences the sources don't make themselves. Where a figure is a vendor's self-report or a mathematical illustration rather than an observed measurement, it says so inline.
The bill went up, but the price didn't have to
The common explanation for the 2026 agent-cost spike is that tokens got more expensive. That story is incomplete, and the incompleteness matters because it points people at the wrong fix. If the problem were price, the remedy would be obvious: route to a cheaper model, negotiate volume discounts, wait for the deflation curve that has reliably pushed per-token rates down year over year. Teams are doing all three and watching their bills climb anyway.
The variable those remedies don't touch is volume: specifically, how many tokens an autonomous workflow burns to finish a task. A single-shot completion is one prompt and one response. An agent that plans, calls tools, reads the results, re-plans, and repeats across dozens of turns consumes a multiple of that, because every turn re-submits context and generates new output, and every tool result becomes input to the next turn. The per-token rate could stay flat (could even fall) and the bill for the same nominal task would still rise with the turn count.
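A minimal sketch of that mechanism, as a toy model rather than a measurement. Every size below (starting context, output per turn, tool-result size) is a hypothetical placeholder, and the model ignores prompt caching and context compaction. It only shows the shape: when each turn re-submits the accumulated history, billed input grows faster than the turn count.

```python
def loop_tokens(turns, base_context=2_000, output_per_turn=300, tool_result=500):
    """Return (billed_input, billed_output) for a loop that re-submits
    its full history each turn. All sizes are hypothetical placeholders."""
    context = base_context
    billed_input = billed_output = 0
    for _ in range(turns):
        billed_input += context            # the whole context is billed as input
        billed_output += output_per_turn   # new output generated this turn
        context += output_per_turn + tool_result  # history carried into the next turn
    return billed_input, billed_output


for turns in (1, 10, 30):
    tokens_in, tokens_out = loop_tokens(turns)
    print(turns, tokens_in, tokens_out, tokens_in + tokens_out)
```

Under these placeholder sizes, the single-shot case bills 2,300 tokens and the 30-turn loop bills 417,000, with no change in the per-token rate anywhere in the model.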
That is the entire argument of this piece, stated up front: agentic architecture is a token multiplier, and the multiplier operates independently of price. What follows is the public evidence: a billing-model collapse on the supply side, a benchmark decomposition that separates the price effect from the turn effect, a price-deflation datapoint that shows why cheap tokens alone don't settle a budget, and a demand-side budget burn that shows the multiplier landing in production.
GitHub's billing change: the supply side names the problem
The clearest market admission comes from GitHub. On April 27, 2026, it announced that Copilot was moving from flat-fee "premium request" units to usage-based token billing, effective June 1. The stated reason was not that GitHub's input costs per token had risen. It was that the flat unit no longer mapped to what users actually consume. In GitHub's words: "a quick chat question and a multi-hour autonomous coding session can cost the user the same amount... the current premium request model is no longer sustainable."
Read that sentence as a structural diagnosis. A billing unit designed for the chat era priced a question and an autonomous session identically, and the spread in underlying consumption between those two things grew large enough to break the unit. GitHub does not quantify the spread, and I won't put a number on it for them, but a pricing model does not get publicly abandoned over a small one. The thing that widened the spread is the architecture: multi-hour autonomous sessions exist now, chat-length interactions still exist, and one flat fee cannot price both.
The reaction confirmed who the change was really about. In the days between the announcement and the June 1 effective date, developer backlash built; TechCrunch's May 30 coverage led with one developer's verdict, "what a joke". In that pre-launch coverage, the projected increases were dramatic, and they are worth quoting at the level of the actual examples rather than a smoothed range. One developer quoted by TechCrunch paid around $29 a month and projected nearly $750 under the new rates, roughly a 26x jump (that multiple is derived arithmetic on the reported dollar figures, not a number TechCrunch states). Another shared a screenshot that appeared to show costs going from around $50 to some $3,000 (roughly 60x, same caveat).
Three caveats belong next to those numbers. First, they are user-reported estimates of what existing usage would cost under the announced rates, published before the billing change took effect, not audited post-launch bills. Second, they are anecdotes from the hardest-hit users, not a measured distribution. A broader "10x to 50x" range circulated in commentary around the change, but that range is an extrapolation from anecdotes of exactly this kind, and this article does not rely on it. Third, the coverage itself records pushback: other users argued the extreme bills reflected iteration-heavy "vibe coding" rather than typical workloads.
But the direction is exactly what the multiplier argument predicts, and the discussion TechCrunch quotes draws the line to architecture itself, with one user describing premium requests "that could churn for hours or even days while spawning dozens or even hundreds of sub-agents." Light chat users barely move. Turn-heavy users move by multiples. Same product, same models, same nominal rates: different architecture of use.
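For readers who want to check the derived multiples, the arithmetic is a single division on the reported dollar figures. The helper name is illustrative, and the inputs are the approximate user-reported amounts quoted above, so the outputs inherit the same caveats.

```python
def jump_multiple(before_usd, after_usd):
    """How many times larger the projected bill is than the current one."""
    return after_usd / before_usd


print(round(jump_multiple(29, 750), 1))    # about $29 -> nearly $750
print(round(jump_multiple(50, 3000), 1))   # about $50 -> some $3,000
```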
The decomposition: where 5.5x comes from
The most useful single datapoint is one that separates the price effect from the volume effect, because the two are usually tangled together in any real bill. Independent benchmarking outlet Artificial Analysis provides it. In its analysis of Google's model line, it found that Gemini 3.5 Flash costs roughly 5.5x more to run than Gemini 3 Flash across its evaluation suite and, crucially, decomposed why.
Part of the increase is price: per-token rates rose from $0.50/$3.00 to $1.50/$9.00 per million input/output tokens, a 3x increase (that ratio is simple arithmetic on the published prices). But 3x in price does not produce 5.5x in cost on its own. The remainder comes from volume. Artificial Analysis reports that input token usage "increases significantly, driven primarily by an increase in the number of turns in agentic evaluations." The newer model takes more turns per evaluation task, each turn carries context, and the extra turns multiply the token count on top of the higher rate.
One framing caveat, because precision matters here: this is a benchmark-suite decomposition, not a controlled experiment. Artificial Analysis ran both models through its evaluation suite and attributed the cost gap to its two observed components; it did not hold every variable fixed and vary turns in isolation. The decomposition into "about 3x from price, the remainder from token volume driven by turns" is the source's reported finding plus arithmetic on its published numbers, and it is the cleanest public separation of the two effects I'm aware of.
What it shows is the multiplier operating in the open. The "just route to a cheaper model" reflex assumes the bill is a price problem you can shop away. But if a meaningful share of cost growth is turn count, then a cheaper model that takes the same number of turns saves you less than the price gap suggests, and a more capable model that takes more turns can cost more in total even at a lower headline rate. Price and volume multiply. Optimizing one factor while the other grows is how teams end up surprised.
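The decomposition can be reproduced as arithmetic on the published numbers. This is a sketch under one simplifying assumption: cost factors cleanly into price times volume, which is only reasonable here because the input and output rates rose by the same multiple. The implied volume factor is derived, not a figure Artificial Analysis states, and the function name is illustrative.

```python
def decompose(old_rates, new_rates, cost_multiple):
    """Split a total cost multiple into (price_factor, implied_volume_factor).
    Rates are (input, output) in dollars per million tokens."""
    input_ratio = new_rates[0] / old_rates[0]
    output_ratio = new_rates[1] / old_rates[1]
    if abs(input_ratio - output_ratio) > 1e-9:
        # A single price factor only exists if both rates moved together.
        raise ValueError("input and output rates moved by different multiples")
    return input_ratio, cost_multiple / input_ratio


price_factor, volume_factor = decompose((0.50, 3.00), (1.50, 9.00), 5.5)
print(f"price x{price_factor:.2f}, implied volume x{volume_factor:.2f}")
```

On the quoted figures this prints a price factor of 3.00 and an implied volume factor of about 1.83, which is the share of the 5.5x that no rate negotiation would touch.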
Why price deflation alone doesn't settle the budget
Per-token prices are, in fact, falling hard on one end of the market, which makes the rising-bill phenomenon more puzzling under the price story and entirely unsurprising under the turns story. The sharpest public datapoint: DeepSeek made its 75% discount permanent on V4 Pro, pricing output tokens at roughly $0.87 per million. The Decoder frames that as at least 34x below GPT-5.5's roughly $30 per million output tokens. To be precise about what that figure is: it is an output-token price gap between two specific models, not a measured total-cost-of-workload comparison, and not a recommendation to switch.
A 34x per-token spread is enormous, and it is real. But it sits on only one side of a multiplication. Total cost is rate times volume, and a workflow's turn structure sets the volume. To make the interaction concrete with deliberately round numbers: a token priced 34x cheaper but consumed in 50x the quantity yields a bill roughly 1.5x higher, not a lower one. That 50x is an illustrative figure (mathematical illustration, not an observed DeepSeek workload statistic), but the point it illustrates is not hypothetical. The Gemini decomposition above shows turn-driven volume growth stacking a further multiple (roughly 1.8x, by arithmetic on the reported 5.5x and 3x) on top of a 3x price change within a single vendor's own model line. Nothing in the arithmetic stops volume growth from outrunning even a large price cut when the workload is agentic enough.
This is the precise failure mode of price-shopping as a complete strategy. It treats the rate card as the budget. The budget is the rate card times everything your architecture does with it.
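The rate-times-volume point can be written as two one-line functions. This is illustrative arithmetic only: the 34x figure is the output-token price gap quoted above, the 50x volume figure is the same deliberately round placeholder, and the function names are made up for the sketch.

```python
def bill_multiple(price_multiple, volume_multiple):
    """Change in total bill when price and volume both change.
    A price_multiple below 1 means cheaper tokens."""
    return price_multiple * volume_multiple


def break_even_volume(price_multiple):
    """Volume growth at which a price cut is fully cancelled out."""
    return 1.0 / price_multiple


print(round(bill_multiple(1 / 34, 50), 2))    # bill change at 34x cheaper, 50x volume
print(round(break_even_volume(1 / 34), 2))    # volume growth that erases the saving
```

The second function is the more useful one for planning: it turns any price gap into the volume growth a workload can absorb before the cheaper rate stops paying for itself.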
A second structural feature of the rate card belongs next to the deflation point: providers bill input and output separately, and output carries a steep per-token premium, commonly several times the input rate (six times on the Gemini rates quoted above). On Anthropic's published price sheet (rates as listed in June 2026), every current tier prices output at five times input, including Claude Fable 5, the premium agentic-tier model Anthropic shipped on June 9, 2026, which lists at $10 per million input tokens and $50 per million output, exactly twice the corresponding Opus 4.8 rates (both multiples are arithmetic on the same published sheet, not a finding). The DeepSeek gap quoted above is specifically an output-token price gap. None of this changes the argument; it compounds it. The "price" of an agentic workload is at least two rates multiplied by two volumes, with both volumes set by the loop's turn structure, which is one more reason no single cheaper headline number can settle a budget on its own.
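A sketch of the "two rates multiplied by two volumes" structure. The rates are the Claude Fable 5 list prices quoted above; the token volumes are hypothetical placeholders chosen to resemble a turn-heavy loop, where re-submitted context makes input volume far larger than output volume.

```python
def workload_cost(input_tokens, output_tokens, input_rate, output_rate):
    """Dollar cost of a workload; rates are dollars per million tokens."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


# Hypothetical volumes for one agentic task; rates as quoted in the text.
input_tokens, output_tokens = 400_000, 10_000
total = workload_cost(input_tokens, output_tokens, 10.0, 50.0)
input_share = workload_cost(input_tokens, 0, 10.0, 50.0) / total

print(f"total ${total:.2f}, input share {input_share:.0%}")
```

With these placeholder volumes, input accounts for most of the bill despite the 5x output premium, which is why a comparison that looks only at output rates can miss where the money goes.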
The demand side: Uber's four-month burn
The supply side changed its billing; the demand side blew through its budgets. Fortune reported in late May that Uber ran through the budget it had set aside for AI coding tools for all of 2026 in roughly four months, with Claude Code among the tools in heavy use (Fortune names Claude Code specifically; a Yahoo Finance syndication of Bloomberg and The Information reporting names Cursor alongside it). No specific dollar figure was disclosed, and I won't supply one. What's on the record is the timeline: a budget sized for a year, gone by spring.
Just as notable is what Uber's COO, Andrew Macdonald, said about the other side of the ledger. On whether token spend connects to shipped consumer value: "That link is not there yet." He added that it is "very hard to draw a line" between AI usage statistics and the roughly 25% more useful features the company wants to ship. That is a demand-side executive describing both halves of the problem at once: consumption that outran the plan, and value attribution that hasn't caught up with the consumption.
A boundary needs drawing here, and I want to draw it explicitly. Fortune reports the burn, the tools involved, and the COO's skepticism. Fortune does not attribute the burn to turn multiplication. The example did travel beyond Fortune: CNBC's May 29 piece, "Tokens or humans? The new corporate trade-off", carried the four-month burn into its coverage of the broader corporate tokens-versus-headcount trade-off. (A verification note on that citation: the CNBC page resisted automated retrieval during source checks for this article, so it is cited here only for the pickup; its existence, date, framing, and use of the Uber example were confirmed via search indexing and secondary coverage, and no mechanism claim is rested on it.) The mechanism reading is mine: autonomous coding agents are the most turn-intensive workloads in common enterprise use, Uber's named tools are agentic coding tools, and a budget presumably sized on earlier, less agentic usage patterns getting consumed in a third of the expected time is what the turns multiplier predicts a budget would do. That is analysis consistent with the reporting, not a fact the reporting establishes. What the sourced record does establish is enough for this article's purposes: a sophisticated buyer, using agentic tools heavily, consumed a year of budget in four months, and its leadership describes the spend-to-value link as not yet demonstrated.
Where the turns come from: cadence and horizon
If turns are the multiplier, the operative question becomes what governs the number of turns. Two mechanisms are worth naming: one I can only argue qualitatively, one with concrete (if caveated) public numbers.
Cadence. Long-running and multi-agent setups have a rhythm: how often agents poll for new work, heartbeat to signal liveness, check shared state, or re-read context to stay current. Every one of those cycles is at least one turn, every turn re-submits context as billable input, and an always-on agent ticks around the clock whether or not anything useful happens on a given tick. I want to be straightforward about the evidentiary status here: no public primary source quantifies cadence's share of agent spend, and an earlier circulated claim that cadence is "the top lever" traced back to a source with an arithmetic error, so I am deliberately carrying no numbers and no ranking. What the public record does support is the mechanism's economics. Anthropic's prompt-caching pricing charges cache reads at a fraction of the fresh-input rate precisely because re-submitted context is the recurring cost in multi-turn workloads; caching exists to blunt the cost of paying for the same context again and again. The inference I draw from that (that in an always-on agent, the frequency of context re-reads is a first-order spend variable, plausibly mattering more than the size of any single context) is my analysis of the pricing structure, not a sourced ranking. Treat the direction as sound and the magnitude as unestablished.
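To make the cadence mechanism concrete without claiming a magnitude, here is a toy daily-cost model. Every input is an assumption: the polling interval, the context size, the $10 per million input rate (borrowed from the price quoted earlier), and the cache-read multiplier of 0.1, which is a placeholder to check against the provider's current price sheet. The model ignores cache-write surcharges, cache expiry, and all output tokens.

```python
def daily_input_cost(interval_seconds, context_tokens, input_rate,
                     cached_fraction=0.0, cache_read_multiplier=0.1):
    """Dollars per day spent re-submitting context on every tick.
    input_rate is dollars per million tokens; the multiplier is an assumption."""
    ticks_per_day = 86_400 // interval_seconds
    fresh_tokens = context_tokens * (1 - cached_fraction)
    cached_tokens = context_tokens * cached_fraction
    cost_per_tick = (fresh_tokens * input_rate
                     + cached_tokens * input_rate * cache_read_multiplier) / 1_000_000
    return ticks_per_day * cost_per_tick


# Hypothetical always-on agent carrying a 20,000-token context.
print(round(daily_input_cost(60, 20_000, 10.0), 2))                       # tick every minute
print(round(daily_input_cost(300, 20_000, 10.0), 2))                      # tick every 5 minutes
print(round(daily_input_cost(60, 20_000, 10.0, cached_fraction=0.9), 2))  # every minute, 90% cached
```

In this model cost is linear in both tick count and context size, so it does not settle which one matters more; it only shows that tick frequency enters the bill as a direct multiplier whether or not a given tick does useful work.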
Horizon. The second mechanism is how long the loop runs before a task is done, and here public long-horizon numbers give a sense of scale, with caveats attached. Artificial Analysis reports that evaluating Qwen3.7 Max on its full Intelligence Index generated about 97 million tokens at a cost of $1,202. That figure is the total across the entire evaluation suite (many tasks, not one autonomous run), so read it as a measure of what sustained agentic-style workloads accumulate to, not the price of a single task. And in a vendor self-reported demo (not independently audited), MiniMax describes its M3 model running roughly 24 hours on a CUDA-kernel task, completing 147 benchmark submissions and 1,959 tool calls. Neither figure proves the thesis on its own (one is a benchmark aggregate, the other a vendor showcase), which is why neither is load-bearing here. They earn their place as illustrations of shape: when the horizon extends, one "task" decomposes into hundreds or thousands of billable steps, and every step is turns. The multiplier isn't an abstraction at that scale. It's the line items.
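A few derived figures help read those two datapoints. All of them are simple arithmetic on the numbers quoted above, not values either source states, and they inherit the approximations in the inputs ("about 97 million" tokens, "roughly 24 hours"). The function name is illustrative.

```python
def blended_rate_per_million(total_cost_usd, total_tokens):
    """Average dollars per million tokens across a whole run."""
    return total_cost_usd / (total_tokens / 1_000_000)


# Qwen3.7 Max evaluation totals, as quoted from Artificial Analysis.
print(round(blended_rate_per_million(1202, 97_000_000), 2))

# MiniMax M3 demo figures, vendor self-reported.
tool_calls, submissions, hours = 1959, 147, 24
print(round(tool_calls / hours, 1))         # average tool calls per hour
print(round(tool_calls / submissions, 1))   # average tool calls per submission
```

The per-hour and per-submission values are averages over the whole run and say nothing about how the calls were actually distributed across it.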
Cadence sets how often the loop fires; horizon sets how long it keeps firing. Together they determine the turn count, and the turn count, multiplied by the rate card, is the bill.
What this argument does and doesn't claim
Because the surrounding commentary on AI costs is noisy, it's worth being exact about the boundaries of the claim.
This article does not claim that per-token prices are irrelevant. The Gemini decomposition itself attributes about 3x of the 5.5x to price. Rates matter; they're a factor in the multiplication.
It does not claim a universal multiplier. How many extra turns agentic workflows take varies by task, model, scaffold, and configuration, and the public record doesn't support a general number. The Gemini case quantifies one model transition on one evaluation suite; the GitHub and Uber episodes show the phenomenon's footprint without measuring its size.
It does not claim the turns are waste. The companion paper's whole subject is that some loops earn their tokens and some don't; this piece is agnostic on the value question and narrow on the cost mechanism.
What it does claim is this: token consumption per task is an architectural property, set by turn count, and it scales independently of the rate card. Any cost analysis that models price and ignores turns will misprice agentic work, and any cost strategy that consists entirely of routing to cheaper models is optimizing one factor of a two-factor product.
The takeaway, handed back to the paper
The single point of this companion is narrow on purpose: agentic architecture is a token multiplier independent of price. GitHub's billing change names the spread between a chat question and an autonomous session as the thing that broke flat-fee pricing. The Gemini 3.5 Flash decomposition separates the factors: roughly 3x from price, the rest of 5.5x from token volume driven by more agentic turns. DeepSeek's 34x output-token price gap shows that even dramatic price deflation settles nothing by itself, because volume sits on the other side of the multiplication. And Uber's four-month budget burn shows what it looks like when turn-heavy workloads meet budgets sized for a less agentic era, with the caveat, marked above, that the mechanism reading there is analysis, not reportage.
If the diagnosis is right, the fix is not only cheaper tokens. It is governing turns: cadence, horizons, loop termination, and, above all, whether each cycle of the loop produces enough value to justify the tokens it consumed. That governance question, measuring whether the loop paid for itself and deciding what to do when it doesn't, is where "When the Loop Never Stops" (Paper 2) takes over. This piece exists to make its token-multiplier premise unambiguous before the paper builds on it.
Companion to: "When the Loop Never Stops" (Paper 2), token-multiplier section.
This article is part of an ongoing research project tracking AI tooling, software engineering practices, and cross-functional workflows at daviddaniel.tech/research.
This article was created with AI assistance. Sources include: GitHub Blog billing announcement (April 2026), TechCrunch developer-reaction coverage (May 2026), Artificial Analysis Gemini 3.5 Flash analysis and Qwen3.7 Max evaluation data, Fortune and CNBC reporting on Uber's AI tooling spend (May 2026), The Decoder on DeepSeek V4 Pro pricing, Anthropic prompt-caching documentation, Anthropic's published model price sheet and Claude Fable 5 announcement (June 9, 2026), and MiniMax's M3 model page (vendor self-reported). Data as of mid-June 2026. Illustrative arithmetic and analytical inferences are labeled as such inline.