LLeconoMics
The Intelligence Toll
I’ve been writing about this for about six months. Last year I published two essays arguing that the AI industry’s extraordinary generosity (near-unlimited access to frontier models for $20 a month, or even for free) was structurally unsustainable, and that when the subsidies ended, what would emerge in their place would look a lot like every other form of inequality we already know. Intelligence, I argued, was on its way to becoming a luxury good.
I didn’t want to be right this fast.
What I’m going to do in this piece is walk through five specific things that happened between December 2025 and today, March 31st, 2026, and show you that we are no longer in the prediction business. We are in the documentation business. The reckoning arrived, it arrived ahead of schedule, and its shape is exactly as uncomfortable as both essays warned it would be.
The Five Events
1. Google’s “Accidental” Free Tier
On December 7, 2025, Google quietly slashed Gemini API free tier quotas by 50 to 92%. Developers around the world woke up to cascading 429 “quota exceeded” errors with no advance warning. It was the first of four cuts in four months: the December quota massacre, a February image-quota tightening, a March 2026 shift to a credit system, and throttling that now reaches even paid Ultra accounts.
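For developers caught in the December cut, the immediate response was defensive retry logic. Here is a minimal sketch, assuming the standard pattern of backing off on HTTP 429; the endpoint, payload shape, and retry budget are illustrative placeholders, not Google’s documented values:

```python
import random
import time

import requests

# Illustrative endpoint; substitute the actual model route you call.
URL = "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-pro:generateContent"

def call_with_backoff(payload: dict, api_key: str, max_retries: int = 5) -> dict:
    """POST to the API, backing off exponentially on 429 quota errors."""
    for attempt in range(max_retries):
        resp = requests.post(URL, params={"key": api_key}, json=payload, timeout=60)
        if resp.status_code != 429:
            resp.raise_for_status()  # surface non-quota errors immediately
            return resp.json()
        # Quota exceeded: sleep 2^attempt seconds plus jitter, then retry.
        time.sleep(2 ** attempt + random.random())
    raise RuntimeError("quota still exceeded after retries")
```

Backoff only smooths over per-minute rate limits. A daily quota cut by 92% has no client-side fix, which is the point: the scarcity lives on the server.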
The detail that sticks with me isn’t the size of the cut. It’s the explanation. Google’s own product lead admitted that the 2.5 Pro free tier “was originally only supposed to be available for a single weekend” and had “accidentally” lingered for several months. He framed the cutoff as correcting an oversight.
Read that sentence again. Frontier AI access was never supposed to be free for more than a weekend. The months of generous access were not a gift. They were an error. The moment someone noticed the error, it was corrected.
This is the clearest window into how these companies actually think about free access: a controlled experiment that accidentally escaped the lab.
2. Anthropic’s Impossible Position
This one is harder to write about because it involves a company doing something genuinely admirable and getting punished for it by physics.
In late February 2026, OpenAI signed a contract with the Pentagon to apply its models to lethal autonomous operations and mass surveillance of American citizens. Anthropic refused to do the same. The response from users was immediate and dramatic: ChatGPT uninstalls spiked 295% in a single day, and a movement called QuitGPT claimed 2.5 million participants. A significant portion of those users came to Claude.
On paper, this is everything a mission-driven AI company could want. Anthropic held a principled line, and the market rewarded it. Annualized revenue reportedly hit $19 billion by March 2026, and subscriber counts surged.
The problem is that subscribers are not the same thing as GPUs. You cannot will compute infrastructure into existence. GPU lead times are approaching a year. And so Anthropic found itself in a situation that no amount of good values could fix: more demand than GPU capacity. The company that did the right thing broke under the weight of the reward.
Starting in late March 2026, Anthropic began tightening usage limits specifically during weekday peak hours, 8am to 2pm Eastern. Sessions on Claude Code now burn faster by design during those windows. And today, the day I’m writing this, The Register’s headline reads: “Anthropic admits Claude Code quotas running out too fast.”
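The window itself is the only hard fact here, but it is concrete enough to schedule around. A trivial convenience sketch, for anyone who wants to push heavy agent runs outside the reported peak hours:

```python
from datetime import datetime
from zoneinfo import ZoneInfo

def in_peak_window(now: datetime | None = None) -> bool:
    """True during the reported throttle window: weekdays, 8am-2pm Eastern."""
    now = now or datetime.now(ZoneInfo("America/New_York"))
    return now.weekday() < 5 and 8 <= now.hour < 14

if in_peak_window():
    print("Peak window: expect Claude Code sessions to burn quota faster.")
```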
People who left OpenAI over its Pentagon partnership sought refuge with Anthropic, and now they’re being throttled. There is no safe harbor. The physics don’t care about your principles.
3. Codex: The Land Grab Made Explicit
The language has changed, and that matters.
In the original era of AI subscription plans, companies buried the unsustainable math in words like “unlimited.” You weren’t told what “unlimited” actually meant or how it was being financed. The fiction was comfortable for everyone: users felt like they were getting a great deal, and companies got to avoid talking about the billions in losses being absorbed by investors hoping for a future payoff.
OpenAI’s new Codex coding agent app dropped that fiction. The access that Free and Go users received was described, plainly, as being available “for a limited time.” The doubled rate limits for paid plans? Also for a limited time. The end date for that access is April 2nd. Two days from now.
I want to be precise about why this matters beyond the inconvenience. “Limited time” as marketing copy is an acknowledgment that the access you’re receiving is being subsidized for a strategic purpose: market capture. The generosity is a price of entry, paid by investors, intended to lock in behavior and habits before the real pricing arrives. It has always worked this way. What’s new is that they’re now saying so out loud, with a specific expiration date, while you’re still using the product.
4. GitHub Copilot: When You Can’t Charge Enough, You Take the Data
If you haven’t seen this one yet, pay close attention.
On March 25, 2026, GitHub updated its privacy policy. Effective April 24, 2026, interaction data from Copilot Free, Pro, and Pro+ users will be used to train AI models, by default. You have to actively opt out. If you don’t, your code completions, accepted suggestions, rejected suggestions, file names, repository structure, and everything else Copilot sees during a session goes into the training pipeline.
Business and Enterprise accounts are excluded from this, because those users pay enough that the company doesn’t need an alternative form of payment. The individual developers on cheaper plans do not pay enough. So instead of charging them more, GitHub is charging them in data.
This is the social media business model, applied to your code. Facebook never charged you a subscription fee. It charged advertisers for your attention, harvested from everything you posted and clicked. GitHub cannot get individual developers to pay what Copilot actually costs. So it is now accepting code as currency.
The technical caveat (that private repo code “at rest” isn’t trained on) doesn’t fully hold in practice. Code from private repositories that is actively used in a Copilot session counts as interaction data, which falls into the training pipeline unless you’ve opted out. The distinction between “stored” and “in use” is meaningful, and most users will not read the policy closely enough to catch it.
If you use GitHub Copilot on any individual plan, go to /settings/copilot/features and check your opt-out status before April 24th.
5. Cursor Builds Its Way Out of the Trap
The final piece is a company story rather than a restriction story, but it illuminates the same underlying pressure from a different angle.
Cursor, the AI-powered coding editor, has been one of the more interesting players in the space because it was built on top of other companies’ models, primarily Anthropic’s. That dependency created a ceiling. Claude Code reportedly costs Anthropic approximately $5,000 per user per month in compute against a $200/month subscription price. Cursor, reselling that API at competitive prices, could not reproduce those margins. It could not compete with a product being subsidized at roughly 25:1.
So in March 2026, Cursor released Composer 2, its second proprietary AI model. Built largely on Kimi K2.5, a Chinese open-source base model, with significant fine-tuning, Composer 2 runs at $0.50 per million input tokens compared to Claude Opus 4.6 at $5.00, a 10x cost reduction. On Terminal-Bench 2.0, it scores 61.7% against Claude Opus 4.6’s 58.0%.
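The pricing gap is easy to make concrete. A back-of-the-envelope sketch using the two input-token prices above; the per-task token count and monthly task volume are hypothetical workload assumptions, not published figures:

```python
# Input-token prices quoted above, in dollars per token.
OPUS_4_6 = 5.00 / 1_000_000
COMPOSER_2 = 0.50 / 1_000_000

tokens_per_task = 200_000   # hypothetical agentic coding task
tasks_per_month = 500       # hypothetical heavy-user volume

for name, price in [("Claude Opus 4.6", OPUS_4_6), ("Composer 2", COMPOSER_2)]:
    monthly = price * tokens_per_task * tasks_per_month
    print(f"{name}: ${monthly:,.0f}/month in input tokens")
# Claude Opus 4.6: $500/month; Composer 2: $50/month
```

At that spread, the cheaper model doesn’t need to be better on benchmarks to win; matching them is enough.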
The lesson here is that the only way to survive in an ecosystem where two or three well-capitalized players are subsidizing access at billions per quarter is to stop being dependent on them. Vertical integration, in this context, is a survival strategy. And Cursor found its foundation in Chinese open-source, because that is where the cost-competitive base models currently live.
Every player of sufficient size will eventually arrive at the same conclusion: own the stack or be squeezed out by someone else’s investor-subsidized pricing.
The Physics Underneath
These five events did not happen because of boardroom decisions or competitive strategy alone. They happened because of physics.
GPU lead times are now approaching a year. High-bandwidth memory is sold out through 2026. Microsoft’s AI CEO Mustafa Suleyman has said plainly that inference compute scarcity, not the smartest model, will determine who wins over the next two to three years. You can have the best researchers and the most elegant architecture. If you cannot serve inferences at scale, you lose.
The transition from chatbots to agentic AI compounds this dramatically. A chatbot needs one to five API calls per session. An agentic coding tool like Claude Code, Cursor, or Cline needs ten to five hundred calls per task, sometimes more. The same physical infrastructure supports far fewer users when each user is running autonomous agents rather than asking questions. That cost is structural, built into the nature of the task. But it means the economics of the previous era simply do not apply to the tools that are now most useful.
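To see how hard agents hit the same fleet, run the ranges above against a fixed call budget. Everything in this sketch is an illustrative assumption except the per-session and per-task call ranges quoted in the paragraph:

```python
# Fixed serving capacity, in API calls per hour. Illustrative number.
capacity = 1_000_000

chatbot_calls = 5    # upper end of the 1-5 calls-per-session range
agent_calls = 500    # upper end of the 10-500 calls-per-task range

print(f"chatbot users served: {capacity // chatbot_calls:,}")  # 200,000
print(f"agent users served:   {capacity // agent_calls:,}")    # 2,000
# Same hardware, 100x fewer users.
```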
The subsidies were absorbing roughly 90% of the actual infrastructure cost of every token processed. That gap cannot close through engineering alone when the hardware necessary to close it has a year-long queue.
Compute Is the Currency
Sam Altman said it on the Lex Fridman podcast, a couple of years before the scarcity set in: “Compute is going to be the currency of the future.” He said it as aspiration, as a vision for what he was building toward. He was right. But the person building the compute monopoly has a different relationship to that sentence than the person being priced out of access to it.
Jensen Huang made the same point from a different angle at GTC 2026, on the All-In Podcast. He argued that a software engineer earning $500,000 a year should be consuming at least $250,000 worth of tokens annually, and that if they aren’t, something has gone wrong. He proposed that Nvidia would give engineers a token budget worth roughly half their salary on top of their pay, as a productivity multiplier. With around 42,000 employees, Nvidia is targeting roughly $2 billion a year in token spend across its workforce.
That is an extraordinary sentence to sit with. The CEO of the company that makes the hardware AI runs on is now proposing that token consumption should be a line item in compensation packages, scaled to salary. Compute, in his framework, is already a form of pay. The question is who gets the allocation.
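The numbers are at least internally consistent. A quick sanity check on the workforce-level figure; the only derived quantity here is the per-employee average:

```python
employees = 42_000
annual_token_spend = 2_000_000_000  # dollars, the ~$2B target above

avg_budget = annual_token_spend / employees
print(f"average token budget: ${avg_budget:,.0f}/employee/year")  # ~$47,619
# Well below the $250k Huang proposes for a $500k engineer, implying the
# budgets scale with role rather than applying uniformly across the company.
```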
We have been here before, just with different scarce resources.
Money concentrates power. We understand this so well it barely needs saying. Capital compounds: those who have it can buy productivity, access, and opportunity, and those returns generate more capital. The people who own the infrastructure (the land, the factory, the platform) extract rent from everyone who needs it. The gap between the top and everyone else widens over time, because the mechanisms that produce wealth also produce its concentration.
What’s happening with compute follows the same structure, with one crucial difference: the resource being concentrated is intelligence.
Think through what access to frontier AI tools actually does to a person’s economic output. A developer with agentic coding assistance has the effective throughput of a small team. A researcher with full model access covers literature review, synthesis, and hypothesis generation in hours instead of weeks. A student with strong AI tools compresses years of skill development. An employee at a company with enterprise AI access outperforms a peer at a company without it, in ways that compound across months and years.
This is a literal productivity multiplier. And productivity multipliers are economic multipliers. The people already at the top of the wealth distribution (enterprises, well-funded startups, workers in high-income markets) can absorb $200 to $2,000 per month in AI costs without meaningful disruption. The people at the bottom (freelancers, students, small businesses in developing economies, people already stretched by cost of living) cannot.
When Cursor quietly maintained a “slow pool” for heavy users, sometimes adding 20-minute delays even after periods of non-use, the name they chose is telling. The slow pool is where you end up when you need the tools more than you can pay for them. The fast pool is for everyone else.
The Compounding Problem
What makes wealth inequality so durable is the compounding of small advantages over long periods, not dramatic theft. A consistent 5% productivity edge, maintained over a decade, produces a gap that is effectively structural. The mechanisms of wealth reproduction are quiet, distributed, and cumulative.
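The arithmetic is small enough to check in one line: a 5% edge compounded for ten years, exactly as described above.

```python
edge, years = 1.05, 10
print(f"cumulative gap: {edge ** years:.2f}x")  # ~1.63x: a 63% output gap
```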
The same mechanism applies here. Consider two people entering the same field in 2026: one with access to frontier AI tools, one without, or with significantly degraded access. Over a year, the gap in their outputs is meaningful. Over five years, it is career-defining. Over a generation, it becomes the kind of gap we usually explain with references to talent, work ethic, or luck, when really the explanation is access to the tools that make effort multiply.
The historical pattern is consistent. The printing press democratized access to written knowledge, and then the economics of publishing concentrated it again into the hands of those who could own presses. The internet democratized access to information, and then advertising and algorithmic amplification reconcentrated attention, influence, and ultimately economic power into the hands of those who owned the platforms. Each wave of democratization gave way to reconcentration. AI is running this cycle in approximately five years instead of fifty.
What the 2025 AI Divide analysis documents is already visible at the macro level: AI’s transformative benefits are concentrating among a small number of technology companies and wealthy economies, and the gap is growing between organizations and individuals with access to advanced AI capabilities and those without. The concentrated nature of AI capacity (in data, compute, and expertise) amplifies existing inequalities rather than correcting them.
What You Can Actually Do
I am not going to pretend that individual action solves a structural problem. It doesn’t. But there are things worth doing in the meantime.
The local model trajectory is real, just slower than hoped. The promise I wrote about in the first LLeconoMics essay (consumer hardware able to run capable models) has partially arrived. The gap between what runs locally and what frontier providers offer is still significant for complex agentic tasks, but it is narrowing. Cursor building on Kimi K2.5 is evidence that the open-source ecosystem is competitive in ways it was not a year ago. Support that ecosystem. Use it where you can.
Your data is still your most durable asset. Collect it, organize it, store it in open formats. The advice from essay one stands and is now more urgent, with one addition: watch where your data goes. The extraction phase has begun, and unlike the throttling, it arrives quietly with the default set against you. If you use GitHub Copilot on a personal plan, opt out before April 24th.
Understand the tier you’re in. The free tier is becoming slower and more restricted by design. That is not going to reverse. Plan around it. If AI tools are central to your work, treating that access as guaranteed at current prices or current quality levels is a mistake that will cost you more to correct later.
The policy window is short. Public compute programs exist: the EU’s EuroHPC AI Factories, the India AI Mission, the US National AI Research Resource. They are tiny relative to the scale of private infrastructure, and they are being built out as the private sector races ahead. The questions we should have been asking two years ago about public access to AI infrastructure are only now being asked. They’re not too late, but they’re getting there.
The Part I Find Hardest to Write
The version of this story I would prefer to be telling goes like this: the open-source ecosystem produces models good enough to run locally on consumer hardware, the edge computing buildout reaches the point where you genuinely own your AI, and the concentration of intelligence at the top of the wealth distribution turns out to be a temporary phase rather than a permanent condition.
That story might still be true. The asymmetry is real: you cannot run a copy of a billion-dollar hedge fund on your laptop, but you can run a copy of a well-trained model. That asymmetry is the best structural reason to be invested in what the open-source community is building.
But I have now been watching this industry for long enough to have written two essays predicting what would happen and then watched it happen faster than I predicted. The window in which the choices made by companies, regulators, and users can meaningfully shape the outcome of intelligence stratification is a real window, and it is not open indefinitely.
Sam Altman was right: compute is becoming the currency of the future. The question left to answer is whether that future will reproduce, at enormous speed, the same inequality that every previous form of concentrated currency has produced, or whether the specific properties of intelligence as a resource allow for a genuinely different outcome.
I don’t know the answer. But I am paying closer attention to who controls the compute than I am to who has the best model. Because in 2026, those are not the same question, and only one of them determines what kind of future we’re in.
This is the third essay in the LLeconoMics series. The first, LLeconoMics, covered the API pricing economics and the unsustainable subsidy model. The second, Striated Intelligence, made the case that compute was becoming class. This one documents what it looks like when the prediction arrives.