Inference got cheap — which AI projects just unlocked
Eighteen months ago, running a frontier-model inference call across every row of a million-row database would have been a finance conversation. The math was straightforward and the math was bad: cents per call times a million calls equals a number nobody wanted to defend in a budget meeting.
Today the same workload costs roughly an order of magnitude less for equivalent capability, and a whole category of work that used to live in the "save it for the annual project" bucket has quietly moved into the "do it on a Tuesday" bucket.
What actually got cheap
Frontier-model inference pricing dropped throughout 2024 and 2025 across almost every vendor. Newer model families ship better quality at lower prices. Vendor competition pulled the per-token cost down. Smarter routing, using a small fast model for the easy 80% of inputs and reserving the frontier model for the hard 20%, cut effective cost further (a sketch of the pattern follows). Open-source weights running on rented GPUs gave teams a fallback that put a ceiling on what the closed vendors could charge.
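Here is a minimal sketch of that routing pattern in Python. Everything in it is a stand-in: the `Answer` type, the toy model calls, and the 0.85 threshold are illustrative assumptions, not any vendor's API.

```python
from dataclasses import dataclass

@dataclass
class Answer:
    text: str
    confidence: float  # heuristic score in [0, 1]

# Toy stand-ins for real API calls; swap in your vendor's SDK.
def call_small(prompt: str) -> Answer:
    # Pretend the small model is confident on short, simple inputs.
    conf = 0.95 if len(prompt) < 500 else 0.50
    return Answer(text=f"small-model answer to: {prompt[:40]}", confidence=conf)

def call_frontier(prompt: str) -> Answer:
    return Answer(text=f"frontier answer to: {prompt[:40]}", confidence=0.99)

CONFIDENCE_FLOOR = 0.85  # tune against a labeled sample of your own data

def route(prompt: str) -> Answer:
    """Try the cheap model first; escalate only when it is unsure."""
    answer = call_small(prompt)
    if answer.confidence >= CONFIDENCE_FLOOR:
        return answer
    return call_frontier(prompt)
```

The hard part in practice is the confidence signal: token log-probabilities, a self-rated score, or a cheap validator all work, and the threshold is what actually controls the split between tiers.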
The aggregate effect: a workload that cost \$10,000 to run end-to-end in early 2024 costs roughly \$500 to \$1,500 to run today, depending on how aggressively you route between model tiers.
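The back-of-envelope math behind that range, with illustrative per-call prices (the specific numbers are assumptions, not quotes from any price list):

```python
ITEMS = 1_000_000

# Illustrative per-call costs; assumptions, not a vendor price list.
EARLY_2024_FRONTIER = 0.010   # ~1 cent per call
TODAY_FRONTIER      = 0.002
TODAY_SMALL         = 0.0003

def blended_cost(small_share: float) -> float:
    """Total cost when small_share of calls go to the small model."""
    per_call = small_share * TODAY_SMALL + (1 - small_share) * TODAY_FRONTIER
    return ITEMS * per_call

print(f"early 2024, all frontier:        ${ITEMS * EARLY_2024_FRONTIER:,.0f}")  # $10,000
print(f"today, cautious 50/50 routing:   ${blended_cost(0.5):,.0f}")            # $1,150
print(f"today, aggressive 90/10 routing: ${blended_cost(0.9):,.0f}")            # $470
```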
Which projects just crossed the line
The pattern to watch for is any project that was previously gated on "we can't afford to run AI on every X." If the X is now small enough — every transaction, every email, every document, every meeting transcript, every customer interaction — the project might already be economical and nobody has done the new math.
Four concrete examples:
Full-coverage compliance audit. A compliance program that historically pulled a one-percent quarterly sample because reading every document was infeasible can now read every document. The team we worked with ran a full-corpus pass over two million documents for less than the cost of a single annual sample-based audit cycle. The economics of "sample because we have to" are gone for any document set under roughly ten million items.
Every-transaction expense and procurement audit. A finance team that historically reviewed a one-to-three-percent sample of P-card or expense transactions can now run a policy classifier on every transaction. Recovered dollars scale roughly linearly with coverage, and the recoveries on the previously invisible 97-99% of the population usually pay for the AI run several times over.
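A back-of-envelope version of that claim; every input here is an illustrative assumption to replace with your own figures, and it deliberately ignores the human review labor on flagged items:

```python
# Every input is an illustrative assumption; plug in your own figures.
TRANSACTIONS   = 500_000   # annual expense / P-card volume
VIOLATION_RATE = 0.005     # share of transactions with a policy issue
AVG_RECOVERY   = 120.0     # dollars recovered per confirmed issue
COST_PER_ITEM  = 0.001     # blended AI cost per transaction, routed

def recovered(coverage: float) -> float:
    """Recovered dollars scale roughly linearly with review coverage."""
    return TRANSACTIONS * coverage * VIOLATION_RATE * AVG_RECOVERY

print(f"recovered at a 2% sample:   ${recovered(0.02):,.0f}")  # $6,000
print(f"recovered at full coverage: ${recovered(1.0):,.0f}")   # $300,000
print(f"cost of the full AI pass:   ${TRANSACTIONS * COST_PER_ITEM:,.0f}")  # $500
```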
Every-email triage and routing. Inbound queues — sales, support, partnerships, legal intake — that previously got triaged by humans because the volume was too high to justify per-message AI inference are now sub-cent per message at frontier quality. The economics of "we'll have a junior person sort it" do not survive a frontier-model triage agent that is faster, more consistent, and roughly 100x cheaper per item.
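For concreteness, a sketch of the triage step. The queue names and prompt wording are invented for illustration, and `classify` stands in for any model call, such as the tiered `route` function sketched earlier:

```python
QUEUES = ["sales", "support", "partnerships", "legal_intake", "other"]

def triage_prompt(message: str) -> str:
    """Build a constrained classification prompt."""
    labels = ", ".join(QUEUES)
    return (
        f"Classify this message into exactly one of: {labels}. "
        f"Reply with the label only.\n\nMessage:\n{message}"
    )

def triage(message: str, classify) -> str:
    """classify is any callable from prompt text to model reply text."""
    label = classify(triage_prompt(message)).strip().lower()
    return label if label in QUEUES else "other"  # fail safe: never drop mail

# Toy usage with a stand-in for a real model call:
print(triage("Our invoice from last month looks wrong.", lambda p: "support"))
```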
Every-meeting transcript extraction. Recording every meeting and extracting the action items, decisions, and follow-ups was an experiment in 2023 and a polished tool feature in 2024; at current pricing it is essentially a free utility you can run across the entire organization without a budget conversation.
The trap to avoid
The cheap-inference era invites a specific failure mode: spending the savings on speculative use cases that were never going to work, instead of unlocking the high-value use cases that were waiting for the price drop.
The right move is not to find new things for AI to do. It is to find the things AI was already the right answer for, but where the unit economics gated adoption. Every-transaction audit, every-document review, every-email triage. The pattern is "we already wanted to do X for everything, but couldn't justify the cost." That is the project that just unlocked.
A pricing-aware rule of thumb
If a workflow involves a finite, enumerable set of items — transactions, documents, emails, conversations, recordings — and your team currently samples or filters because doing the work for every item is too expensive, the inference math has probably changed enough that you should re-run the numbers. The answer might not be "AI on every item" — sometimes sampling is still right for other reasons — but the cost-per-item has fallen far enough that the question is now genuinely open.
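A minimal "re-run the numbers" helper, with every rate an assumption to replace with your vendor's current price sheet:

```python
def annual_ai_cost(items_per_year: int,
                   tokens_per_item: int,
                   small_price_per_1m_tokens: float,
                   frontier_share: float = 0.2,
                   frontier_multiplier: float = 10.0) -> float:
    """Rough annual cost of running AI on every item with tiered routing.

    The frontier tier is modeled as frontier_multiplier times the small
    model's token price. All defaults are assumptions to be replaced.
    """
    small_rate = small_price_per_1m_tokens / 1_000_000
    blend = (1 - frontier_share) + frontier_share * frontier_multiplier
    return items_per_year * tokens_per_item * small_rate * blend

# Example: 2M documents a year, ~3k tokens each, $0.50 per 1M tokens small tier.
print(f"${annual_ai_cost(2_000_000, 3_000, 0.50):,.0f}")  # -> $8,400
```

If the output of that function is smaller than what you currently spend on sampling, the question above is worth reopening.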
That re-evaluation is the easy half of the work. The hard half is the same as it always was: building the system that turns frontier-model output into something a team can actually act on.