Key Takeaways

  • Enterprises face a growing challenge: enabling widespread adoption of artificial intelligence and embedding AI into daily operations and complex workflows, while visibility is low and the costs are spiraling out of control.
  • AI use cases have moved from simple content generation to advanced applications, like code generation and agents, which consume far more tokens and drive up spending compared to earlier, simpler applications.
  • AI gateways, like Amplify AI Gateway, provide visibility and centralized control, where teams can manage consumption, establish granular quotas, and enforce policies that transform AI from unpredictable spending into a scalable solution with long-term sustainability.

Enterprises that are embracing artificial intelligence are dealing with a dilemma: how to enable AI-driven innovation for their teams at reasonable costs.  

Over the last few weeks, stories of skyrocketing AI costs have been in the news. The Verge reported that Microsoft is canceling most of its Claude Code licenses and has started transitioning to its native Copilot CLI. The Financial Times reported that Amazon canceled its AI usage leaderboard, an initiative started to promote AI adoption, citing improper use of artificial intelligence; according to the FT’s source, Amazon’s senior executive Dave Treadwell told staff: “Don’t use AI just for the sake of using AI.”

Axway’s Chief Product Officer, Meetesh Patel, also emphasized a similar trend with an example:  

“Uber’s CTO recently posted that he spent his entire IT budget, $26 million, in Q1 on AI compute. Right now, he has to figure out how he’s going to fund the rest of the year because engineers can’t stop working.”  

In recent weeks, I have discussed AI cost controls with several of Axway’s customers, who told me that their AI usage costs have started spiraling out of control.

In this blog, I cover why costs spiral out of control, how AI governance saves budgets, and how Amplify AI Gateway can help.  

Why are AI costs spiraling out of control?  

Artificial intelligence evolves fast. Within the scope of two years, we moved from generating simple copy to generating complex imagery, and now we’re in the world of complex code generation and agents. The use cases for AI have quickly shifted from sporadic and simple queries to complex multi-agent development workflows.  

AI models run on tokens, and pricing depends on token consumption. Simpler tasks consume few tokens, while complex workloads bring a massive cost multiplier: developer use cases where AI models manage entire codebases consume thousands of times more tokens than a simple request. Teams have moved on from cautious, experimental use cases, like debugging, to building their entire codebases with AI tools. TechCrunch reported in February 2026 that developers at Spotify, a popular music and podcast streaming service, haven’t written a line of code since December 2025 after delegating to AI.
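To make the token-based pricing described above concrete, here is a minimal arithmetic sketch. The per-token prices and the token counts are assumptions chosen for illustration, not any provider’s actual rates or measured workloads.

```python
# Hypothetical prices in USD per one million tokens; real provider pricing differs.
PRICE_PER_MILLION = {"input": 3.00, "output": 15.00}


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the cost of one request in USD under the assumed prices."""
    return (
        input_tokens * PRICE_PER_MILLION["input"]
        + output_tokens * PRICE_PER_MILLION["output"]
    ) / 1_000_000


# Assumed workloads: a short chat question versus an agent session
# that repeatedly rereads a large codebase.
simple = request_cost(input_tokens=500, output_tokens=300)
agentic = request_cost(input_tokens=2_000_000, output_tokens=150_000)

print(f"simple request:  ${simple:.4f}")
print(f"agentic session: ${agentic:.2f}")
print(f"multiplier:      {agentic / simple:,.0f}x")
```

Under these assumed numbers, the agent session costs more than a thousand times as much as the short question, which is the kind of multiplier that turns a negligible line item into a budget problem.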

Aside from dev tools, organizations continue to find stable everyday use cases for artificial intelligence. Many websites had a chatbot before AI was introduced. Shifting legacy chatbots to LLM-powered bots was an obvious next step, but such continuous use of AI inevitably raises day-to-day costs. AI is becoming a foundation for stakeholder interaction on new developments. For many organizations, AI has become an integral part of their automation infrastructure.  

And this is only the beginning. The next logical step in the mass adoption of AI is deploying agents, and while very few organizations run autonomous agents today, it’s only a matter of time before agentic AI becomes an everyday reality as well. In the future, agents could handle automated testing, automated deployment, or automated code reviews. These steps will inevitably raise costs even more.

The innovation itself is incredibly rapid: 

“Look at what happened since last year. Before last summer, coding agents were okay. They weren’t the best. Then, come into July–September: welcome to Claude Code. That changed the world overnight. It changed how you do things overnight. That’s the speed at which things are moving.” 

— Meetesh Patel, Axway’s CPO 

But the diffusion of this innovation remains uneven: some organizations and teams are moving faster, some are falling behind. Moving faster doesn’t always mean moving better, and major enterprises now direct their staff to focus on building better products and producing value with AI rather than just using AI for the sake of using AI.  

Companies don’t have an AI cost problem. The problem is governance of AI, its usage, and its cost.  

The task that enterprises face today is finding a delicate balance between costs and the value that these costs produce. A single developer can rack up thousands of dollars in AI token usage within one week, all while producing a valuable product that would otherwise take an entire team months of development. Consuming large amounts of tokens can be justified if productivity outweighs spending. In that case, the problem shifts from minimizing costs to maximizing ROI per token spent.

What good AI governance looks like

Social media and AI-centric online communities recently discussed the validity of a story about an anonymous company that allegedly forgot to set any spending limits on its AI usage and faced a $500 million token bill at the end of the first month. Regardless of how true this story is, it teaches a fundamental lesson: setting even basic cost controls is important.  

A hard company-wide cap on AI usage is a good first step, but teams and decision-makers should explore more detailed cost controls.  

| Lever | What it does | In Amplify AI Gateway |
| --- | --- | --- |
| Consumption quotas | Caps total token spend per team or user | Yes |
| Quota granularity | Hourly to monthly limits per team or employee | Yes |
| Per-model caps | Limits use of the most expensive models | Yes |
| Model fallback | Drops to a cheaper model when a cap is hit | Yes |
| Monitoring + rate limiting | Visibility and alerts on token usage | Yes |
| Semantic caching | Reuses answers to repeated prompts | On the roadmap |
| Semantic routing | Sends each task to the cheapest capable model | On the roadmap |

Adopting dedicated AI governance software 

User interfaces like ChatGPT or Claude allow setting basic limits on spending. But adopting a dedicated software solution for cost controls takes your governance a large step forward.

Amplify AI Gateway is designed for extensive and enterprise-ready AI governance. It gives you control over many aspects of AI use, like AI infrastructure orchestration, business enablement, and extensive functionality for consumption and cost control.  

You can’t control what you can’t see 

A good first step is to establish the baseline: how much your organization spends in total, which teams spend the least and the most, and which tasks consume the most tokens. Many organizations quickly discover that their AI spending is concentrated in specific teams, workflows, and models. I covered the reasons above: some teams will naturally spend more on high-volume, high-ROI tasks.

You can establish the baseline with the extensive monitoring capabilities of an AI gateway. A good solution, like Amplify AI Gateway, gives a detailed look into consumption.  
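As a rough sketch of what establishing a baseline involves, the snippet below totals token usage by team and by model. The record layout, team names, and model tiers are hypothetical and do not represent an Amplify AI Gateway export format.

```python
from collections import defaultdict

# Hypothetical usage records, shaped the way a gateway log export might be.
usage_log = [
    {"team": "platform", "model": "large", "tokens": 1_200_000},
    {"team": "platform", "model": "small", "tokens": 300_000},
    {"team": "support", "model": "small", "tokens": 450_000},
    {"team": "marketing", "model": "large", "tokens": 50_000},
]


def totals_by(records, key):
    """Sum token usage grouped by the given record field."""
    totals = defaultdict(int)
    for record in records:
        totals[record[key]] += record["tokens"]
    # Sort descending so the heaviest consumers come first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


by_team = totals_by(usage_log, "team")
by_model = totals_by(usage_log, "model")

for team, tokens in by_team:
    print(f"{team:<10} {tokens:>12,}")
```

Even a simple breakdown like this shows where spending concentrates, which is the information you need before setting any quota.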

AI consumption quotas 

The most basic quota is the company-wide token consumption limit, but that is not enough. As software development has become the largest consumer of tokens, you may want to set up different quotas for different lines of your business: some teams need only a few tokens for their AI usage, and some teams need a lot more to build products.
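The idea of separate quotas per team can be sketched in a few lines. The class name, team names, and limits below are illustrative assumptions, not the way Amplify AI Gateway implements or configures quotas.

```python
class TeamQuota:
    """Tracks token consumption against a per-team cap (illustrative only)."""

    def __init__(self, limits):
        # limits maps a team name to its token cap for the current period.
        self.limits = dict(limits)
        self.used = {team: 0 for team in self.limits}

    def try_consume(self, team, tokens):
        """Record the usage and return True, or return False if it would exceed the cap."""
        if team not in self.limits:
            raise KeyError(f"no quota configured for team {team!r}")
        if self.used[team] + tokens > self.limits[team]:
            return False
        self.used[team] += tokens
        return True


# Assumed caps: engineering gets a far larger budget than marketing.
quota = TeamQuota({"engineering": 5_000_000, "marketing": 200_000})
print(quota.try_consume("engineering", 1_000_000))
print(quota.try_consume("marketing", 250_000))
```

A rejected request leaves the counter unchanged, so a team that hits its cap can still send smaller requests that fit in the remaining budget.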

Quota granularity 

If you have used a free-tier or Pro-plan AI interface like ChatGPT or Claude, you likely noticed a mix of short-term and long-term quotas. For instance, Claude and ChatGPT have a 5-hour usage limit and a weekly usage limit. Some tools have quotas that are separate from general chatbot functionality.

Controls like that and more are available in an AI gateway. If your organization has adopted AI across teams, you can set up quotas for teams and time periods with granularity as short as one hour. You can enable hourly, daily, weekly, and monthly usage limits for specific teams or even employees.  

Establishing such granularity is not so much about limiting as it is about budgeting and long-term enabling: you won’t run a risk of wasting large amounts of tokens within a short amount of time, but instead, you will have predictable, consistent, and intentional consumption. 
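To illustrate how layered time-based limits interact, here is a minimal sketch using fixed hourly and daily windows. The class, the window scheme, and the limits are assumptions for illustration; a real gateway may use sliding windows or another scheme.

```python
WINDOW_SECONDS = {"hour": 3_600, "day": 86_400}


class WindowedQuota:
    """Enforces several fixed-window token limits at once (illustrative only)."""

    def __init__(self, limits):
        # limits maps a window name to its token cap, e.g. {"hour": 10_000}.
        self.limits = dict(limits)
        self.usage = {}  # (window name, window index) -> tokens used

    def try_consume(self, tokens, now):
        """Allow the request only if it fits within every configured window."""
        buckets = [(name, int(now // WINDOW_SECONDS[name])) for name in self.limits]
        for name, index in buckets:
            if self.usage.get((name, index), 0) + tokens > self.limits[name]:
                return False
        for bucket in buckets:
            self.usage[bucket] = self.usage.get(bucket, 0) + tokens
        return True


# Timestamps are passed in as seconds so the behavior is easy to follow.
quota = WindowedQuota({"hour": 1_000, "day": 1_500})
print(quota.try_consume(800, now=0))      # fits both windows
print(quota.try_consume(300, now=10))     # would exceed the hourly cap
print(quota.try_consume(700, now=3_600))  # new hour, still within the daily cap
```

The hourly cap prevents a burst from draining the daily budget in minutes, while the daily cap bounds the total. This sketch never discards old windows, which a long-running service would need to do.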

Per-model controls 

Not all models are created equal. Major AI producers have several types of models; for instance, Anthropic currently provides lightweight and cheap Haiku, heavyweight and expensive Opus, and balanced Sonnet. Not all tasks require the heaviest and most expensive models, and in fact, outputs of some tasks may suffer from using a heavier model. 

Amplify AI Gateway supports per-model controls. Teams and individual users can have an overall hard consumption cap and per-model caps with limits on different types of models. When a limit on the most expensive model is hit, a gateway can be set up to fall back to a less expensive model until the limit resets.
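The fallback behavior can be sketched as a walk down a list of model tiers. The tier names, caps, and function below are hypothetical and only illustrate the logic, not the gateway’s actual configuration.

```python
# Hypothetical model tiers, ordered from most to least expensive.
FALLBACK_CHAIN = ["large", "medium", "small"]


def choose_model(requested, tokens, caps, used):
    """Return the first tier, from the requested one downward, with room left.

    Records the usage against the chosen tier. Returns None when every
    tier in the chain is exhausted for this request.
    """
    start = FALLBACK_CHAIN.index(requested)
    for model in FALLBACK_CHAIN[start:]:
        if used.get(model, 0) + tokens <= caps.get(model, 0):
            used[model] = used.get(model, 0) + tokens
            return model
    return None


caps = {"large": 1_000, "medium": 5_000, "small": 20_000}
used = {}
print(choose_model("large", 800, caps, used))  # fits under the large cap
print(choose_model("large", 800, caps, used))  # large cap is hit, falls back
```

Falling back rather than rejecting keeps people working when the expensive tier runs out, at the price of a possibly weaker answer until the limit resets.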

With a mix of multi-level and granular temporal, per-user, per-team, and per-model limits, you gain fine-grained control over token consumption without limiting productivity. The use of AI becomes less chaotic, more organized, and more intentional. Cost controls become less about restricting and more about accountability, planning, and discipline.

Usage optimization with semantic caching and routing 

Not everything is about limiting: optimization plays a very large role in taking control of your spending. I presented one optimization step above, which is falling back to a less expensive model after a usage limit is hit.

Another option that we are exploring for Amplify AI Gateway is semantic caching. This feature will be helpful for the most frequently asked questions and prompts inside your AI-enabled ecosystem. If a prompt or chatbot question is run frequently and produces the same reply each time, every request and reply still consumes tokens as if it were run for the first time. Semantic caching will prevent this by temporarily storing the first generated output and returning it to the user instead of spending tokens to answer the same question repeatedly.
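The caching idea can be sketched as follows. This is not the planned Amplify AI Gateway feature: the word-count similarity is a toy stand-in for a real embedding model, the 0.8 threshold is an arbitrary assumption, and expiry of stored answers is left out for brevity.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticCache:
    """Returns a stored answer when a new prompt is similar enough to an old one."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold
        self.entries = []  # list of (embedding, answer) pairs

    def lookup(self, prompt):
        query = embed(prompt)
        for vector, answer in self.entries:
            if cosine(query, vector) >= self.threshold:
                return answer
        return None

    def store(self, prompt, answer):
        self.entries.append((embed(prompt), answer))


def answer_with_cache(prompt, cache, call_model):
    """Serve from the cache when possible; otherwise call the model and store the reply."""
    cached = cache.lookup(prompt)
    if cached is not None:
        return cached
    reply = call_model(prompt)
    cache.store(prompt, reply)
    return reply
```

Here `call_model` is whatever function sends the prompt to a provider. Only cache misses reach it, so repeated or near-identical questions cost tokens once.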

Another feature for spending optimization is semantic routing. I mentioned that not all tasks require the most advanced and expensive model, and some tasks benefit from using a simpler and cheaper model. Semantic routing in an AI gateway saves costs by choosing which type of model is best suited for the task and prioritizing cheaper models for tasks that don’t require advanced and expensive models.  
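A routing decision of this kind can be sketched by comparing an incoming prompt with example prompts for each tier. Everything here is an assumption for illustration: the tier names, the example prompts, and the word-count similarity, which is a toy stand-in for the embedding model or classifier a real router would use.

```python
import math
import re
from collections import Counter

# Hypothetical example prompts that characterize each model tier.
EXAMPLES = {
    "small": [
        "summarize this paragraph",
        "translate this sentence",
        "fix the typo in this email",
    ],
    "large": [
        "refactor this module across several files",
        "design the architecture for a new service",
        "debug this failing integration test",
    ],
}


def embed(text):
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def route(prompt):
    """Pick the tier whose examples best match the prompt; ties go to the cheaper tier."""
    query = embed(prompt)
    scores = {
        tier: max(cosine(query, embed(example)) for example in examples)
        for tier, examples in EXAMPLES.items()
    }
    return "large" if scores["large"] > scores["small"] else "small"


print(route("please summarize this paragraph for me"))
print(route("refactor the payment module"))
```

Defaulting to the cheaper tier on a tie reflects the cost-first goal: the expensive model is used only when the prompt clearly resembles a demanding task.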

Axway helps save AI costs 

Cost control is one of the most essential use cases for Amplify AI Gateway. We created this product for governance over the enterprise-level use of AI. Integrated on top of Amplify Fusion and Engage, Amplify AI Gateway is a middleware application between users and AI providers.  

You are moving from AI experimentation to AI as an everyday tool. It is only natural that costs rise quickly as AI advances every month. With Amplify AI Gateway and Amplify Fusion, your use of AI transforms from uncontrollable and costly experimentation into governed, scalable, and even AI adoption across the organization.

Embrace AI innovation without losing control with Amplify AI Gateway and Amplify Fusion.  

Frequently Asked Questions

What is an AI gateway?

An AI gateway is middleware that sits between your users and AI providers, giving you one place to control cost, usage, and governance across every model. 

How do enterprises control AI costs?

Enterprises control AI costs with quotas, per-model caps, model fallback, caching, and usage monitoring, applied per team and per user through an AI gateway. 

What is semantic caching?

Semantic caching stores the answer to a frequently asked prompt and reuses it, so repeated questions don’t spend tokens generating the same reply twice. 

How does an AI gateway reduce LLM costs?

An AI gateway reduces LLM costs by routing simple tasks to cheaper models, capping spend per team, and caching repeated prompts, without slowing teams down. 

Learn more about Amplify AI Gateway