Key Takeaways

  • Establishing access limitations is essential to an API strategy.
  • API rate limiting/throttling, API burst, and API quota are all measures that help support clients while protecting backend services.
  • An API marketplace backed by a universal API management platform supports these measures to protect API performance.


As an API provider, it can be hard to know the exact usage of your services. Yet, part of managing these digital assets means limiting access to them. That way, providers can avoid API misuse and ensure all users have fair access to available APIs.

When talking about API access limitation, there are some key terms to define. Specifically, what are API rate limiting/throttling, API quota, and API bursts? Read on for definitions and examples, and learn how enterprises can better monitor, manage, and govern all APIs from a single pane of glass.

What is API rate limiting/throttling?

API rate limiting (also called throttling) caps the number of API requests a client can make within a defined time window, returning an HTTP 429 Too Many Requests response when the cap is hit. It is one of the most important traffic-control policies on a public-facing API because it protects the backend from being overwhelmed by accidental bursts, runaway scripts, or deliberate denial-of-service attacks, all without needing the application code to defend itself.

The number of API calls your backend can process per unit of time is typically measured in TPS, or transactions per second. In some cases, systems also have a physical limit on the amount of data transferred, measured in bytes.

Let’s say your backend can process 2,000 TPS. With API rate limiting or API throttling, you can cap the number of requests an API gateway passes through in a given period at or below that figure, which is known as backend rate limiting. Doing so protects backend services from being flooded with excessive messages.

You can configure a rate limit for specified clients that limits the number of messages they can send. This configuration is referred to as application rate limiting.
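To make application rate limiting concrete, here is a minimal Python sketch of a per-client cap built on a fixed-window counter. The class name `FixedWindowLimiter`, the client ID, and the limit values are illustrative assumptions and do not come from any particular gateway; real gateways usually expose this as configuration rather than code.

```python
from collections import defaultdict


class FixedWindowLimiter:
    """Per-client fixed-window request counter (illustrative sketch only)."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window_seconds = window_seconds
        # client_id -> (window index, requests counted in that window)
        self._counters = defaultdict(lambda: (None, 0))

    def allow(self, client_id, now):
        """Return True if the request fits in the client's current window."""
        window = int(now // self.window_seconds)
        current_window, count = self._counters[client_id]
        if current_window != window:
            count = 0  # a new window has started, so the counter restarts
        if count >= self.limit:
            return False
        self._counters[client_id] = (window, count + 1)
        return True


# Hypothetical usage: 2 requests per second for "client-a".
limiter = FixedWindowLimiter(limit=2, window_seconds=1.0)
statuses = [
    200 if limiter.allow("client-a", now=t) else 429
    for t in (0.0, 0.5, 0.9, 1.0)
]
# statuses == [200, 200, 429, 200]
```

In this sketch the third call exceeds the cap for the first window and maps to a 429, while the fourth call lands in the next window and is accepted again.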

If a client exceeds their allotted number of requests, their connection is throttled. Processing slows down, but the connection remains open to reduce errors.

Throttling carries risks of its own. Slowed connections can time out, and keeping connections open for longer can also open a vector for denial-of-service (DoS) attacks.

What is API burst?

API burst is the maximum short-term spike of requests that the rate limit allows above the steady-state rate, before throttling kicks in. A typical configuration is something like 100 requests per second sustained with a burst of 200, meaning the client can briefly send up to 200 in a single second but cannot sustain that pace. Burst capacity smooths out normal usage patterns (page loads firing several requests at once) while still preventing prolonged abuse.
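The "100 requests per second sustained with a burst of 200" configuration described above maps naturally onto a token bucket. The following is a minimal Python sketch under that assumption; the class name `TokenBucket` and its parameters are hypothetical, and the caller passes in the current time so the behavior is easy to follow.

```python
class TokenBucket:
    """Token bucket: steady refill rate plus a cap on saved-up tokens."""

    def __init__(self, rate_per_second, burst):
        self.rate = rate_per_second   # tokens added per second
        self.capacity = burst         # maximum tokens the bucket can hold
        self.tokens = float(burst)    # assumption: the bucket starts full
        self.updated_at = None

    def allow(self, now):
        """Spend one token if available; otherwise reject the request."""
        if self.updated_at is not None:
            elapsed = now - self.updated_at
            # Refill for the elapsed time, never beyond the bucket capacity.
            self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated_at = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# Hypothetical usage: 250 requests arrive at once, then 150 a second later.
bucket = TokenBucket(rate_per_second=100, burst=200)
first_wave = sum(bucket.allow(now=0.0) for _ in range(250))   # 200 accepted
second_wave = sum(bucket.allow(now=1.0) for _ in range(150))  # 100 accepted
```

The first wave drains the full burst allowance of 200. One second later only 100 tokens have been refilled, so the client falls back to the sustained rate.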

When your system has spare capacity or is idle, you may want to let a single client send more requests than the defined limit. Clients cannot always control how many API calls they emit during such a peak.

An API burst temporarily accommodates this higher volume of requests while avoiding the potential for overload. Based on the defined burst size, you can control, at the millisecond level, how many requests beyond the specified rate limit a client can make.

If you have a configured rate limit of 500 TPS, that’s one request per 2 milliseconds (the burst zone). If your burst size is 0, and 2 requests are made in that 2-millisecond zone, one request will be processed and the other rejected.
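The arithmetic in this example (1,000 ms / 500 requests = 2 ms per request) can be sketched as a limiter that tracks how far a client is running ahead of the configured spacing. This is one possible interpretation of how a burst size is applied, not the behavior of any specific gateway, and the name `SpacingLimiter` is hypothetical.

```python
class SpacingLimiter:
    """Allows up to burst_size requests ahead of the configured spacing."""

    def __init__(self, rate_per_second, burst_size):
        self.interval_ms = 1000 / rate_per_second  # 500 TPS -> 2 ms per request
        self.burst_size = burst_size
        self.excess = 0.0     # requests currently ahead of schedule
        self.last_ms = None   # time of the last accepted request

    def allow(self, now_ms):
        if self.last_ms is None:
            excess = 0.0  # the very first request is always on schedule
        else:
            elapsed = now_ms - self.last_ms
            # Elapsed time pays down the excess; this request adds one.
            excess = max(0.0, self.excess - elapsed / self.interval_ms + 1.0)
        if excess > self.burst_size:
            return False  # too far ahead of schedule: reject
        self.excess = excess
        self.last_ms = now_ms
        return True


# Burst size 0: two requests inside the same 2 ms zone, then one after it.
strict = SpacingLimiter(rate_per_second=500, burst_size=0)
strict_results = [strict.allow(now_ms=t) for t in (0, 1, 2)]
# strict_results == [True, False, True]

# Burst size 1: one extra request is tolerated at the same instant.
lenient = SpacingLimiter(rate_per_second=500, burst_size=1)
lenient_results = [lenient.allow(now_ms=0) for _ in range(3)]
# lenient_results == [True, True, False]
```

With a burst size of 0, the second request in the 2-millisecond zone is rejected, matching the example above. Raising the burst size to 1 lets one excess request through before rejections start.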

The key to API burst is balancing client demand with rate-limiting measures. That way, you can support surges in traffic without hindering API performance.

Rate limiting algorithms at a glance

The four main algorithms for enforcing rate limits behave differently under load. Picking the right one is the difference between a smooth experience for legitimate users and one that punishes them at the edges of every time window.

| Algorithm | How it works | Best for | Trade-off |
| --- | --- | --- | --- |
| Fixed window | Reset the counter at fixed intervals (every minute or hour) | Simple to implement, easy to reason about | Burstiness at window boundaries: a client can fire up to 2x the limit by spanning a boundary |
| Sliding window | Rolling time window, count of requests in the last N seconds | Smooths out window-boundary bursts, fairer enforcement | More expensive to compute (needs per-request timestamps) |
| Token bucket | Tokens refill at a steady rate; each request consumes one | Allows controlled bursts up to bucket size; standard for most cloud APIs | Two parameters to tune (refill rate + bucket size), slightly harder to explain |
| Leaky bucket | Requests queue and drain at a fixed rate; overflow is dropped | Forces a strictly even outbound rate (good for downstream systems with hard caps) | Cannot absorb bursts; rejects legitimate spikes that the token bucket would allow |

Many modern API gateways default to token bucket because it offers a good balance of burst tolerance and protection. The algorithm's name matters less than how it is configured: what you really tune is how many requests are allowed over what window, and how the limit is communicated to clients (Retry-After and X-RateLimit-* headers).

What is API quota?

An API quota is the total number of API calls a specific consumer is allowed over a longer time window (usually per day or per month), independent of how fast they are made. Where rate limiting is about traffic shape, quotas are about consumption budgeting: an API product on the basic tier might allow 10,000 calls per month, the pro tier 100,000, and the enterprise tier unlimited. Quotas turn raw API capacity into a packageable, monetizable product.

If you’re looking at more of the commercial side and long-term consumption of API calls and data, API quotas can be a useful tool. API quotas usually describe a certain number of allotted calls for longer intervals.

For example, you might set your API quota at 5,000 calls per month. (Remember, you can combine this quota with a rate limit, such as 20 TPS.)

The quota time window is activated when that first API call is made. Once the time window lapses, the counter resets to zero. It remains zero until the next API call is made.
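The window behavior described above, where the window opens on the first call and the counter returns to zero once it lapses, can be sketched in Python as follows. The class name `QuotaCounter` and the small numbers in the usage are illustrative assumptions; a real deployment would keep one such counter per identified consumer and persist it outside process memory.

```python
class QuotaCounter:
    """Quota whose time window opens on the first call (illustrative sketch)."""

    def __init__(self, quota, window_seconds):
        self.quota = quota
        self.window_seconds = window_seconds
        self.window_start = None  # no window is open until the first call
        self.used = 0

    def record_call(self, now):
        """Count a call against the quota; return False once it is exhausted."""
        expired = (
            self.window_start is not None
            and now - self.window_start >= self.window_seconds
        )
        if expired:
            # The window has lapsed: the counter resets to zero.
            self.window_start = None
            self.used = 0
        if self.window_start is None:
            self.window_start = now  # this call opens a new window
        if self.used >= self.quota:
            return False
        self.used += 1
        return True


# Hypothetical usage: 3 calls per 10-second window, first call at t=100.
counter = QuotaCounter(quota=3, window_seconds=10)
in_window = [counter.record_call(now=100) for _ in range(4)]
# in_window == [True, True, True, False]
after_reset = counter.record_call(now=110)
# after_reset is True, and the new window starts at t=110
```

Because the window starts at the first call rather than on a calendar boundary, the next window in this sketch begins only when the consumer calls again after the previous one has lapsed.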

To enforce an API quota, you need to identify the client or consumer. That’s why we use the term user quota. Through an API marketplace that supports full lifecycle API management, consumers can easily select the subscription plan that suits their quota needs.

They can also access documentation that helps them better understand the API’s value and how to test and use it. SLAs are often also attached to define service response times and availability.

Looking at API quota in more detail, you can imagine setting limits not only per client or consumer but also per consuming application. This is known as an application quota.

You can also set tighter limits on API calls that consume more backend computing power and affect service quality.

How clients should handle rate limit responses

A rate-limited client gets back HTTP 429 Too Many Requests, often with a Retry-After header that tells it how long to wait before retrying. A well-behaved client respects this header and does not retry immediately; a misbehaving client retries in a tight loop and digs itself deeper into the throttle.

Three rules for clients calling rate-limited APIs:

  • Check the X-RateLimit-Remaining header on every response. Many APIs send back the current remaining budget so the client can throttle itself before hitting the limit.
  • Honor Retry-After. If the API tells you to wait 30 seconds, wait at least 30 seconds. Retrying earlier wastes capacity for everyone and, on some APIs, extends the throttle window.
  • Use exponential backoff with jitter for any 429 that does not include Retry-After. Wait 1 second, then 2, then 4, then 8, with random jitter so multiple throttled clients do not all retry at the same instant.

Gain visibility and control over APIs with Amplify Platform

A federated API management platform like Amplify Fusion enforces the same rate-limiting and burst policies consistently across every gateway in your estate, and surfaces rate-limit telemetry per API per consumer so platform teams can spot abuse patterns before they become incidents.

This level of visibility and control over APIs is best achieved through an API platform that provides federated API management functionality.

Modern enterprises deal with significant complexity, as business units often develop APIs independently of each other. This can lead to silos that fragment the customer experience (CX), along with time-consuming management, automation and standardization headaches, and significant resource duplication.

A universal API management platform like Axway’s Amplify helps you securely manage the full API lifecycle and simplify API discovery and use.

  • Operational tooling lets you monitor and support higher levels of service.
  • Thanks to a policy-based security gateway, teams can define policy, accessibility, rate limits, and quotas with over 200 prebuilt security policies.

And when it’s time to bring your digital products to market, the API Marketplace component allows you to track adoption, usage, and performance metrics for all your API products, offering key insights so you can make better decisions about future investments.

Further modernize your API strategy by curating and packaging your APIs for business value. Tune in to this API Talks webinar.