Rate‑Limiting Strategies: Token Bucket vs Leaky Bucket vs Fixed Window

What Is Rate Limiting?

Rate limiting controls how many requests users can make within a specific time period. It helps APIs stay stable, prevents abuse, and ensures fair usage among clients. Without it, one bad actor could easily overload your system.

Good rate-limiting strategies balance performance, fairness, and simplicity.

Why Rate Limiting Matters

When too many requests hit your backend, performance drops quickly. A proper rate-limit policy helps you:

  • Prevent overloads and crashes.
  • Mitigate denial-of-service attacks.
  • Keep costs predictable in cloud environments.
  • Maintain a smooth experience for all users.

To achieve these goals, developers often rely on a few well-known algorithms. Let’s explore them.

Fixed-Window Algorithm

The fixed-window algorithm counts requests in fixed, non-overlapping time windows (for example, 100 per minute).
When the count exceeds the limit, new requests are rejected until the next window begins. A short sketch follows the pros and cons below.

Advantages:

  • Very easy to implement.
  • Predictable request limits.

Disadvantages:

  • Allows bursts around window boundaries: a client can send the full limit at the end of one window and again at the start of the next, briefly doubling the intended rate.
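
To make the burst problem concrete, here is a minimal single-process sketch in Python. The FixedWindowLimiter class and its allow method are illustrative names, not from any particular library; a production setup would usually keep the counter in a shared store such as Redis.

```python
import time

class FixedWindowLimiter:
    """Illustrative sketch: count requests, reset when the window rolls over."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self.window_start = time.monotonic()
        self.count = 0

    def allow(self) -> bool:
        now = time.monotonic()
        # The window expired: start a fresh one with a zeroed counter.
        if now - self.window_start >= self.window:
            self.window_start = now
            self.count = 0
        # Admit the request only while the window has capacity left.
        if self.count < self.limit:
            self.count += 1
            return True
        return False

# Example: at most 100 requests per minute.
limiter = FixedWindowLimiter(limit=100, window_seconds=60)
print(limiter.allow())  # True until the 101st call inside one window
```

Because the counter simply resets, a client can spend its whole budget just before the rollover and again just after it, which is exactly the boundary burst noted above.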

Because of that burst risk, many teams switch to an improved method — the sliding window.

Sliding Window: A Smoother Alternative

The sliding window tracks requests over a moving time frame, such as “the last 60 seconds.”
It gives a more accurate picture of recent activity and avoids the boundary bursts that fixed windows allow.

Advantages:

  • Reduces burst traffic.
  • Offers more precise rate control.

Disadvantages:

  • Slightly more complex to implement, and it requires storing per-request timestamps (or several counters) for each client.

As a result, sliding windows work better for APIs that receive sustained but uneven traffic.
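
One way to sketch this in Python is the “sliding log” variant, which records a timestamp per request (SlidingWindowLimiter is an illustrative name, not a library API). It is exact but stores up to the limit's worth of timestamps per client; counter-based sliding-window approximations trade some accuracy for constant memory.

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Illustrative sketch: keep a timestamp for each recent request."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self.timestamps = deque()

    def allow(self) -> bool:
        now = time.monotonic()
        # Evict timestamps that have slid out of the window.
        while self.timestamps and now - self.timestamps[0] >= self.window:
            self.timestamps.popleft()
        # Admit only if the rolling window still has room.
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False

# Example: at most 100 requests in any rolling 60-second span.
limiter = SlidingWindowLimiter(limit=100, window_seconds=60)
print(limiter.allow())  # True
```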

Token-Bucket Algorithm

The token-bucket approach adds tokens to a “bucket” at a steady rate, up to a fixed capacity.
Each request consumes one token. If the bucket runs empty, the request must wait or gets denied. The capacity sets the maximum burst size, while the refill rate sets the long-term average.

Advantages:

  • Allows short bursts while keeping the average rate stable.
  • Provides flexibility for APIs with irregular load.

Disadvantages:

  • Needs careful tuning: capacity controls how large a burst gets through, and the refill rate controls how long clients wait after exhausting it.

This strategy works great for public APIs or mobile apps where user behavior is unpredictable.
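
Here is a minimal Python sketch of the idea (TokenBucket and its parameters are illustrative, not a specific library's API). The two knobs map directly to the trade-offs above: capacity bounds the burst size, and rate sets the long-term average.

```python
import time

class TokenBucket:
    """Illustrative sketch: refill continuously, cap at capacity."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate          # tokens added per second (average rate)
        self.capacity = capacity  # maximum tokens held (maximum burst)
        self.tokens = capacity    # start full so early bursts are allowed
        self.last_refill = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill in proportion to elapsed time, never exceeding capacity.
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last_refill = now
        # Spend a token if one is available; otherwise reject (or queue).
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

# Example: 5 requests/second on average, with bursts of up to 20.
bucket = TokenBucket(rate=5, capacity=20)
print(bucket.allow())  # True
```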

Leaky-Bucket Algorithm

The leaky-bucket algorithm also uses a bucket — but it leaks requests at a fixed rate.
New requests enter a queue and are processed one by one. When the queue is full, extra requests are dropped.

Advantages:

  • Keeps a consistent request flow.
  • Smooths out heavy bursts.

Disadvantages:

  • Less flexible for sudden spikes: excess requests either wait in the queue (adding latency) or are dropped.

That’s why payment systems and event-driven platforms often rely on leaky-bucket behavior: downstream systems get a steady, predictable processing rate.
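
Here is a rough Python sketch of the queue-based formulation (LeakyBucket, submit, and the background drain thread are all illustrative; a real service would drain requests inside its server's event loop or a worker pool). The drain loop processes at most one request per interval, which is what smooths the output rate.

```python
import queue
import threading
import time

class LeakyBucket:
    """Illustrative sketch: bounded queue drained at a fixed rate."""

    def __init__(self, drain_per_second: float, queue_size: int):
        self.interval = 1.0 / drain_per_second
        self.requests = queue.Queue(maxsize=queue_size)
        threading.Thread(target=self._drain, daemon=True).start()

    def submit(self, handler) -> bool:
        try:
            self.requests.put_nowait(handler)  # full queue -> request is dropped
            return True
        except queue.Full:
            return False

    def _drain(self):
        while True:
            handler = self.requests.get()  # block until work arrives
            handler()                      # process exactly one request...
            time.sleep(self.interval)      # ...then pause, enforcing the fixed rate

# Example: drain 10 requests/second, buffer at most 100 waiting requests.
bucket = LeakyBucket(drain_per_second=10, queue_size=100)
accepted = bucket.submit(lambda: print("processed"))
time.sleep(0.2)  # give the daemon drain thread a moment to run
```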

Choosing the Right Strategy

Here’s a quick guide to help you decide:

  Use Case                      Recommended Strategy
  Internal APIs                 Fixed-Window
  Public APIs                   Token-Bucket
  Payment or Message Queues     Leaky-Bucket
  Real-Time Apps                Sliding-Window

Always test different configurations under real traffic to find your balance between fairness and speed.

Final Thoughts

Rate limiting is not a one-size-fits-all solution. However, the right algorithm can protect your backend from overloads and improve reliability. Many modern API gateways — such as Kong, NGINX, and Envoy — already include built-in support for these techniques.

To see how rate limiting fits into a larger architecture, check out API Gateway Patterns for SaaS Applications.
For more technical details, explore the official NGINX Blog on Rate Limiting.
