
In today’s API-driven world, protecting your backend isn’t just about authentication. It’s also about controlling how often clients can hit your endpoints. Without proper rate limiting, your backend is vulnerable to DDoS attacks, resource exhaustion, and abuse from overzealous users or bots.
This comprehensive guide covers everything you need to know about API rate limiting, including algorithm implementations, distributed systems considerations, and production-ready code examples.
What Is API Rate Limiting?
API rate limiting is a technique used to control the number of requests a client can make to your server over a specific time period. The goal is to:
- Prevent server overload – Protect infrastructure from being overwhelmed
- Deter abusive behavior – Stop scrapers, bots, and malicious actors
- Ensure fair usage – Give all users equal access to resources
- Control costs – Limit expensive operations and third-party API calls
- Protect paid features – Enforce usage tiers for different subscription plans
Why You Need It
Without rate limiting, your API faces several risks:
- A single bad actor can flood your server with thousands of requests
- Bots can brute force authentication endpoints
- Your infrastructure costs can skyrocket with overuse
- Other users experience degraded performance or outages
- Database connections get exhausted, causing cascading failures
Even trusted users can cause harm unintentionally when no limits are in place, such as a buggy client stuck in an infinite retry loop.
Rate Limiting Strategies Explained
Each strategy has different trade-offs for accuracy, memory usage, and burst handling.
1. Fixed Window
The simplest approach: allow X requests per time window (e.g., 1000 requests/hour). When the window resets, the counter resets.
// Node.js - Fixed Window Rate Limiter
class FixedWindowRateLimiter {
constructor(windowMs, maxRequests) {
this.windowMs = windowMs;
this.maxRequests = maxRequests;
this.windows = new Map();
}
isAllowed(clientId) {
const now = Date.now();
const windowStart = Math.floor(now / this.windowMs) * this.windowMs;
const key = `${clientId}:${windowStart}`;
    // Evict counters from earlier windows (scans all entries; fine at small scale)
    for (const k of this.windows.keys()) {
      if (!k.endsWith(`:${windowStart}`)) {
        this.windows.delete(k);
      }
    }
const current = this.windows.get(key) || 0;
if (current >= this.maxRequests) {
return {
allowed: false,
remaining: 0,
resetAt: windowStart + this.windowMs,
};
}
this.windows.set(key, current + 1);
return {
allowed: true,
remaining: this.maxRequests - current - 1,
resetAt: windowStart + this.windowMs,
};
}
}
// Usage
const limiter = new FixedWindowRateLimiter(60000, 100); // 100 req/min
const result = limiter.isAllowed('user-123');
if (!result.allowed) {
res.status(429).json({
error: 'Too many requests',
retryAfter: Math.ceil((result.resetAt - Date.now()) / 1000),
});
}
Pros: Simple, memory efficient
Cons: Burst problem at window edges; a client can send up to 2X the limit in a short span straddling the boundary (demonstrated below)
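To see the boundary burst concretely, here is a quick sketch using the FixedWindowRateLimiter above: a client exhausts the limit just before the window rolls over, then immediately gets a fresh allowance.
// Sketch: demonstrating the fixed-window boundary burst
const demo = new FixedWindowRateLimiter(60000, 100);
// Exhaust the limit near the end of the current window
for (let i = 0; i < 100; i++) demo.isAllowed('user-123');
console.log(demo.isAllowed('user-123').allowed); // false - limit reached
// Wait until the next window starts; the counter resets immediately
const msUntilReset = 60000 - (Date.now() % 60000);
setTimeout(() => {
  console.log(demo.isAllowed('user-123').allowed); // true - fresh window
  // If the first 100 requests landed just before the boundary, the client
  // squeezed ~200 requests into far less than a minute.
}, msUntilReset + 1);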
2. Sliding Window Log
Stores timestamps of each request and counts requests within the rolling window. Most accurate but memory-intensive.
// Node.js - Sliding Window Log Rate Limiter
class SlidingWindowLogLimiter {
constructor(windowMs, maxRequests) {
this.windowMs = windowMs;
this.maxRequests = maxRequests;
this.logs = new Map(); // clientId -> timestamp array
}
isAllowed(clientId) {
const now = Date.now();
const windowStart = now - this.windowMs;
// Get or create log for this client
let timestamps = this.logs.get(clientId) || [];
// Remove timestamps outside the window
timestamps = timestamps.filter((ts) => ts > windowStart);
if (timestamps.length >= this.maxRequests) {
// Find when the oldest request in window will expire
const oldestInWindow = Math.min(...timestamps);
const retryAfter = oldestInWindow + this.windowMs - now;
this.logs.set(clientId, timestamps);
return {
allowed: false,
remaining: 0,
retryAfter: Math.ceil(retryAfter / 1000),
};
}
// Add current timestamp
timestamps.push(now);
this.logs.set(clientId, timestamps);
return {
allowed: true,
remaining: this.maxRequests - timestamps.length,
retryAfter: 0,
};
}
}
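A usage sketch, same shape as the fixed-window example above, except the allowance rolls continuously instead of resetting at a boundary:
// Usage: 100 requests per rolling minute
const logLimiter = new SlidingWindowLogLimiter(60000, 100);
const check = logLimiter.isAllowed('user-123');
if (!check.allowed) {
  // retryAfter reflects when the oldest request falls out of the window
  res.set('Retry-After', check.retryAfter.toString());
  res.status(429).json({ error: 'Too many requests', retryAfter: check.retryAfter });
}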
Pros: Most accurate, no boundary issues
Cons: High memory usage for high-traffic APIs
3. Sliding Window Counter (Hybrid)
Combines fixed window efficiency with sliding window accuracy by weighting the previous window’s count.
// Node.js - Sliding Window Counter Rate Limiter
class SlidingWindowCounterLimiter {
constructor(windowMs, maxRequests) {
this.windowMs = windowMs;
this.maxRequests = maxRequests;
this.windows = new Map(); // clientId -> { prevCount, currCount, currStart }
}
isAllowed(clientId) {
const now = Date.now();
const currWindowStart = Math.floor(now / this.windowMs) * this.windowMs;
let data = this.windows.get(clientId);
if (!data || data.currStart !== currWindowStart) {
// New window
const prevCount = data && data.currStart === currWindowStart - this.windowMs
? data.currCount
: 0;
data = {
prevCount,
currCount: 0,
currStart: currWindowStart,
};
}
// Calculate weighted count
const elapsedInWindow = now - currWindowStart;
const prevWindowWeight = 1 - elapsedInWindow / this.windowMs;
const weightedCount = data.prevCount * prevWindowWeight + data.currCount;
if (weightedCount >= this.maxRequests) {
this.windows.set(clientId, data);
return {
allowed: false,
remaining: 0,
retryAfter: Math.ceil((this.windowMs - elapsedInWindow) / 1000),
};
}
data.currCount++;
this.windows.set(clientId, data);
return {
allowed: true,
      remaining: Math.max(0, Math.floor(this.maxRequests - weightedCount - 1)),
retryAfter: 0,
};
}
}
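For intuition about the weighting, a short worked sketch: 30 seconds into a one-minute window, the previous window's count is weighted at 50%.
// Worked example (sketch): limit 100/min, 30s into the current window,
// so prevWindowWeight = 1 - 30000/60000 = 0.5.
// prevCount = 80, currCount = 50 -> weightedCount = 80 * 0.5 + 50 = 90 (allowed)
// prevCount = 80, currCount = 61 -> weightedCount = 80 * 0.5 + 61 = 101 (rejected)
const counterLimiter = new SlidingWindowCounterLimiter(60000, 100);
console.log(counterLimiter.isAllowed('user-123'));
// First call: { allowed: true, remaining: 99, retryAfter: 0 }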
Pros: Good balance of accuracy and memory
Cons: Slightly more complex than fixed window
4. Token Bucket
Tokens refill at a steady rate; each request consumes a token. Allows bursty traffic while enforcing long-term limits.
// Node.js - Token Bucket Rate Limiter
class TokenBucketLimiter {
constructor(bucketSize, refillRate) {
this.bucketSize = bucketSize; // Max tokens
this.refillRate = refillRate; // Tokens per second
this.buckets = new Map(); // clientId -> { tokens, lastRefill }
}
isAllowed(clientId, tokensRequired = 1) {
const now = Date.now();
let bucket = this.buckets.get(clientId);
if (!bucket) {
bucket = {
tokens: this.bucketSize,
lastRefill: now,
};
}
// Calculate tokens to add based on time elapsed
const elapsed = (now - bucket.lastRefill) / 1000;
const tokensToAdd = elapsed * this.refillRate;
bucket.tokens = Math.min(this.bucketSize, bucket.tokens + tokensToAdd);
bucket.lastRefill = now;
if (bucket.tokens < tokensRequired) {
// Calculate when enough tokens will be available
const tokensNeeded = tokensRequired - bucket.tokens;
const waitTime = tokensNeeded / this.refillRate;
this.buckets.set(clientId, bucket);
return {
allowed: false,
remaining: Math.floor(bucket.tokens),
retryAfter: Math.ceil(waitTime),
};
}
bucket.tokens -= tokensRequired;
this.buckets.set(clientId, bucket);
return {
allowed: true,
remaining: Math.floor(bucket.tokens),
retryAfter: 0,
};
}
}
// Usage: Allow bursts of 50 requests, refill at 10/second
const limiter = new TokenBucketLimiter(50, 10);
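Because isAllowed accepts a tokensRequired argument, the same bucket can charge different costs per endpoint, a common pattern for weighted limits. A brief sketch (the endpoint names are illustrative):
// Weighted consumption: cheap reads cost 1 token, heavy exports cost 10
limiter.isAllowed('user-123'); // e.g., GET /api/items -> 1 token
limiter.isAllowed('user-123', 10); // e.g., POST /api/export -> 10 tokens
// A client can burst through up to 50 tokens' worth of work at once,
// then is held to the long-term refill rate of 10 tokens/second.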
Pros: Handles bursts gracefully, intuitive model
Cons: Requires tuning bucket size and refill rate
Production Express.js Rate Limiting Middleware
// middleware/rateLimiter.js
const rateLimit = require('express-rate-limit');
const RedisStore = require('rate-limit-redis');
const Redis = require('ioredis');
const redis = new Redis(process.env.REDIS_URL);
// Standard rate limiter for most endpoints
const standardLimiter = rateLimit({
store: new RedisStore({
sendCommand: (...args) => redis.call(...args),
}),
windowMs: 15 * 60 * 1000, // 15 minutes
max: 100,
message: {
error: 'Too many requests',
message: 'Please try again later',
},
standardHeaders: true, // Return rate limit info in headers
legacyHeaders: false,
keyGenerator: (req) => {
// Use user ID if authenticated, otherwise IP
return req.user?.id || req.ip;
},
skip: (req) => {
// Skip rate limiting for health checks
return req.path === '/health';
},
});
// Strict limiter for authentication endpoints
const authLimiter = rateLimit({
store: new RedisStore({
sendCommand: (...args) => redis.call(...args),
}),
windowMs: 60 * 60 * 1000, // 1 hour
max: 5, // 5 attempts per hour
message: {
error: 'Too many login attempts',
message: 'Account temporarily locked. Try again later.',
},
standardHeaders: true,
legacyHeaders: false,
keyGenerator: (req) => {
// Rate limit by IP + email combination
const email = req.body?.email || 'unknown';
return `${req.ip}:${email}`;
},
});
// Expensive operations limiter (e.g., exports, reports)
const heavyLimiter = rateLimit({
store: new RedisStore({
sendCommand: (...args) => redis.call(...args),
}),
windowMs: 60 * 60 * 1000, // 1 hour
max: 10,
message: {
error: 'Export limit reached',
message: 'You can only export 10 times per hour',
},
standardHeaders: true,
legacyHeaders: false,
});
module.exports = { standardLimiter, authLimiter, heavyLimiter };
// Usage in Express app
const express = require('express');
const { standardLimiter, authLimiter, heavyLimiter } = require('./middleware/rateLimiter');
const app = express();
app.use(express.json()); // parse JSON bodies so authLimiter's keyGenerator can read req.body.email
// Apply standard limiter to all API routes
app.use('/api/', standardLimiter);
// Apply strict limiter to auth routes
app.use('/api/auth/login', authLimiter);
app.use('/api/auth/register', authLimiter);
app.use('/api/auth/forgot-password', authLimiter);
// Apply heavy limiter to expensive operations
app.use('/api/reports/export', heavyLimiter);
app.use('/api/data/bulk-download', heavyLimiter);
Distributed Rate Limiting with Redis
For multi-server deployments, use Redis for shared state:
// Atomic sliding window counter with Redis Lua script
const Redis = require('ioredis');
const redis = new Redis(process.env.REDIS_URL);
const slidingWindowScript = `
local key = KEYS[1]
local now = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local limit = tonumber(ARGV[3])
local clearBefore = now - window
redis.call('ZREMRANGEBYSCORE', key, '-inf', clearBefore)
local count = redis.call('ZCARD', key)
if count < limit then
redis.call('ZADD', key, now, now .. '-' .. math.random())
redis.call('EXPIRE', key, math.ceil(window / 1000))
return {1, limit - count - 1}
else
return {0, 0}
end
`;
class RedisRateLimiter {
constructor(redis, windowMs, maxRequests, keyPrefix = 'ratelimit') {
this.redis = redis;
this.windowMs = windowMs;
this.maxRequests = maxRequests;
this.keyPrefix = keyPrefix;
}
async isAllowed(clientId) {
const key = `${this.keyPrefix}:${clientId}`;
const now = Date.now();
const result = await this.redis.eval(
slidingWindowScript,
1,
key,
now,
this.windowMs,
this.maxRequests
);
return {
allowed: result[0] === 1,
remaining: result[1],
};
}
}
// Usage
const limiter = new RedisRateLimiter(redis, 60000, 100);
app.use(async (req, res, next) => {
const clientId = req.user?.id || req.ip;
const result = await limiter.isAllowed(clientId);
res.set('X-RateLimit-Limit', '100');
res.set('X-RateLimit-Remaining', result.remaining.toString());
if (!result.allowed) {
return res.status(429).json({
error: 'Too many requests',
message: 'Please slow down',
});
}
next();
});
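One production detail the middleware above glosses over: if Redis is unreachable, isAllowed throws and every request fails. A common choice is to fail open, allowing traffic and logging the error, so the limiter never becomes its own outage. A hedged sketch:
// Fail-open variant (sketch): a Redis outage degrades to "no rate limiting"
// instead of taking the whole API down with it
app.use(async (req, res, next) => {
  const clientId = req.user?.id || req.ip;
  try {
    const result = await limiter.isAllowed(clientId);
    res.set('X-RateLimit-Limit', '100');
    res.set('X-RateLimit-Remaining', result.remaining.toString());
    if (!result.allowed) {
      return res.status(429).json({ error: 'Too many requests' });
    }
  } catch (err) {
    console.error('Rate limiter unavailable, failing open:', err);
  }
  next();
});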
Spring Boot Rate Limiting with Bucket4j
// pom.xml dependency
<dependency>
<groupId>com.bucket4j</groupId>
<artifactId>bucket4j-core</artifactId>
<version>8.7.0</version>
</dependency>
// RateLimitingFilter.java
import io.github.bucket4j.*;
import jakarta.servlet.*;
import jakarta.servlet.http.*;
import org.springframework.stereotype.Component;
import java.io.IOException;
import java.time.Duration;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
@Component
public class RateLimitingFilter implements Filter {
private final Map<String, Bucket> buckets = new ConcurrentHashMap<>();
private Bucket createNewBucket() {
return Bucket.builder()
.addLimit(Bandwidth.builder()
.capacity(100)
.refillGreedy(100, Duration.ofMinutes(1))
.build())
.addLimit(Bandwidth.builder()
.capacity(10)
.refillGreedy(10, Duration.ofSeconds(1))
.build())
.build();
}
@Override
public void doFilter(
ServletRequest request,
ServletResponse response,
FilterChain chain) throws IOException, ServletException {
HttpServletRequest httpRequest = (HttpServletRequest) request;
HttpServletResponse httpResponse = (HttpServletResponse) response;
String clientId = getClientId(httpRequest);
Bucket bucket = buckets.computeIfAbsent(clientId, k -> createNewBucket());
ConsumptionProbe probe = bucket.tryConsumeAndReturnRemaining(1);
httpResponse.setHeader("X-RateLimit-Remaining",
String.valueOf(probe.getRemainingTokens()));
if (probe.isConsumed()) {
chain.doFilter(request, response);
} else {
      long waitSeconds = (probe.getNanosToWaitForRefill() + 999_999_999) / 1_000_000_000; // round up so Retry-After is never 0
httpResponse.setHeader("Retry-After", String.valueOf(waitSeconds));
httpResponse.setStatus(429);
httpResponse.getWriter().write(
"{\"error\": \"Too many requests\", \"retryAfter\": " + waitSeconds + "}");
}
}
private String getClientId(HttpServletRequest request) {
// Try to get user ID from JWT token
String authHeader = request.getHeader("Authorization");
if (authHeader != null && authHeader.startsWith("Bearer ")) {
// Extract user ID from token (implement your JWT parsing)
return extractUserIdFromToken(authHeader.substring(7));
}
// Fall back to IP address
String forwarded = request.getHeader("X-Forwarded-For");
    return forwarded != null ? forwarded.split(",")[0].trim() : request.getRemoteAddr();
}
private String extractUserIdFromToken(String token) {
// Implement JWT parsing
return token.hashCode() + ""; // Placeholder
}
}
Rate Limit Response Headers
Always include these headers to help clients manage their requests:
HTTP/1.1 429 Too Many Requests
Content-Type: application/json
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1704067200
Retry-After: 30
{
"error": "rate_limit_exceeded",
"message": "You have exceeded the rate limit. Please try again later.",
"retryAfter": 30
}
Tiered Rate Limits by Subscription
// Define limits per subscription tier
const tierLimits = {
free: {
requestsPerMinute: 20,
requestsPerDay: 1000,
burstSize: 5,
},
pro: {
requestsPerMinute: 100,
requestsPerDay: 10000,
burstSize: 20,
},
enterprise: {
requestsPerMinute: 1000,
requestsPerDay: 100000,
burstSize: 100,
},
};
// Middleware to apply tier-based limits
const tieredRateLimiter = async (req, res, next) => {
const user = req.user;
const tier = user?.subscription?.tier || 'free';
const limits = tierLimits[tier];
const clientId = user?.id || req.ip;
  // Check minute limit (checkLimit is a helper sketched after this middleware)
const minuteResult = await checkLimit(
`${clientId}:minute`,
60000,
limits.requestsPerMinute
);
if (!minuteResult.allowed) {
return res.status(429).json({
error: 'Rate limit exceeded',
limit: limits.requestsPerMinute,
remaining: 0,
resetIn: minuteResult.resetIn,
upgradeUrl: tier === 'free' ? '/pricing' : null,
});
}
// Check daily limit
const dayResult = await checkLimit(
`${clientId}:day`,
86400000,
limits.requestsPerDay
);
if (!dayResult.allowed) {
return res.status(429).json({
error: 'Daily limit exceeded',
limit: limits.requestsPerDay,
remaining: 0,
resetIn: dayResult.resetIn,
upgradeUrl: '/pricing',
});
}
res.set('X-RateLimit-Limit-Minute', limits.requestsPerMinute.toString());
res.set('X-RateLimit-Remaining-Minute', minuteResult.remaining.toString());
res.set('X-RateLimit-Limit-Day', limits.requestsPerDay.toString());
res.set('X-RateLimit-Remaining-Day', dayResult.remaining.toString());
next();
};
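The checkLimit helper above is left undefined; here is a minimal sketch, assuming the ioredis client from earlier and a fixed-window counter per key. (The burstSize field would map naturally onto a token bucket's capacity; this sketch covers only the windowed counts.)
// Minimal checkLimit sketch: one Redis counter per client per window
async function checkLimit(key, windowMs, max) {
  const windowId = Math.floor(Date.now() / windowMs);
  const redisKey = `tier:${key}:${windowId}`;
  const count = await redis.incr(redisKey);
  if (count === 1) {
    await redis.pexpire(redisKey, windowMs); // first hit in the window sets its TTL
  }
  return {
    allowed: count <= max,
    remaining: Math.max(0, max - count),
    resetIn: Math.ceil((windowMs - (Date.now() % windowMs)) / 1000), // seconds until reset
  };
}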
Common Mistakes to Avoid
1. Rate limiting only by IP address
// Wrong: All users behind a NAT share the same limit
const clientId = req.ip;
// Correct: Use authenticated user ID when available
const clientId = req.user?.id || req.ip;
2. Not handling the rate limit response on the client
// Wrong: Ignoring 429 responses
const response = await fetch('/api/data');
const data = await response.json();
// Correct: Handle rate limiting gracefully
const response = await fetch('/api/data');
if (response.status === 429) {
  const retryAfter = Number(response.headers.get('Retry-After') || 1);
  await sleep(retryAfter * 1000); // sleep(ms) helper defined in the sketch below
  return fetch('/api/data'); // Retry
}
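A single retry is the bare minimum; production clients usually want bounded retries with exponential backoff that honor Retry-After when the server provides it. A sketch:
// Sketch: bounded retries with exponential backoff, honoring Retry-After
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
async function fetchWithBackoff(url, options = {}, maxRetries = 3) {
  for (let attempt = 0; ; attempt++) {
    const response = await fetch(url, options);
    if (response.status !== 429 || attempt >= maxRetries) {
      return response; // success, a non-429 error, or retries exhausted
    }
    // Prefer the server's hint; otherwise back off exponentially (1s, 2s, 4s...)
    const retryAfter = Number(response.headers.get('Retry-After'));
    const delayMs = retryAfter > 0 ? retryAfter * 1000 : 2 ** attempt * 1000;
    await sleep(delayMs);
  }
}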
3. Not providing Retry-After headers
// Wrong: Client doesn't know when to retry
res.status(429).json({ error: 'Too many requests' });
// Correct: Include retry information
res.set('Retry-After', '30');
res.status(429).json({
error: 'Too many requests',
retryAfter: 30,
});
4. Applying the same limits to all endpoints
// Wrong: Login and search have the same limit
app.use('/api/', rateLimiter);
// Correct: Different limits for different risk levels
app.use('/api/auth/login', strictLimiter); // 5/hour
app.use('/api/search', standardLimiter); // 100/min
app.use('/api/reports/export', heavyLimiter); // 10/hour
Strategy Comparison
| Strategy | Burst Tolerant | Accuracy | Memory | Complexity |
|---|---|---|---|---|
| Fixed Window | No | Low | O(1) | Low |
| Sliding Log | No | High | O(n) | Medium |
| Sliding Counter | Partial | Medium | O(1) | Medium |
| Token Bucket | Yes | High | O(1) | Medium |
| Leaky Bucket | Yes | High | O(n) | High |
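The one strategy in the table without an implementation above is the leaky bucket: requests enter a per-client queue that drains at a constant rate. The O(n) memory in the table refers to a true queue that holds requests for later processing; the counter-based sketch below approximates the same admission behavior in O(1) per client.
// Leaky bucket sketch (counter-based): tracks queue depth rather than
// holding actual requests; admission matches the queued variant
class LeakyBucketLimiter {
  constructor(capacity, leakRatePerSec) {
    this.capacity = capacity; // max "queued" requests per client
    this.leakRatePerSec = leakRatePerSec; // steady drain rate
    this.queues = new Map(); // clientId -> { depth, lastLeak }
  }
  isAllowed(clientId) {
    const now = Date.now();
    const q = this.queues.get(clientId) || { depth: 0, lastLeak: now };
    // Drain at the configured rate since the last check
    const leaked = ((now - q.lastLeak) / 1000) * this.leakRatePerSec;
    q.depth = Math.max(0, q.depth - leaked);
    q.lastLeak = now;
    if (q.depth >= this.capacity) {
      this.queues.set(clientId, q);
      return { allowed: false, remaining: 0 };
    }
    q.depth += 1;
    this.queues.set(clientId, q);
    return { allowed: true, remaining: Math.floor(this.capacity - q.depth) };
  }
}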
Related
- Rate Limiting Strategies: Token Bucket, Leaky Bucket, and More
- Spring Boot + JWT: Secure Your REST API
- Deploying Spring Boot Apps to Docker and Kubernetes
Final Thoughts
API rate limiting is non-negotiable in modern backend architecture. It's relatively simple to implement but can save your backend from downtime, abuse, and unpredictable costs. Start with a simple fixed window or token bucket approach, then evolve to more sophisticated strategies as your needs grow.
Remember to always provide clear feedback to clients about their rate limit status, implement different limits for different endpoint types, and use Redis or a similar distributed store when running multiple server instances.