
In today’s API-driven world, protecting your backend isn’t just about authentication. It’s also about controlling how often clients can hit your endpoints. Without proper rate limiting, your backend is vulnerable to DDoS attacks, resource exhaustion, and abuse from overzealous users or bots.
This comprehensive guide covers everything you need to know about API rate limiting, including algorithm implementations, distributed systems considerations, and production-ready code examples.
What Is API Rate Limiting?
API rate limiting is a technique used to control the number of requests a client can make to your server over a specific time period. The goal is to:
- Prevent server overload – Protect infrastructure from being overwhelmed
- Deter abusive behavior – Stop scrapers, bots, and malicious actors
- Ensure fair usage – Give all users equal access to resources
- Control costs – Limit expensive operations and third-party API calls
- Protect paid features – Enforce usage tiers for different subscription plans
Why You Need It
Without rate limiting, your API faces several risks:
- A single bad actor can flood your server with thousands of requests
- Bots can brute force authentication endpoints
- Your infrastructure costs can skyrocket with overuse
- Other users experience degraded performance or outages
- Database connections get exhausted, causing cascading failures
Even trusted users can cause harm unintentionally when no limits are in place, such as a buggy client stuck in an infinite retry loop.
Rate Limiting Strategies Explained
Each strategy has different trade-offs for accuracy, memory usage, and burst handling.
1. Fixed Window
The simplest approach: allow X requests per time window (e.g., 1000 requests/hour). When the window resets, the counter resets.
// Node.js - Fixed Window Rate Limiter
class FixedWindowRateLimiter {
constructor(windowMs, maxRequests) {
this.windowMs = windowMs;
this.maxRequests = maxRequests;
this.windows = new Map();
}
isAllowed(clientId) {
const now = Date.now();
const windowStart = Math.floor(now / this.windowMs) * this.windowMs;
const key = `${clientId}:${windowStart}`;
    // Evict counters from earlier windows (scans all entries; fine at small scale)
    for (const k of this.windows.keys()) {
      if (!k.endsWith(`:${windowStart}`)) {
        this.windows.delete(k);
      }
    }
const current = this.windows.get(key) || 0;
if (current >= this.maxRequests) {
return {
allowed: false,
remaining: 0,
resetAt: windowStart + this.windowMs,
};
}
this.windows.set(key, current + 1);
return {
allowed: true,
remaining: this.maxRequests - current - 1,
resetAt: windowStart + this.windowMs,
};
}
}
// Usage
const limiter = new FixedWindowRateLimiter(60000, 100); // 100 req/min
const result = limiter.isAllowed('user-123');
if (!result.allowed) {
res.status(429).json({
error: 'Too many requests',
retryAfter: Math.ceil((result.resetAt - Date.now()) / 1000),
});
}
Pros: Simple, memory efficient
Cons: Burst problem at window edges; a client can send up to 2X the limit in a short span straddling the boundary (demonstrated below)
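To see the boundary burst concretely, here is a quick sketch using the FixedWindowRateLimiter above: a client exhausts the limit just before the window rolls over, then immediately gets a fresh allowance.
// Sketch: demonstrating the fixed-window boundary burst
const demo = new FixedWindowRateLimiter(60000, 100);
// Exhaust the limit near the end of the current window
for (let i = 0; i < 100; i++) demo.isAllowed('user-123');
console.log(demo.isAllowed('user-123').allowed); // false - limit reached
// Wait until the next window starts; the counter resets immediately
const msUntilReset = 60000 - (Date.now() % 60000);
setTimeout(() => {
  console.log(demo.isAllowed('user-123').allowed); // true - fresh window
  // If the first 100 requests landed just before the boundary, the client
  // squeezed ~200 requests into far less than a minute.
}, msUntilReset + 1);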
2. Sliding Window Log
Stores timestamps of each request and counts requests within the rolling window. Most accurate but memory-intensive.
// Node.js - Sliding Window Log Rate Limiter
class SlidingWindowLogLimiter {
constructor(windowMs, maxRequests) {
this.windowMs = windowMs;
this.maxRequests = maxRequests;
this.logs = new Map(); // clientId -> timestamp array
}
isAllowed(clientId) {
const now = Date.now();
const windowStart = now - this.windowMs;
// Get or create log for this client
let timestamps = this.logs.get(clientId) || [];
// Remove timestamps outside the window
timestamps = timestamps.filter((ts) => ts > windowStart);
if (timestamps.length >= this.maxRequests) {
// Find when the oldest request in window will expire
const oldestInWindow = Math.min(...timestamps);
const retryAfter = oldestInWindow + this.windowMs - now;
this.logs.set(clientId, timestamps);
return {
allowed: false,
remaining: 0,
retryAfter: Math.ceil(retryAfter / 1000),
};
}
// Add current timestamp
timestamps.push(now);
this.logs.set(clientId, timestamps);
return {
allowed: true,
remaining: this.maxRequests - timestamps.length,
retryAfter: 0,
};
}
}
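A usage sketch, same shape as the fixed-window example above, except the allowance rolls continuously instead of resetting at a boundary:
// Usage: 100 requests per rolling minute
const logLimiter = new SlidingWindowLogLimiter(60000, 100);
const check = logLimiter.isAllowed('user-123');
if (!check.allowed) {
  // retryAfter reflects when the oldest request falls out of the window
  res.set('Retry-After', check.retryAfter.toString());
  res.status(429).json({ error: 'Too many requests', retryAfter: check.retryAfter });
}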
Pros: Most accurate, no boundary issues
Cons: High memory usage for high-traffic APIs
3. Sliding Window Counter (Hybrid)
Combines fixed window efficiency with sliding window accuracy by weighting the previous window’s count.
// Node.js - Sliding Window Counter Rate Limiter
class SlidingWindowCounterLimiter {
constructor(windowMs, maxRequests) {
this.windowMs = windowMs;
this.maxRequests = maxRequests;
this.windows = new Map(); // clientId -> { prevCount, currCount, currStart }
}
isAllowed(clientId) {
const now = Date.now();
const currWindowStart = Math.floor(now / this.windowMs) * this.windowMs;
let data = this.windows.get(clientId);
if (!data || data.currStart !== currWindowStart) {
// New window
const prevCount = data && data.currStart === currWindowStart - this.windowMs
? data.currCount
: 0;
data = {
prevCount,
currCount: 0,
currStart: currWindowStart,
};
}
// Calculate weighted count
const elapsedInWindow = now - currWindowStart;
const prevWindowWeight = 1 - elapsedInWindow / this.windowMs;
const weightedCount = data.prevCount * prevWindowWeight + data.currCount;
if (weightedCount >= this.maxRequests) {
this.windows.set(clientId, data);
return {
allowed: false,
remaining: 0,
retryAfter: Math.ceil((this.windowMs - elapsedInWindow) / 1000),
};
}
data.currCount++;
this.windows.set(clientId, data);
return {
allowed: true,
      remaining: Math.max(0, Math.floor(this.maxRequests - weightedCount - 1)),
retryAfter: 0,
};
}
}
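For intuition about the weighting, a short worked sketch: 30 seconds into a one-minute window, the previous window's count is weighted at 50%.
// Worked example (sketch): limit 100/min, 30s into the current window,
// so prevWindowWeight = 1 - 30000/60000 = 0.5.
// prevCount = 80, currCount = 50 -> weightedCount = 80 * 0.5 + 50 = 90 (allowed)
// prevCount = 80, currCount = 61 -> weightedCount = 80 * 0.5 + 61 = 101 (rejected)
const counterLimiter = new SlidingWindowCounterLimiter(60000, 100);
console.log(counterLimiter.isAllowed('user-123'));
// First call: { allowed: true, remaining: 99, retryAfter: 0 }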
Pros: Good balance of accuracy and memory
Cons: Slightly more complex than fixed window
4. Token Bucket
Tokens refill at a steady rate; each request consumes a token. Allows bursty traffic while enforcing long-term limits.
// Node.js - Token Bucket Rate Limiter
class TokenBucketLimiter {
constructor(bucketSize, refillRate) {
this.bucketSize = bucketSize; // Max tokens
this.refillRate = refillRate; // Tokens per second
this.buckets = new Map(); // clientId -> { tokens, lastRefill }
}
isAllowed(clientId, tokensRequired = 1) {
const now = Date.now();
let bucket = this.buckets.get(clientId);
if (!bucket) {
bucket = {
tokens: this.bucketSize,
lastRefill: now,
};
}
// Calculate tokens to add based on time elapsed
const elapsed = (now - bucket.lastRefill) / 1000;
const tokensToAdd = elapsed * this.refillRate;
bucket.tokens = Math.min(this.bucketSize, bucket.tokens + tokensToAdd);
bucket.lastRefill = now;
if (bucket.tokens < tokensRequired) {
// Calculate when enough tokens will be available
const tokensNeeded = tokensRequired - bucket.tokens;
const waitTime = tokensNeeded / this.refillRate;
this.buckets.set(clientId, bucket);
return {
allowed: false,
remaining: Math.floor(bucket.tokens),
retryAfter: Math.ceil(waitTime),
};
}
bucket.tokens -= tokensRequired;
this.buckets.set(clientId, bucket);
return {
allowed: true,
remaining: Math.floor(bucket.tokens),
retryAfter: 0,
};
}
}
// Usage: Allow bursts of 50 requests, refill at 10/second
const limiter = new TokenBucketLimiter(50, 10);
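Because isAllowed accepts a tokensRequired argument, the same bucket can charge different costs per endpoint, a common pattern for weighted limits. A brief sketch (the endpoint names are illustrative):
// Weighted consumption: cheap reads cost 1 token, heavy exports cost 10
limiter.isAllowed('user-123'); // e.g., GET /api/items -> 1 token
limiter.isAllowed('user-123', 10); // e.g., POST /api/export -> 10 tokens
// A client can burst through up to 50 tokens' worth of work at once,
// then is held to the long-term refill rate of 10 tokens/second.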
Pros: Handles bursts gracefully, intuitive model
Cons: Requires tuning bucket size and refill rate
Production Express.js Rate Limiting Middleware
// middleware/rateLimiter.js
const rateLimit = require('express-rate-limit');
const RedisStore = require('rate-limit-redis');
const Redis = require('ioredis');
const redis = new Redis(process.env.REDIS_URL);
// Standard rate limiter for most endpoints
const standardLimiter = rateLimit({
store: new RedisStore({
sendCommand: (...args) => redis.call(...args),
}),
windowMs: 15 * 60 * 1000, // 15 minutes
max: 100,
message: {
error: 'Too many requests',
message: 'Please try again later',
},
standardHeaders: true, // Return rate limit info in headers
legacyHeaders: false,
keyGenerator: (req) => {
// Use user ID if authenticated, otherwise IP
return req.user?.id || req.ip;
},
skip: (req) => {
// Skip rate limiting for health checks
return req.path === '/health';
},
});
// Strict limiter for authentication endpoints
const authLimiter = rateLimit({
store: new RedisStore({
sendCommand: (...args) => redis.call(...args),
}),
windowMs: 60 * 60 * 1000, // 1 hour
max: 5, // 5 attempts per hour
message: {
error: 'Too many login attempts',
message: 'Account temporarily locked. Try again later.',
},
standardHeaders: true,
legacyHeaders: false,
keyGenerator: (req) => {
// Rate limit by IP + email combination
const email = req.body?.email || 'unknown';
return `${req.ip}:${email}`;
},
});
// Expensive operations limiter (e.g., exports, reports)
const heavyLimiter = rateLimit({
store: new RedisStore({
sendCommand: (...args) => redis.call(...args),
}),
windowMs: 60 * 60 * 1000, // 1 hour
max: 10,
message: {
error: 'Export limit reached',
message: 'You can only export 10 times per hour',
},
standardHeaders: true,
legacyHeaders: false,
});
module.exports = { standardLimiter, authLimiter, heavyLimiter };
// Usage in Express app
const express = require('express');
const { standardLimiter, authLimiter, heavyLimiter } = require('./middleware/rateLimiter');
const app = express();
app.use(express.json()); // parse JSON bodies so authLimiter's keyGenerator can read req.body.email
// Apply standard limiter to all API routes
app.use('/api/', standardLimiter);
// Apply strict limiter to auth routes
app.use('/api/auth/login', authLimiter);
app.use('/api/auth/register', authLimiter);
app.use('/api/auth/forgot-password', authLimiter);
// Apply heavy limiter to expensive operations
app.use('/api/reports/export', heavyLimiter);
app.use('/api/data/bulk-download', heavyLimiter);
Distributed Rate Limiting with Redis
For multi-server deployments, use Redis for shared state:
// Atomic sliding window counter with Redis Lua script
const Redis = require('ioredis');
const redis = new Redis(process.env.REDIS_URL);
const slidingWindowScript = `
local key = KEYS[1]
local now = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local limit = tonumber(ARGV[3])
local clearBefore = now - window
redis.call('ZREMRANGEBYSCORE', key, '-inf', clearBefore)
local count = redis.call('ZCARD', key)
if count < limit then
redis.call('ZADD', key, now, now .. '-' .. math.random())
redis.call('EXPIRE', key, math.ceil(window / 1000))
return {1, limit - count - 1}
else
return {0, 0}
end
`;
class RedisRateLimiter {
constructor(redis, windowMs, maxRequests, keyPrefix = 'ratelimit') {
this.redis = redis;
this.windowMs = windowMs;
this.maxRequests = maxRequests;
this.keyPrefix = keyPrefix;
}
async isAllowed(clientId) {
const key = `${this.keyPrefix}:${clientId}`;
const now = Date.now();
const result = await this.redis.eval(
slidingWindowScript,
1,
key,
now,
this.windowMs,
this.maxRequests
);
return {
allowed: result[0] === 1,
remaining: result[1],
};
}
}
// Usage
const limiter = new RedisRateLimiter(redis, 60000, 100);
app.use(async (req, res, next) => {
const clientId = req.user?.id || req.ip;
const result = await limiter.isAllowed(clientId);
res.set('X-RateLimit-Limit', '100');
res.set('X-RateLimit-Remaining', result.remaining.toString());
if (!result.allowed) {
return res.status(429).json({
error: 'Too many requests',
message: 'Please slow down',
});
}
next();
});
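One production detail the middleware above glosses over: if Redis is unreachable, isAllowed throws and every request fails. A common choice is to fail open, allowing traffic and logging the error, so the limiter never becomes its own outage. A hedged sketch:
// Fail-open variant (sketch): a Redis outage degrades to "no rate limiting"
// instead of taking the whole API down with it
app.use(async (req, res, next) => {
  const clientId = req.user?.id || req.ip;
  try {
    const result = await limiter.isAllowed(clientId);
    res.set('X-RateLimit-Limit', '100');
    res.set('X-RateLimit-Remaining', result.remaining.toString());
    if (!result.allowed) {
      return res.status(429).json({ error: 'Too many requests' });
    }
  } catch (err) {
    console.error('Rate limiter unavailable, failing open:', err);
  }
  next();
});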
Spring Boot Rate Limiting with Bucket4j
// pom.xml dependency
<dependency>
<groupId>com.bucket4j</groupId>
<artifactId>bucket4j-core</artifactId>
<version>8.7.0</version>
</dependency>
// RateLimitingFilter.java
import io.github.bucket4j.*;
import jakarta.servlet.*;
import jakarta.servlet.http.*;
import org.springframework.stereotype.Component;
import java.io.IOException;
import java.time.Duration;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
@Component
public class RateLimitingFilter implements Filter {
private final Map<String, Bucket> buckets = new ConcurrentHashMap<>();
private Bucket createNewBucket() {
return Bucket.builder()
.addLimit(Bandwidth.builder()
.capacity(100)
.refillGreedy(100, Duration.ofMinutes(1))
.build())
.addLimit(Bandwidth.builder()
.capacity(10)
.refillGreedy(10, Duration.ofSeconds(1))
.build())
.build();
}
@Override
public void doFilter(
ServletRequest request,
ServletResponse response,
FilterChain chain) throws IOException, ServletException {
HttpServletRequest httpRequest = (HttpServletRequest) request;
HttpServletResponse httpResponse = (HttpServletResponse) response;
String clientId = getClientId(httpRequest);
Bucket bucket = buckets.computeIfAbsent(clientId, k -> createNewBucket());
ConsumptionProbe probe = bucket.tryConsumeAndReturnRemaining(1);
httpResponse.setHeader("X-RateLimit-Remaining",
String.valueOf(probe.getRemainingTokens()));
if (probe.isConsumed()) {
chain.doFilter(request, response);
} else {
      long waitSeconds = (probe.getNanosToWaitForRefill() + 999_999_999) / 1_000_000_000; // round up so Retry-After is never 0
httpResponse.setHeader("Retry-After", String.valueOf(waitSeconds));
httpResponse.setStatus(429);
httpResponse.getWriter().write(
"{\"error\": \"Too many requests\", \"retryAfter\": " + waitSeconds + "}");
}
}
private String getClientId(HttpServletRequest request) {
// Try to get user ID from JWT token
String authHeader = request.getHeader("Authorization");
if (authHeader != null && authHeader.startsWith("Bearer ")) {
// Extract user ID from token (implement your JWT parsing)
return extractUserIdFromToken(authHeader.substring(7));
}
// Fall back to IP address
String forwarded = request.getHeader("X-Forwarded-For");
    return forwarded != null ? forwarded.split(",")[0].trim() : request.getRemoteAddr();
}
private String extractUserIdFromToken(String token) {
// Implement JWT parsing
return token.hashCode() + ""; // Placeholder
}
}
Rate Limit Response Headers
Always include these headers to help clients manage their requests:
HTTP/1.1 429 Too Many Requests
Content-Type: application/json
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1704067200
Retry-After: 30
{
"error": "rate_limit_exceeded",
"message": "You have exceeded the rate limit. Please try again later.",
"retryAfter": 30
}
Tiered Rate Limits by Subscription
// Define limits per subscription tier
const tierLimits = {
free: {
requestsPerMinute: 20,
requestsPerDay: 1000,
burstSize: 5,
},
pro: {
requestsPerMinute: 100,
requestsPerDay: 10000,
burstSize: 20,
},
enterprise: {
requestsPerMinute: 1000,
requestsPerDay: 100000,
burstSize: 100,
},
};
// Middleware to apply tier-based limits
const tieredRateLimiter = async (req, res, next) => {
const user = req.user;
const tier = user?.subscription?.tier || 'free';
const limits = tierLimits[tier];
const clientId = user?.id || req.ip;
  // Check minute limit (checkLimit is a helper sketched after this middleware)
const minuteResult = await checkLimit(
`${clientId}:minute`,
60000,
limits.requestsPerMinute
);
if (!minuteResult.allowed) {
return res.status(429).json({
error: 'Rate limit exceeded',
limit: limits.requestsPerMinute,
remaining: 0,
resetIn: minuteResult.resetIn,
upgradeUrl: tier === 'free' ? '/pricing' : null,
});
}
// Check daily limit
const dayResult = await checkLimit(
`${clientId}:day`,
86400000,
limits.requestsPerDay
);
if (!dayResult.allowed) {
return res.status(429).json({
error: 'Daily limit exceeded',
limit: limits.requestsPerDay,
remaining: 0,
resetIn: dayResult.resetIn,
upgradeUrl: '/pricing',
});
}
res.set('X-RateLimit-Limit-Minute', limits.requestsPerMinute.toString());
res.set('X-RateLimit-Remaining-Minute', minuteResult.remaining.toString());
res.set('X-RateLimit-Limit-Day', limits.requestsPerDay.toString());
res.set('X-RateLimit-Remaining-Day', dayResult.remaining.toString());
next();
};
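The checkLimit helper above is left undefined; here is a minimal sketch, assuming the ioredis client from earlier and a fixed-window counter per key. (The burstSize field would map naturally onto a token bucket's capacity; this sketch covers only the windowed counts.)
// Minimal checkLimit sketch: one Redis counter per client per window
async function checkLimit(key, windowMs, max) {
  const windowId = Math.floor(Date.now() / windowMs);
  const redisKey = `tier:${key}:${windowId}`;
  const count = await redis.incr(redisKey);
  if (count === 1) {
    await redis.pexpire(redisKey, windowMs); // first hit in the window sets its TTL
  }
  return {
    allowed: count <= max,
    remaining: Math.max(0, max - count),
    resetIn: Math.ceil((windowMs - (Date.now() % windowMs)) / 1000), // seconds until reset
  };
}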
Common Mistakes to Avoid
1. Rate limiting only by IP address
// Wrong: All users behind a NAT share the same limit
const clientId = req.ip;
// Correct: Use authenticated user ID when available
const clientId = req.user?.id || req.ip;
2. Not handling the rate limit response on the client
// Wrong: Ignoring 429 responses
const response = await fetch('/api/data');
const data = await response.json();
// Correct: Handle rate limiting gracefully
const response = await fetch('/api/data');
if (response.status === 429) {
  const retryAfter = Number(response.headers.get('Retry-After') || 1);
  await sleep(retryAfter * 1000); // sleep(ms) helper defined in the sketch below
  return fetch('/api/data'); // Retry
}
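A single retry is the bare minimum; production clients usually want bounded retries with exponential backoff that honor Retry-After when the server provides it. A sketch:
// Sketch: bounded retries with exponential backoff, honoring Retry-After
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
async function fetchWithBackoff(url, options = {}, maxRetries = 3) {
  for (let attempt = 0; ; attempt++) {
    const response = await fetch(url, options);
    if (response.status !== 429 || attempt >= maxRetries) {
      return response; // success, a non-429 error, or retries exhausted
    }
    // Prefer the server's hint; otherwise back off exponentially (1s, 2s, 4s...)
    const retryAfter = Number(response.headers.get('Retry-After'));
    const delayMs = retryAfter > 0 ? retryAfter * 1000 : 2 ** attempt * 1000;
    await sleep(delayMs);
  }
}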
3. Not providing Retry-After headers
// Wrong: Client doesn't know when to retry
res.status(429).json({ error: 'Too many requests' });
// Correct: Include retry information
res.set('Retry-After', '30');
res.status(429).json({
error: 'Too many requests',
retryAfter: 30,
});
4. Applying the same limits to all endpoints
// Wrong: Login and search have the same limit
app.use('/api/', rateLimiter);
// Correct: Different limits for different risk levels
app.use('/api/auth/login', strictLimiter); // 5/hour
app.use('/api/search', standardLimiter); // 100/min
app.use('/api/reports/export', heavyLimiter); // 10/hour
Strategy Comparison
| Strategy | Burst Tolerant | Accuracy | Memory | Complexity |
|---|---|---|---|---|
| Fixed Window | No | Low | O(1) | Low |
| Sliding Log | No | High | O(n) | Medium |
| Sliding Counter | Partial | Medium | O(1) | Medium |
| Token Bucket | Yes | High | O(1) | Medium |
| Leaky Bucket | Yes | High | O(n) | High |
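The one strategy in the table without an implementation above is the leaky bucket: requests enter a per-client queue that drains at a constant rate. The O(n) memory in the table refers to a true queue that holds requests for later processing; the counter-based sketch below approximates the same admission behavior in O(1) per client.
// Leaky bucket sketch (counter-based): tracks queue depth rather than
// holding actual requests; admission matches the queued variant
class LeakyBucketLimiter {
  constructor(capacity, leakRatePerSec) {
    this.capacity = capacity; // max "queued" requests per client
    this.leakRatePerSec = leakRatePerSec; // steady drain rate
    this.queues = new Map(); // clientId -> { depth, lastLeak }
  }
  isAllowed(clientId) {
    const now = Date.now();
    const q = this.queues.get(clientId) || { depth: 0, lastLeak: now };
    // Drain at the configured rate since the last check
    const leaked = ((now - q.lastLeak) / 1000) * this.leakRatePerSec;
    q.depth = Math.max(0, q.depth - leaked);
    q.lastLeak = now;
    if (q.depth >= this.capacity) {
      this.queues.set(clientId, q);
      return { allowed: false, remaining: 0 };
    }
    q.depth += 1;
    this.queues.set(clientId, q);
    return { allowed: true, remaining: Math.floor(this.capacity - q.depth) };
  }
}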
Related
- Rate Limiting Strategies: Token Bucket, Leaky Bucket, and More
- Spring Boot + JWT: Secure Your REST API
- Deploying Spring Boot Apps to Docker and Kubernetes
Final Thoughts
API rate limiting is non-negotiable in modern backend architecture. It's relatively simple to implement but can save your backend from downtime, abuse, and unpredictable costs. Start with a simple fixed window or token bucket approach, then evolve to more sophisticated strategies as your needs grow.
Remember to always provide clear feedback to clients about their rate limit status, implement different limits for different endpoint types, and use Redis or a similar distributed store when running multiple server instances.