System Design Interview: Designing a URL Shortener

The URL shortener is one of the most common system design interview questions, and for good reason. It sounds deceptively straightforward — take a long URL, return a short one, redirect when someone visits it. However, the real challenge surfaces when you start asking follow-up questions. How do you generate unique short codes? How do you handle billions of redirects per day? What happens when a popular link gets shared on social media and traffic spikes by 100x in minutes?

This walkthrough approaches designing a URL shortener the way you would in a real system design discussion: starting with requirements, then working through the core design, and finally addressing the scaling challenges that separate a toy project from a production system. Along the way, you will encounter trade-offs in encoding strategies, database choices, caching layers, and read-heavy optimization.

Clarifying Requirements Before Designing

Before writing any code or drawing any diagrams, establish what the system needs to do. Skipping this step is one of the most common mistakes in system design interviews — and in real architecture discussions.

Functional Requirements

  • Shorten: Given a long URL, generate a unique short URL
  • Redirect: Given a short URL, redirect the user to the original long URL
  • Custom aliases (optional): Allow users to choose their own short code
  • Expiration (optional): URLs can expire after a configurable time period
  • Analytics (optional): Track click counts, referrers, and geographic data

Non-Functional Requirements

  • High availability: The redirect service must be available at all times — a broken short link erodes trust permanently
  • Low latency: Redirects should complete in under 100ms — users expect instant redirects
  • Scalability: The system should handle read-heavy traffic (100:1 read-to-write ratio is typical)
  • Durability: Once a short URL is created, it must work reliably for its entire lifetime

Back-of-the-Envelope Estimation

Estimations ground your design decisions in concrete numbers. Assume a service that handles roughly 100 million new URLs per month and a 100:1 read-to-write ratio.

Writes: 100 million / (30 days × 24 hours × 3600 seconds) ≈ 40 URLs created per second

Reads: 40 × 100 = 4,000 redirects per second

Storage: If each URL record takes roughly 500 bytes (short code, long URL, metadata), then 100 million URLs per month × 500 bytes = ~50 GB per month. Over 5 years, that totals approximately 3 TB — large but manageable for a single database with proper indexing strategies.

These numbers immediately tell you something important: the system is heavily read-dominant. Consequently, most of your optimization effort should focus on making redirects fast, not on speeding up URL creation.
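
The arithmetic above is easy to sanity-check in a few lines; the traffic and record-size figures are the assumptions stated earlier:

```python
SECONDS_PER_MONTH = 30 * 24 * 3600   # ~2.59 million seconds
URLS_PER_MONTH = 100_000_000
READ_WRITE_RATIO = 100
BYTES_PER_RECORD = 500

writes_per_sec = URLS_PER_MONTH / SECONDS_PER_MONTH              # ~38.6, round to 40
reads_per_sec = writes_per_sec * READ_WRITE_RATIO                # ~3,900, round to 4,000
storage_per_month_gb = URLS_PER_MONTH * BYTES_PER_RECORD / 1e9   # 50 GB
storage_5_years_tb = storage_per_month_gb * 12 * 5 / 1000        # 3 TB
```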

Encoding Strategy: Generating Short Codes

The core technical decision in designing a URL shortener is how to convert a long URL into a short, unique code. There are three main approaches, each with distinct trade-offs.

Approach 1: Base62 Encoding with a Counter

Use an auto-incrementing counter and convert the number to a Base62 string (a-z, A-Z, 0-9). A 7-character Base62 string supports 62^7 ≈ 3.5 trillion unique URLs — more than enough for most use cases.

import string

ALPHABET = string.ascii_lowercase + string.ascii_uppercase + string.digits  # 62 chars

def encode_base62(number: int) -> str:
    if number == 0:
        return ALPHABET[0]

    result = []
    while number > 0:
        result.append(ALPHABET[number % 62])
        number //= 62

    return ''.join(reversed(result))

def decode_base62(short_code: str) -> int:
    number = 0
    for char in short_code:
        number = number * 62 + ALPHABET.index(char)
    return number

Why this works well: Base62 encoding is deterministic, reversible, and produces short codes. A counter value of 1 billion encodes to just 6 characters (bfP3Qq with the alphabet above). Additionally, sequential IDs make database inserts efficient because they maintain index locality.

The downside: Sequential codes are predictable. An attacker can enumerate all URLs by incrementing the counter. If that concerns you, combine the counter with an offset or use a different approach entirely.

Approach 2: Hash-Based Generation

Hash the long URL with MD5 or SHA-256, then take the first 7 characters of the Base62-encoded hash.

import hashlib

def generate_hash_code(long_url: str) -> str:
    hash_bytes = hashlib.sha256(long_url.encode()).digest()
    # Convert first 8 bytes to integer, then Base62 encode
    number = int.from_bytes(hash_bytes[:8], byteorder='big')
    return encode_base62(number)[:7]

Why you might choose this: Hashing is stateless — you do not need a centralized counter. This makes it easier to generate codes across multiple servers without coordination. Furthermore, the same URL always produces the same hash, which naturally deduplicates.

The downside: Collisions. Two different URLs can produce the same 7-character code. You need a collision resolution strategy — typically appending a counter or rehashing with a salt until you find an unused code.
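
A minimal, self-contained sketch of the salt-and-rehash strategy — the `taken` set stands in for a database uniqueness check, and a real implementation would first look up whether the URL is already stored so that deduplication is preserved:

```python
import hashlib
import string

ALPHABET = string.digits + string.ascii_letters  # 62 characters

def hash_code(s: str, length: int = 7) -> str:
    # SHA-256, first 8 bytes as an integer, then Base62-encode and truncate
    n = int.from_bytes(hashlib.sha256(s.encode()).digest()[:8], byteorder="big")
    digits = []
    while n > 0:
        digits.append(ALPHABET[n % 62])
        n //= 62
    return "".join(reversed(digits))[:length]

def shorten(long_url: str, taken: set) -> str:
    """Rehash with an incrementing salt until an unused code is found."""
    salt = 0
    while True:
        candidate = hash_code(long_url if salt == 0 else f"{long_url}#{salt}")
        if candidate not in taken:
            taken.add(candidate)
            return candidate
        salt += 1
```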

Approach 3: Pre-Generated Key Service

Generate a batch of unique random codes in advance and store them in a separate “key pool” database. When a user shortens a URL, pull the next available code from the pool.

Why this appeals to some teams: It completely separates code generation from URL creation, making both operations simpler. It also avoids the collision problem entirely.

The downside: You need a separate service to manage the key pool, and that service itself becomes a potential bottleneck and single point of failure. In practice, most teams find this adds complexity without proportional benefit.

Which Approach to Choose

For most production systems, Base62 encoding with a counter provides the best balance of simplicity, performance, and predictability. If you need non-sequential codes for security reasons, combine the counter with a bijective shuffle function that maps sequential IDs to seemingly random outputs. This gives you the insert performance of sequential IDs with the unpredictability of random codes.
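
One simple bijection is multiplication by a constant coprime to the code space, modulo that space. A sketch — the multiplier below is purely illustrative; any value coprime to the space works:

```python
import math

SPACE = 62 ** 7                # all 7-character Base62 codes
MULTIPLIER = 1_580_030_173     # illustrative constant, coprime to SPACE
assert math.gcd(MULTIPLIER, SPACE) == 1
MULTIPLIER_INV = pow(MULTIPLIER, -1, SPACE)  # modular inverse (Python 3.8+)

def obfuscate(sequential_id: int) -> int:
    """Map a sequential ID to a scattered-looking ID in the same range."""
    return (sequential_id * MULTIPLIER) % SPACE

def deobfuscate(obfuscated_id: int) -> int:
    """Invert obfuscate(), recovering the original sequential ID."""
    return (obfuscated_id * MULTIPLIER_INV) % SPACE
```

Feeding obfuscate(counter) into the Base62 encoder makes consecutive counters produce unrelated-looking codes, while the database still inserts sequential primary keys.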

Database Schema and Storage

The primary data model is straightforward: a mapping from short code to long URL, plus metadata.

CREATE TABLE urls (
    id BIGSERIAL PRIMARY KEY,
    short_code VARCHAR(10) NOT NULL UNIQUE,
    long_url TEXT NOT NULL,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    expires_at TIMESTAMP WITH TIME ZONE,
    user_id BIGINT,
    click_count BIGINT DEFAULT 0
);

CREATE UNIQUE INDEX idx_short_code ON urls (short_code);
CREATE INDEX idx_user_id ON urls (user_id) WHERE user_id IS NOT NULL;
CREATE INDEX idx_expires_at ON urls (expires_at) WHERE expires_at IS NOT NULL;

Why a Relational Database Works Here

A URL shortener’s core data model is a simple key-value lookup, which might suggest a NoSQL solution. However, a relational database like PostgreSQL handles this use case well for several reasons:

  • The schema is fixed and unlikely to change frequently
  • You need strong consistency — a short code must map to exactly one URL
  • Partial indexes (like the conditional index on expires_at) reduce index size significantly
  • Transactions simplify the “check if code exists, then insert” operation

For teams already running PostgreSQL, the performance tuning fundamentals you apply elsewhere transfer directly to this use case.

When to Consider NoSQL

If your system grows beyond what a single PostgreSQL instance can handle (hundreds of millions of writes per day), DynamoDB or Cassandra become viable alternatives. Both excel at single-key lookups and horizontal scaling. The trade-off is that you lose relational features like partial indexes and transactions, so your application code needs to handle more of the consistency logic.

API Design

The URL shortener needs two primary endpoints and one optional endpoint.

Create Short URL

POST /api/shorten
Content-Type: application/json

{
  "url": "https://example.com/very/long/path?with=parameters",
  "custom_alias": "my-link",    // optional
  "expires_in": 86400           // optional, seconds
}

Response (201 Created):
{
  "short_url": "https://short.io/Ab3xK9",
  "short_code": "Ab3xK9",
  "expires_at": "2026-04-01T00:00:00Z"
}

Redirect

GET /:short_code

Response (301 Moved Permanently):
Location: https://example.com/very/long/path?with=parameters

The choice between 301 (permanent) and 302 (temporary) redirects matters. A 301 tells browsers and search engines to cache the redirect, which reduces load on your servers. However, it also means analytics tracking becomes less accurate because subsequent visits bypass your server entirely. Most URL shorteners use 302 redirects to maintain accurate click tracking, then switch to 301 for URLs where analytics are not needed.
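
The decision can be captured in a small helper. This is a sketch; the Cache-Control values are a common convention rather than a requirement:

```python
def redirect_headers(long_url: str, track_clicks: bool = True) -> tuple:
    """Return (status, headers) for a redirect response."""
    if track_clicks:
        # 302 keeps every visit flowing through the server, so analytics stay accurate
        return 302, {"Location": long_url, "Cache-Control": "no-store"}
    # 301 lets browsers and CDNs cache the hop, shedding load from the servers
    return 301, {"Location": long_url}
```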

Delete / Expire URL

DELETE /api/urls/:short_code
Authorization: Bearer <token>

Response (204 No Content)

Apply rate limiting to the creation endpoint to prevent abuse. Without it, an attacker can exhaust your short code space or use your service for spam distribution.
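
A fixed-window limiter is the simplest starting point. The sketch below is in-process only; a multi-server deployment would keep the per-key counter in Redis with INCR and EXPIRE instead:

```python
import time
from typing import Dict, Optional, Tuple

class FixedWindowLimiter:
    """In-process fixed-window rate limiter (sketch, not production-ready)."""

    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window = window_seconds
        self.counts: Dict[Tuple[str, int], int] = {}

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        bucket = (key, int(now // self.window))  # which window this request falls in
        self.counts[bucket] = self.counts.get(bucket, 0) + 1
        return self.counts[bucket] <= self.limit
```

Before handling POST /api/shorten, call allow(client_ip) and return 429 when it is False.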

Caching Layer: Making Redirects Fast

With a 100:1 read-to-write ratio, caching is not optional — it is the primary performance lever. A redirect that hits the database every time adds unnecessary latency and database load.

Cache Architecture

Place Redis between the API servers and the database. On every redirect request, check Redis first. If the short code exists in cache, return the long URL immediately. If not, query the database, cache the result, and then redirect.

import redis
import psycopg2
from typing import Optional

redis_client = redis.Redis(host='cache.internal', port=6379, db=0, decode_responses=True)
CACHE_TTL = 86400  # 24 hours

def resolve_short_code(short_code: str) -> Optional[str]:
    # Check cache first
    cached_url = redis_client.get(f"url:{short_code}")
    if cached_url:
        return cached_url

    # Cache miss — query database (a per-request connection keeps the sketch
    # simple; production code would use a connection pool)
    conn = psycopg2.connect(dsn="postgresql://localhost/urlshortener")
    cursor = conn.cursor()
    cursor.execute("SELECT long_url FROM urls WHERE short_code = %s", (short_code,))
    row = cursor.fetchone()
    cursor.close()
    conn.close()

    if row is None:
        return None

    long_url = row[0]
    # Populate cache for future requests
    redis_client.setex(f"url:{short_code}", CACHE_TTL, long_url)
    return long_url

Cache Hit Ratio Expectations

Popular URL shorteners report cache hit ratios above 90% because URL access follows a power-law distribution: a small percentage of URLs receive the vast majority of traffic. As a result, even a modestly sized Redis instance (a few GB) can serve most redirect traffic without touching the database.

For teams exploring what Redis can do beyond simple key-value caching, Redis data structures like sorted sets enable additional features like leaderboards of most-clicked URLs or time-windowed analytics.

Cache Invalidation

URL shortener caching is simpler than most cache invalidation problems because the data is essentially immutable. Once a short code maps to a long URL, that mapping never changes. The only cache invalidation scenarios are:

  • URL deletion: Remove the key from Redis when a user deletes their short URL
  • URL expiration: Set the Redis TTL to match the URL’s expiration time
  • Cache eviction: Let Redis LRU eviction handle stale entries naturally

This immutability is a significant architectural advantage. Unlike e-commerce product pages or social media feeds where data changes constantly, URL mappings are write-once-read-many. Consequently, you avoid the most painful cache invalidation scenarios entirely.

Scaling the Write Path

At 40 writes per second, a single database server handles URL creation without breaking a sweat. However, as the service grows, the write path introduces challenges.

The Counter Problem

If you use Base62 encoding with an auto-incrementing counter, the counter becomes a coordination point. Two servers generating URLs simultaneously cannot both read-increment-write the counter without risking duplicates.

Solution 1: Database sequences. PostgreSQL’s BIGSERIAL handles this natively. The database guarantees unique sequential IDs even under concurrent writes. This works well up to several thousand writes per second.

Solution 2: Range-based allocation. Assign each application server a range of IDs (e.g., Server A gets 1-1,000,000, Server B gets 1,000,001-2,000,000). Each server increments within its range independently. When a range is exhausted, the server requests a new one from a coordination service. This eliminates per-write coordination while maintaining sequential-ish IDs.
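
Solution 2 can be sketched as a small client-side allocator; lease_next_block stands in for whatever coordination service hands out ranges:

```python
import threading
from typing import Callable

class RangeAllocator:
    """Hands out IDs from a leased block; one coordination call per block."""

    def __init__(self, lease_next_block: Callable[[], int], block_size: int = 1_000_000):
        self._lease = lease_next_block   # returns the starting ID of a fresh block
        self._block_size = block_size
        self._lock = threading.Lock()
        self._next = 0
        self._end = 0   # exhausted until the first lease

    def next_id(self) -> int:
        with self._lock:
            if self._next >= self._end:
                start = self._lease()
                self._next, self._end = start, start + self._block_size
            allocated = self._next
            self._next += 1
            return allocated
```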

Solution 3: Snowflake-style IDs. Use a composite ID that includes a timestamp, machine ID, and sequence number. This generates unique IDs across multiple servers without any coordination. The trade-off is that the IDs are 64-bit integers, which encode to slightly longer Base62 strings (10-11 characters instead of 6-7).
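
A minimal single-process sketch of the Snowflake layout (41 timestamp bits, 10 machine bits, 12 sequence bits; the epoch is arbitrary, and clock-rollback handling is omitted):

```python
import threading
import time

CUSTOM_EPOCH_MS = 1_600_000_000_000  # arbitrary service epoch (Sep 2020)

class SnowflakeGenerator:
    """64-bit IDs: milliseconds since epoch | machine ID | per-ms sequence."""

    def __init__(self, machine_id: int):
        assert 0 <= machine_id < 1024
        self._machine_id = machine_id
        self._sequence = 0
        self._last_ms = -1
        self._lock = threading.Lock()

    def _now_ms(self) -> int:
        return int(time.time() * 1000) - CUSTOM_EPOCH_MS

    def next_id(self) -> int:
        with self._lock:
            now = self._now_ms()
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & 0xFFF  # 4096 IDs per ms
                if self._sequence == 0:
                    while now <= self._last_ms:  # sequence exhausted; wait for next ms
                        now = self._now_ms()
            else:
                self._sequence = 0
            self._last_ms = now
            return (now << 22) | (self._machine_id << 12) | self._sequence
```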

For most URL shorteners, database sequences (Solution 1) work well until you reach thousands of writes per second. At that point, range-based allocation (Solution 2) offers the best balance of simplicity and scalability.

Scaling the Read Path

The read path is where you spend most of your scaling effort, since redirects vastly outnumber URL creations.

Multi-Tier Caching

For high-traffic deployments, a single Redis instance eventually becomes a bottleneck. Consider a multi-tier caching strategy:

  1. Application-level cache: An in-process LRU cache (limited to a few hundred MB) handles the hottest URLs without any network round-trip
  2. Redis cluster: A distributed Redis cluster handles the remaining cache-worthy URLs with sub-millisecond latency
  3. Database read replicas: Multiple PostgreSQL read replicas distribute database queries that miss both cache tiers

from collections import OrderedDict
from typing import Optional

# Tier 1: in-process LRU cache for the hottest URLs (no network hop)
LOCAL_CACHE_MAX = 10_000
local_cache = OrderedDict()  # short_code -> long_url

def local_cache_get(short_code: str) -> Optional[str]:
    long_url = local_cache.get(short_code)
    if long_url is not None:
        local_cache.move_to_end(short_code)  # mark as recently used
    return long_url

def local_cache_put(short_code: str, long_url: str) -> None:
    local_cache[short_code] = long_url
    local_cache.move_to_end(short_code)
    if len(local_cache) > LOCAL_CACHE_MAX:
        local_cache.popitem(last=False)  # evict the least recently used entry

def resolve_with_tiered_cache(short_code: str) -> Optional[str]:
    # Tier 1: Local process cache
    long_url = local_cache_get(short_code)
    if long_url:
        return long_url

    # Tier 2: Redis
    long_url = redis_client.get(f"url:{short_code}")
    if long_url:
        local_cache_put(short_code, long_url)
        return long_url

    # Tier 3: Database (query_database is the lookup from the earlier example)
    long_url = query_database(short_code)
    if long_url:
        redis_client.setex(f"url:{short_code}", CACHE_TTL, long_url)
        local_cache_put(short_code, long_url)
    return long_url

Database Read Replicas

When database queries are necessary, distribute them across read replicas. Since URL creation only happens on the primary, and the read path only needs to look up existing URLs, replication lag is rarely a concern. A newly created URL might not appear on replicas for a few hundred milliseconds, but users typically do not click their short URL within that window.

For applications managing high-throughput database connections, connection pooling with PgBouncer prevents connection exhaustion under sudden traffic spikes.

Analytics: Tracking Clicks Without Slowing Redirects

If you add click analytics, the redirect path must not slow down. Inserting a row into an analytics table on every redirect would add latency and create a write bottleneck on the hottest path in the system.

Asynchronous Analytics Pipeline

Decouple analytics from the redirect path by publishing click events to a message queue.

import json
from datetime import datetime, timezone
from typing import Optional

def handle_redirect(short_code: str, request) -> Optional[str]:
    long_url = resolve_with_tiered_cache(short_code)

    if long_url is None:
        return None  # caller responds with 404

    # Fire-and-forget: publish click event to message queue
    click_event = {
        "short_code": short_code,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "referrer": request.headers.get("Referer", ""),
        "user_agent": request.headers.get("User-Agent", ""),
        "ip_country": resolve_country(request.remote_addr),  # GeoIP lookup, defined elsewhere
    }
    redis_client.lpush("click_events", json.dumps(click_event))

    return long_url

A separate worker process consumes events from the queue and writes them to an analytics database in batches. This approach keeps redirect latency under a few milliseconds while still capturing every click.
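
The worker loop can be sketched as a small drain function. Here pop_event would wrap redis_client.rpop("click_events") in the deployment above, and write_batch would issue one multi-row INSERT:

```python
import json
from typing import Callable, List, Optional

def drain_click_events(pop_event: Callable[[], Optional[str]],
                       write_batch: Callable[[List[dict]], None],
                       batch_size: int = 500) -> int:
    """Pop up to batch_size queued events and persist them in a single write."""
    batch = []
    while len(batch) < batch_size:
        raw = pop_event()
        if raw is None:  # queue drained
            break
        batch.append(json.loads(raw))
    if batch:
        write_batch(batch)
    return len(batch)
```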

Aggregated vs Raw Analytics

Storing every individual click event gets expensive at scale. A more practical approach is to maintain both:

  • Real-time counters: Increment a Redis counter per short code for live click counts
  • Aggregated summaries: Periodically roll up hourly/daily click counts, top referrers, and geographic breakdowns into summary tables
  • Raw events: Store raw click events in a cheaper storage layer (S3 + Athena, or ClickHouse) for ad-hoc analysis

This tiered approach balances real-time visibility with long-term storage costs.

Case Study: Surviving a Viral Traffic Spike

A marketing team at a mid-sized e-commerce company uses an internal URL shortener for campaign tracking. They create around 500 short URLs per day — well within the system’s design capacity. Then one of their product links gets shared by an influencer with several million followers.

Within 30 minutes, redirect traffic jumps from the baseline of roughly 50 requests per second to over 10,000 requests per second. The system reacts across its tiers:

Cache layer: Because the viral URL is a single short code, Redis serves the overwhelming majority of requests. The cache hit ratio for this specific URL reaches effectively 100% after the first request. The Redis instance handles 10,000 reads per second without significant CPU pressure because each operation is a simple key lookup.

Application servers: The load balancer distributes requests across multiple application instances. Each instance resolves the redirect from its local in-process cache after the first request, meaning most redirects complete without even hitting Redis.

Database: Almost no additional database load occurs because the cache handles everything. The database team notices no change in query latency during the spike.

Analytics pipeline: The click event queue grows rapidly, but because analytics processing is asynchronous, the redirect latency remains unaffected. The analytics workers fall behind temporarily but catch up within an hour after the traffic spike subsides.

The key lesson is that the system’s read path scales independently of its write path. Because the architecture separates concerns — caching for speed, async processing for analytics, database for durability — a 200x traffic spike on one URL does not degrade the experience for other users.

When to Use This Architecture

  • System design interviews where the interviewer asks you to design a URL shortener, link shortener, or paste-bin service
  • Building an internal URL shortener for campaign tracking or analytics
  • Any service that maps short identifiers to longer resources (short IDs for documents, invite codes, file sharing links)
  • Learning system design fundamentals: URL shorteners touch hashing, caching, databases, scaling, and API design in a compact problem

When NOT to Use This Exact Design

  • If you need fewer than 10,000 URLs total, a simple database table with no caching layer is sufficient — do not over-engineer
  • If you only need URL shortening for internal tools, managed services like Bitly or short.io cost less than running your own infrastructure
  • If your primary concern is link security (preventing enumeration, access control per link), you need a fundamentally different security model, not just a different encoding strategy

Common Mistakes When Designing a URL Shortener

  • Jumping into database schema without clarifying requirements and estimating scale first — the numbers should drive your design decisions
  • Choosing a hash-based approach without planning for collision handling, then discovering duplicates in production
  • Using 301 (permanent) redirects when you need analytics, causing browsers to cache the redirect and bypass your tracking
  • Skipping the caching layer because “the database can handle it” — this works at small scale but becomes a bottleneck surprisingly quickly
  • Storing raw click events indefinitely without aggregation, leading to storage costs that grow linearly with traffic
  • Not rate-limiting the creation endpoint, allowing abuse that exhausts your short code namespace or turns your service into a spam distribution tool
  • Over-designing for scale you will never reach — a URL shortener serving a single company does not need Cassandra, Kafka, and a microservices architecture

Completing the URL Shortener Design

Designing a URL shortener covers a remarkable breadth of system design concepts in a single problem. The core system is a key-value mapping with Base62 encoding for short code generation, a relational database for storage, and a Redis caching layer for fast redirects. The scaling strategy prioritizes the read path because redirects outnumber URL creations by orders of magnitude, and multi-tier caching combined with database read replicas handles traffic spikes gracefully.

Start your next system design practice by estimating the numbers first — writes per second, reads per second, storage over time. Those estimates drive every subsequent decision. Once you have the fundamentals of designing a URL shortener down, expand your practice to related problems like rate limiter design and distributed caching, which build directly on the same patterns.
