System Design Interview: Designing a Chat Application

Real-time messaging is one of the most demanding system design problems because it combines persistent connections, low-latency delivery, message ordering, and offline support into a single system. Designing a chat app forces you to think about problems that most CRUD applications never encounter: how do you push messages to users instead of waiting for them to ask? How do you guarantee that messages arrive in order? What happens when a user is offline for three days and then reconnects?

This walkthrough covers designing a chat app step by step — from requirements and connection management through message storage, delivery guarantees, and the scaling decisions that determine whether the system handles a thousand users or a hundred million. Whether you are preparing for a system design interview or planning a real messaging feature, the patterns here apply directly.

Clarifying Requirements First

Chat applications range from simple one-on-one messaging to full platforms with group chats, media sharing, read receipts, and end-to-end encryption. Before designing anything, establish the scope.

Functional Requirements

  • One-on-one messaging: Two users can send text messages to each other in real time
  • Group messaging: Multiple users can participate in a shared conversation
  • Online/offline presence: Users can see whether their contacts are currently online
  • Message history: Users can scroll back through previous messages
  • Delivery indicators: Sent, delivered, and read status for each message
  • Push notifications: Offline users receive notifications for new messages

Non-Functional Requirements

  • Low latency: Messages should arrive within 200ms under normal conditions
  • Message ordering: Messages within a conversation must appear in the correct order
  • Reliability: No messages should be lost, even during server failures
  • Scalability: The system should support millions of concurrent connections
  • Eventual consistency: Minor delays in delivery status updates are acceptable

Scale Estimation

Assume a chat application with 50 million daily active users where each user sends an average of 40 messages per day.

Messages per day: 50 million × 40 = 2 billion messages

Messages per second: 2 billion / 86,400 seconds ≈ 23,000 messages per second

Concurrent connections: If 10% of daily active users are online simultaneously, that means roughly 5 million persistent WebSocket connections

Storage: If the average message is 200 bytes of text plus 300 bytes of metadata, each message takes approximately 500 bytes. Over one year, 2 billion messages per day × 365 days × 500 bytes ≈ 365 TB. Consequently, your storage strategy needs to account for both volume and access patterns.

These numbers shape every subsequent decision. Five million concurrent connections cannot be handled by a single server. Additionally, 23,000 messages per second requires a message routing layer that distributes work across multiple machines.
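The arithmetic above is easy to sanity-check in a few lines. A quick sketch, with the constants mirroring the assumptions stated earlier:

```python
# Back-of-envelope estimates for the assumed 50M-DAU chat system.
DAU = 50_000_000
MSGS_PER_USER_PER_DAY = 40
AVG_MSG_BYTES = 500          # 200 bytes text + 300 bytes metadata
CONCURRENCY_RATIO = 0.10     # 10% of DAU online simultaneously

msgs_per_day = DAU * MSGS_PER_USER_PER_DAY
msgs_per_sec = msgs_per_day / 86_400
concurrent_conns = int(DAU * CONCURRENCY_RATIO)
storage_per_year_tb = msgs_per_day * 365 * AVG_MSG_BYTES / 1e12

print(f"{msgs_per_day:,} messages/day")                 # 2,000,000,000 messages/day
print(f"~{msgs_per_sec:,.0f} messages/second")          # ~23,148 messages/second
print(f"{concurrent_conns:,} concurrent connections")   # 5,000,000 concurrent connections
print(f"~{storage_per_year_tb:.0f} TB/year")            # ~365 TB/year
```

Running the numbers yourself in an interview shows the chain of reasoning, not just the results.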

Connection Management with WebSockets

Traditional HTTP follows a request-response pattern: the client asks, the server answers. Chat requires the opposite — the server needs to push messages to clients the moment they arrive. WebSockets and Server-Sent Events both support server-initiated communication, but WebSockets are the standard choice for chat because they provide full-duplex communication on a single TCP connection.

How the Connection Works

  1. The client establishes a WebSocket connection to a chat server
  2. The server authenticates the connection and registers it in a connection registry
  3. Both sides can now send messages at any time without new HTTP requests
  4. The connection stays open until the user disconnects or the network drops

import { WebSocketServer } from 'ws';
import { verifyToken } from './auth.js';

const wss = new WebSocketServer({ port: 8080 });

// In-memory map: userId -> WebSocket connection
const connections = new Map();

wss.on('connection', async (ws, req) => {
  const token = new URL(req.url, 'http://localhost').searchParams.get('token');
  const user = await verifyToken(token);

  if (!user) {
    ws.close(4001, 'Unauthorized');
    return;
  }

  connections.set(user.id, ws);
  broadcastPresence(user.id, 'online');

  ws.on('message', (data) => {
    let message;
    try {
      message = JSON.parse(data);
    } catch {
      return; // ignore malformed frames rather than crash the handler
    }
    handleIncomingMessage(user.id, message);
  });

  ws.on('close', () => {
    connections.delete(user.id);
    broadcastPresence(user.id, 'offline');
  });
});

This single-server setup is the foundational pattern for a WebSocket-based chat backend in Node.js. However, a production chat system needs to distribute connections across many servers, which introduces the routing challenge discussed later.

Connection Registry

A single server can typically handle 50,000 to 100,000 concurrent WebSocket connections, depending on hardware and message throughput. For 5 million concurrent users, you need at least 50 to 100 WebSocket servers. This means that when User A sends a message to User B, they are likely connected to different servers.

The connection registry solves this by tracking which user is connected to which server. Redis works well for this purpose because it provides fast key-value lookups with automatic expiration.

import redis
import json
import time

redis_client = redis.Redis(host='registry.internal', port=6379, decode_responses=True)
CONNECTION_TTL = 300  # 5 minutes, refreshed by heartbeats

def register_connection(user_id: str, server_id: str):
    redis_client.setex(
        f"conn:{user_id}",
        CONNECTION_TTL,
        json.dumps({"server_id": server_id, "connected_at": time.time()})
    )

def find_user_server(user_id: str) -> str | None:
    data = redis_client.get(f"conn:{user_id}")
    if data:
        return json.loads(data)["server_id"]
    return None

def refresh_heartbeat(user_id: str):
    redis_client.expire(f"conn:{user_id}", CONNECTION_TTL)

Each WebSocket server sends periodic heartbeats for its connected users, refreshing the TTL. If a server crashes, its connections expire from the registry automatically within 5 minutes.
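The heartbeat side of this can be a simple background loop on each WebSocket server. A sketch assuming an asyncio-based server; the refresh callable would be the refresh_heartbeat helper above, and it is passed in here only to keep the loop self-contained:

```python
import asyncio
from typing import Callable, Dict

HEARTBEAT_INTERVAL = 60  # seconds; well under the 300-second registry TTL

async def heartbeat_loop(connections: Dict[str, object],
                         refresh: Callable[[str], None],
                         interval: float = HEARTBEAT_INTERVAL) -> None:
    # Periodically re-arm the registry TTL for every locally connected
    # user. If this server crashes, the loop dies with it and the
    # registry entries expire on their own within CONNECTION_TTL.
    while True:
        for user_id in list(connections):  # copy: the map mutates on connect/disconnect
            refresh(user_id)
        await asyncio.sleep(interval)
```

The interval just needs to be comfortably shorter than the TTL so a single missed cycle does not expire live connections.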

Message Flow: From Send to Deliver

Understanding the complete lifecycle of a message is essential when designing a chat app. Here is what happens when User A sends a message to User B:

  1. User A’s client sends the message over their WebSocket connection to Chat Server 1
  2. Chat Server 1 validates the message, assigns a server-side timestamp and message ID, then persists it to the message database
  3. Chat Server 1 looks up User B’s connection in the registry — User B is connected to Chat Server 3
  4. Chat Server 1 publishes the message to a message routing layer (Redis Pub/Sub or a message queue)
  5. Chat Server 3 receives the routed message and pushes it to User B’s WebSocket connection
  6. User B’s client acknowledges receipt, and Chat Server 3 updates the delivery status

import uuid
from datetime import datetime, timezone

async def handle_incoming_message(sender_id: str, payload: dict):
    message = {
        "id": str(uuid.uuid4()),
        "conversation_id": payload["conversation_id"],
        "sender_id": sender_id,
        "content": payload["content"],
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "status": "sent",
    }

    # Step 1: Persist to database (guarantees durability)
    await save_message(message)

    # Step 2: Send delivery acknowledgment back to sender
    await send_to_user(sender_id, {
        "type": "message_ack",
        "message_id": message["id"],
        "status": "sent",
    })

    # Step 3: Route to recipient(s)
    recipients = await get_conversation_members(message["conversation_id"])
    for recipient_id in recipients:
        if recipient_id == sender_id:
            continue
        await route_message(recipient_id, message)

Why Persist Before Routing

The message hits the database before any delivery attempt. This ordering matters because network delivery can fail — the recipient might disconnect between the routing lookup and the actual push. By persisting first, you guarantee that the message is never lost. If delivery fails, the recipient retrieves the message when they reconnect and sync their message history.

Message Storage Design

Chat message storage has unique access patterns that differ from typical application data. Users primarily access messages in reverse chronological order within a single conversation, and they rarely search across all conversations at once.

Schema Design

CREATE TABLE messages (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    conversation_id UUID NOT NULL,
    sender_id BIGINT NOT NULL,
    content TEXT NOT NULL,
    message_type VARCHAR(20) DEFAULT 'text',
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    status VARCHAR(20) DEFAULT 'sent'
);

-- Primary query: "get recent messages in conversation X"
CREATE INDEX idx_conv_time ON messages (conversation_id, created_at DESC);

CREATE TABLE conversations (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    type VARCHAR(20) NOT NULL,  -- 'direct' or 'group'
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    last_message_at TIMESTAMP WITH TIME ZONE
);

CREATE TABLE conversation_members (
    conversation_id UUID REFERENCES conversations(id),
    user_id BIGINT NOT NULL,
    joined_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    last_read_at TIMESTAMP WITH TIME ZONE,
    PRIMARY KEY (conversation_id, user_id)
);

CREATE INDEX idx_user_conversations ON conversation_members (user_id, conversation_id);

The compound index on (conversation_id, created_at DESC) is the most critical index in the system. It enables efficient pagination through message history — the query pattern that users execute most frequently. For more on how compound indexes serve these query patterns, see database indexing strategies.

Pagination Strategy

Users load messages in pages as they scroll up through history. Cursor-based pagination outperforms offset-based pagination for this use case because message volume grows continuously.

-- First page: latest 50 messages
SELECT id, sender_id, content, created_at
FROM messages
WHERE conversation_id = $1
ORDER BY created_at DESC
LIMIT 50;

-- Next page: 50 messages before the cursor
SELECT id, sender_id, content, created_at
FROM messages
WHERE conversation_id = $1
  AND created_at < $2  -- cursor: created_at of last message from previous page
ORDER BY created_at DESC
LIMIT 50;

When to Consider Cassandra or ScyllaDB

For chat applications at massive scale (hundreds of millions of users), a wide-column store like Cassandra becomes compelling. Cassandra’s partition key design maps naturally to the chat access pattern: partition by conversation_id, cluster by created_at. This co-locates all messages in a conversation on the same partition, making page queries extremely fast. Furthermore, Cassandra scales horizontally by adding nodes, which avoids the vertical scaling ceiling of a single PostgreSQL instance.

The trade-off is that Cassandra requires more operational expertise and does not support transactions or joins. For most applications with fewer than 10 million users, PostgreSQL with proper indexing handles the message volume comfortably.

Message Routing Across Servers

When sender and recipient connect to different WebSocket servers, you need a routing mechanism to forward messages between servers. Two patterns dominate this space.

Pattern 1: Redis Pub/Sub

Each WebSocket server subscribes to a Redis Pub/Sub channel for every user it currently handles. When a message arrives for User B, the sending server publishes to User B’s channel. The server holding User B’s connection receives the message and pushes it to the client.

import json
import redis
import threading

pubsub_client = redis.Redis(host='pubsub.internal', port=6379)

def subscribe_to_user_channel(user_id: str, callback):
    pubsub = pubsub_client.pubsub()
    pubsub.subscribe(f"user:{user_id}")

    def listen():
        for message in pubsub.listen():
            if message["type"] == "message":
                callback(json.loads(message["data"]))

    thread = threading.Thread(target=listen, daemon=True)
    thread.start()
    return pubsub

def route_message(recipient_id: str, message: dict):
    pubsub_client.publish(f"user:{recipient_id}", json.dumps(message))

Redis Pub/Sub works well for chat routing because it delivers messages with minimal latency and requires no persistent storage. However, messages published when no subscriber is listening are lost — which is acceptable because you already persisted the message to the database before routing.

Pattern 2: Dedicated Message Queue

For larger deployments, a message queue like Kafka provides stronger ordering guarantees and replay capability. Each conversation can map to a Kafka partition, which ensures that messages within a conversation arrive in order. Teams already using event-driven architecture with Kafka can extend their existing infrastructure to handle chat message routing.
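The ordering property comes from keyed partitioning: every message with the same conversation key hashes to the same partition, and a partition is consumed in order. A dependency-free sketch of the idea (a real deployment would send the conversation ID as the Kafka message key and let the producer do this hashing):

```python
import hashlib

NUM_PARTITIONS = 12  # illustrative; sized per topic in practice

def partition_for(conversation_id: str,
                  num_partitions: int = NUM_PARTITIONS) -> int:
    # Stable hash of the conversation id -> partition index. Every
    # message in a conversation lands on the same partition, so one
    # consumer sees that conversation's messages in send order.
    digest = hashlib.md5(conversation_id.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions
```

Cross-conversation ordering is not guaranteed, and usually does not need to be.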

Which Pattern to Choose

Redis Pub/Sub suits most chat applications up to a few million concurrent users. It is simpler to operate, has lower latency, and does not require managing topics and partitions. Kafka becomes worthwhile when you need strict message ordering across distributed consumers, replay capability for debugging, or integration with a broader event-driven architecture. For a system design interview, Redis Pub/Sub is typically the better starting point because it is simpler to explain and defend.

Presence and Typing Indicators

Presence (online/offline status) and typing indicators are features users expect, but they generate far more events than actual messages. A user might type and delete text multiple times before sending one message. Consequently, presence and typing events need different handling than chat messages.

Presence Tracking

Track presence using heartbeats rather than connection events alone. Network interruptions can disconnect a WebSocket without triggering the close event. A heartbeat-based approach catches these silent disconnections.

PRESENCE_TTL = 60  # seconds

def update_presence(user_id: str):
    redis_client.setex(f"presence:{user_id}", PRESENCE_TTL, "online")

def check_presence(user_id: str) -> str:
    return "online" if redis_client.exists(f"presence:{user_id}") else "offline"

def get_online_contacts(user_id: str, contact_ids: list[str]) -> list[str]:
    pipeline = redis_client.pipeline()
    for contact_id in contact_ids:
        pipeline.exists(f"presence:{contact_id}")
    results = pipeline.execute()
    return [cid for cid, online in zip(contact_ids, results) if online]

Each client sends a heartbeat every 30 seconds. If the server misses two consecutive heartbeats (60 seconds), the user’s presence key expires and they appear offline. This approach tolerates brief network hiccups without flapping the presence indicator.

Typing Indicators

Typing indicators should use fire-and-forget delivery — they are ephemeral events that do not need persistence or guaranteed delivery. If a typing indicator gets dropped, the user experience degrades slightly but no data is lost. Publish typing events directly through Redis Pub/Sub without touching the database.
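A minimal sketch of that fire-and-forget path, assuming the Redis client from the routing section; the channel name and payload shape are illustrative:

```python
import json
import time

def publish_typing_event(redis_conn, conversation_id: str, user_id: str) -> None:
    # Fire-and-forget: publish to Pub/Sub only, never to the database.
    # If no server is subscribed for a recipient, the event is dropped,
    # which is acceptable for an ephemeral indicator.
    event = {
        "type": "typing",
        "conversation_id": conversation_id,
        "user_id": user_id,
        "at": time.time(),
    }
    redis_conn.publish(f"conversation:{conversation_id}:typing", json.dumps(event))
```

Clients typically also debounce on their side, emitting at most one typing event every few seconds.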

Delivery Guarantees: Sent, Delivered, Read

Chat applications track three message states:

  • Sent: The server has received and persisted the message
  • Delivered: The recipient’s device has received the message
  • Read: The recipient has opened the conversation containing the message

Implementation

async def acknowledge_delivery(user_id: str, message_ids: list[str]):
    # Batch update delivery status
    await update_message_status(message_ids, "delivered")

    # Notify the sender that their messages were delivered
    for msg in await get_messages(message_ids):
        await route_message(msg["sender_id"], {
            "type": "status_update",
            "message_id": msg["id"],
            "status": "delivered",
        })

async def mark_as_read(user_id: str, conversation_id: str, read_up_to: str):
    # Update the member's last_read_at timestamp
    await update_last_read(conversation_id, user_id, read_up_to)

    # Notify other participants
    members = await get_conversation_members(conversation_id)
    for member_id in members:
        if member_id != user_id:
            await route_message(member_id, {
                "type": "read_receipt",
                "conversation_id": conversation_id,
                "reader_id": user_id,
                "read_up_to": read_up_to,
            })

The “read” status uses a watermark approach — instead of tracking read status per message, track the timestamp up to which the user has read. This reduces storage significantly compared to maintaining a read flag on every individual message. In a group conversation with 100 participants and thousands of messages, per-message read tracking would create an enormous amount of status data.

Offline Message Handling

When a recipient is offline, messages must be queued and delivered when they reconnect. The database already stores all messages, so the offline sync process queries for undelivered messages.

async def handle_reconnection(user_id: str):
    # Find all conversations this user belongs to
    conversations = await get_user_conversations(user_id)
    synced_ids = []

    for conv in conversations:
        # Get messages sent after the user's last seen timestamp
        unread_messages = await get_messages_since(
            conv["conversation_id"],
            conv["last_read_at"]
        )

        # Push all missed messages to the reconnected client
        for msg in unread_messages:
            await send_to_user(user_id, {
                "type": "message",
                "data": msg,
            })
            synced_ids.append(msg["id"])

    # Acknowledge delivery for everything synced across all conversations
    if synced_ids:
        await acknowledge_delivery(user_id, synced_ids)

For users offline for extended periods, the sync payload can be large. Implement a progressive sync strategy: send the most recent page of each conversation first, then let the client request older history as the user scrolls.
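A sketch of the progressive-sync idea: cap the initial payload at one page per conversation and let the client page older history on demand. The page size and data shapes here are illustrative:

```python
PAGE_SIZE = 50

def initial_sync_payload(conversations: list[dict]) -> list[dict]:
    # Send only the newest PAGE_SIZE messages of each conversation on
    # reconnect; `has_more` tells the client there is older history to
    # fetch via the normal pagination path as the user scrolls.
    payload = []
    for conv in conversations:
        newest_first = sorted(conv["messages"],
                              key=lambda m: m["created_at"], reverse=True)
        payload.append({
            "conversation_id": conv["id"],
            "messages": newest_first[:PAGE_SIZE],
            "has_more": len(newest_first) > PAGE_SIZE,
        })
    return payload
```

This bounds the reconnect payload by conversation count rather than by days offline.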

Real-World Scenario: Scaling a Team Chat Feature

A B2B SaaS platform with around 5,000 business accounts adds an internal team chat feature. Each account has 10 to 200 members, and the platform serves roughly 100,000 daily active chat users in total.

Initially, the team deploys a single WebSocket server behind a load balancer with sticky sessions. PostgreSQL handles message storage, and Redis Pub/Sub routes messages between connections on the same server. At this scale — roughly 10,000 concurrent connections and a few hundred messages per second — the single-server architecture works reliably.

After six months, usage grows to 300,000 daily active chat users. The single WebSocket server starts dropping connections under peak load. The team horizontally scales to four WebSocket servers, introduces the Redis-based connection registry, and uses Redis Pub/Sub for cross-server message routing. The database handles the increased write volume without issues because the compound index on conversation_id and created_at keeps query performance consistent.

The most surprising challenge during the scaling transition is not message routing — it is presence tracking. With four servers, presence updates from different servers occasionally conflict, causing users to appear as briefly offline when they are actively chatting. The team resolves this by switching from connection-event-based presence to heartbeat-based presence with a 60-second TTL, which eliminates the flapping behavior.

When to Use This Architecture

  • System design interviews asking for a chat system, messaging platform, or real-time collaboration tool
  • Adding a chat or messaging feature to an existing application
  • Building a notification delivery system (similar push mechanics, simpler message model)
  • Any application where the server needs to push updates to clients in real time with delivery guarantees

When NOT to Use This Architecture

  • If you need only simple notifications without conversations or threading, Server-Sent Events with a database-backed queue are simpler and sufficient
  • If your chat feature serves fewer than 1,000 concurrent users, a single WebSocket server with in-memory routing avoids the complexity of distributed connection registries entirely
  • If you need end-to-end encryption as a core requirement, the server-side message persistence model described here conflicts with E2E encryption — you need a fundamentally different approach where the server stores encrypted blobs it cannot read

Common Mistakes When Designing a Chat App

  • Not persisting messages before attempting delivery, which causes message loss when the recipient’s connection drops during routing
  • Using HTTP polling instead of WebSockets for real-time delivery, adding unnecessary latency and server load
  • Tracking read status per individual message instead of using a watermark, creating a storage explosion in group conversations
  • Ignoring connection registry TTLs, which causes stale entries when servers crash and leads to messages routed to dead connections
  • Designing the system around strong consistency when eventual consistency is acceptable for presence and delivery status — the overhead is not worth it
  • Building group chat by iterating through members and sending individual messages synchronously, which makes group delivery latency proportional to group size
  • Not separating presence and typing events from message delivery, causing ephemeral events to overload the message storage layer
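For the group-delivery pitfall in particular, concurrent fan-out keeps delivery latency roughly constant as group size grows. A sketch assuming an async routing coroutine like the route_message shown earlier:

```python
import asyncio
from typing import Awaitable, Callable

async def fan_out(message: dict, recipient_ids: list[str],
                  route: Callable[[str, dict], Awaitable[None]]) -> None:
    # Dispatch all deliveries concurrently instead of awaiting them one
    # by one: total latency tracks the slowest single delivery, not the
    # sum over all members. return_exceptions stops one failed recipient
    # from aborting the rest.
    results = await asyncio.gather(
        *(route(rid, message) for rid in recipient_ids),
        return_exceptions=True,
    )
    for rid, result in zip(recipient_ids, results):
        if isinstance(result, Exception):
            # Safe to ignore: the message was persisted before routing,
            # so this recipient will pick it up on reconnect sync.
            pass
```

The persist-before-route guarantee is what makes the failure handling this simple.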

Completing the Chat App Design

Designing a chat app challenges you to balance real-time delivery, message durability, and horizontal scalability within a single system. The foundation is WebSocket connections for bidirectional communication, a database for persistent message storage, and Redis for both connection registry and cross-server message routing. Delivery guarantees come from persisting before routing, and offline support comes naturally from the database-first approach.

The most valuable takeaway when designing a chat app for an interview is demonstrating that you understand the trade-offs: Redis Pub/Sub versus Kafka for routing, PostgreSQL versus Cassandra for storage, heartbeat-based versus connection-based presence. Each choice has a defensible rationale at different scales. Start with the simplest option that meets your requirements, and explain clearly when and why you would evolve to something more complex.
