
Node.js Memory Leaks: Detection and Prevention

If your Node.js service works fine on day one and then restarts itself at 3 a.m. every third day, you are probably chasing a memory leak. Node.js memory leaks rarely crash immediately. Instead, they slowly eat heap space until V8 gives up, the container hits its limit, or the orchestrator kills the process. This guide explains how to detect Node.js memory leaks, which patterns cause most of them, and how to prevent the common traps before they reach production.

This post is written for backend engineers running Express, NestJS, Fastify, or plain Node.js services under real traffic. By the end, you will know which tool to reach for first, how to read a heap snapshot, and which code patterns to stop writing today.

What Causes Node.js Memory Leaks?

A Node.js memory leak happens when your process keeps references to objects that are no longer needed, so the V8 garbage collector cannot reclaim them. The heap grows over time, response latency increases as the GC works harder, and eventually the process crashes with an out-of-memory error or gets killed by the kernel. The root cause is almost always a reference you forgot about, not a bug in Node.js itself.

This matters because the fix is usually in your application code. Therefore, the diagnostic workflow is less about tuning V8 flags and more about finding which object retains which other objects.

Symptoms That Suggest a Memory Leak

Before reaching for a profiler, confirm you actually have a leak. These are the usual signals:

  • Resident memory (RSS) trends upward over hours or days, never stabilizing
  • Latency degrades slowly even though request volume is flat
  • The process crashes with FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory
  • Kubernetes or ECS kills the container with OOMKilled while CPU is normal
  • Garbage collection pauses (visible in --trace-gc output) grow longer over time

If the memory graph is flat and your process only crashes under traffic spikes, you likely have a capacity or concurrency problem, not a leak. Furthermore, a short-lived process that restarts every deploy will never show leak symptoms even if the code has one.

How to Detect Node.js Memory Leaks

Use this sequence. Each step rules out cheaper possibilities before escalating.

  1. Enable garbage collection tracing with node --trace-gc server.js and watch whether scavenge and mark-sweep events free memory over time. If old-generation memory keeps growing after every mark-sweep, the heap is retaining something.
  2. Graph RSS over time using your APM (Datadog, New Relic, Grafana) or process.memoryUsage().heapUsed logged every minute. A healthy service oscillates; a leaking one drifts upward.
  3. Capture heap snapshots at three points: right after startup, after one hour of traffic, and after several hours. Use node --inspect plus Chrome DevTools, or call v8.writeHeapSnapshot() from a debug endpoint.
  4. Diff the snapshots in Chrome DevTools using the “Comparison” view. Sort by “Delta” and look for object types whose count only grows.
  5. Follow the retainer chain from a suspicious object back to a GC root. The retainer chain tells you which variable, closure, or cache is holding the reference.
  6. Reproduce locally with a load test (autocannon, k6, or ab) and repeat steps 3-5. Production-only bugs are real, but most leaks reproduce under sustained local load.

For a broader view of profiling techniques across runtimes, see Profiling CPU and Memory Usage in Python, Node.js, and Java Apps.

Common Memory Leak Patterns in Node.js

Most Node.js memory leaks come from a short list of recurring patterns. The table below summarizes them.

Pattern | What Goes Wrong | Typical Fix
Unbounded in-memory cache | Map or {} grows forever as keys accumulate | Use lru-cache with size and TTL limits
Event listener accumulation | .on() called inside a request handler without .off() | Register listeners once or use .once()
Closure captures | Large objects retained by a closure passed to a long-lived callback | Null the reference after use or pass only what is needed
Global arrays used as queues | Array push() without corresponding shift() or drain | Use a bounded queue or move to Redis
Timers with outer references | setInterval callback references a large object from scope | Clear with clearInterval on shutdown or context end
Promise retention | Long-lived Promise chains holding onto context | Avoid storing pending promises on shared objects
Streams not consumed | Readable streams paused mid-flight, buffer grows | Pipe to a consumer or call .destroy() on error
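The queue row is the only pattern this post does not cover with code, so here is a minimal sketch of a bounded queue. The eviction policy (drop oldest) and class name are illustrative; rejecting new items or spilling to Redis are equally valid:

```javascript
// A queue that caps its own size instead of growing without limit.
class BoundedQueue {
  constructor(maxSize) {
    this.maxSize = maxSize;
    this.items = [];
  }

  push(item) {
    if (this.items.length >= this.maxSize) {
      this.items.shift(); // evict the oldest entry; alternatively, reject the push
    }
    this.items.push(item);
  }

  shift() {
    return this.items.shift();
  }

  get size() {
    return this.items.length;
  }
}
```

Unlike a bare array used as a queue, the heap cost here is fixed no matter how far producers outpace consumers.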

1. Unbounded Caches

This is the most common pattern. The code usually looks innocent:

const responseCache = new Map();

async function getUser(userId) {
  if (responseCache.has(userId)) {
    return responseCache.get(userId);
  }
  const user = await db.users.findById(userId);
  responseCache.set(userId, user);
  return user;
}

The cache never evicts, so every unique user ID adds an entry that survives for the life of the process. Under real traffic with millions of users, this is a slow-motion OOM. The fix is a bounded cache:

import { LRUCache } from 'lru-cache';

const responseCache = new LRUCache({
  max: 10_000,
  ttl: 1000 * 60 * 5, // 5 minutes
});

async function getUser(userId) {
  const cached = responseCache.get(userId);
  if (cached) return cached;

  const user = await db.users.findById(userId);
  responseCache.set(userId, user);
  return user;
}

The bounded cache has a maximum size and a TTL, so old entries are evicted automatically. As a result, the heap stabilizes under steady load.

2. Event Listener Accumulation

Node’s EventEmitter emits a MaxListenersExceededWarning when a single event gains more than ten listeners, but it warns only once, so the signal is easy to miss in production logs.

// Bad: listener added per request, never removed
app.get('/subscribe', (req, res) => {
  eventBus.on('message', (msg) => {
    res.write(`data: ${JSON.stringify(msg)}\n\n`);
  });
});

Each request adds a permanent listener. Consequently, after a few thousand requests the eventBus holds thousands of closures, each capturing res. The fix is to remove the listener when the request ends:

app.get('/subscribe', (req, res) => {
  const handler = (msg) => {
    res.write(`data: ${JSON.stringify(msg)}\n\n`);
  };
  eventBus.on('message', handler);

  req.on('close', () => {
    eventBus.off('message', handler);
  });
});

3. Timers Holding References

setInterval callbacks keep everything in their closure alive until clearInterval is called. Long-running workers often forget this during graceful shutdown.

// Bad: timer survives even after the worker is "done"
class ReportWorker {
  constructor(data) {
    this.data = data; // large object
    this.timer = setInterval(() => this.checkStatus(), 5000);
  }
}

Instances pile up because the interval keeps each one reachable. Add an explicit stop() method and call it when work finishes:

class ReportWorker {
  constructor(data) {
    this.data = data;
    this.timer = setInterval(() => this.checkStatus(), 5000);
  }

  stop() {
    clearInterval(this.timer);
    this.data = null;
  }
}
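To make worker lifetimes explicit, pair stop() with a shutdown hook. A minimal sketch, assuming a registry named liveWorkers (the class is repeated so the sketch stands alone):

```javascript
// ReportWorker repeated from above so this sketch is self-contained.
class ReportWorker {
  constructor(data) {
    this.data = data; // large object
    this.timer = setInterval(() => this.checkStatus(), 5000);
  }

  checkStatus() {
    // poll some external status here
  }

  stop() {
    clearInterval(this.timer);
    this.data = null;
  }
}

// Track live workers so shutdown can stop every interval; otherwise
// each timer keeps its worker, and the worker's data, reachable forever.
const liveWorkers = new Set();

function startWorker(data) {
  const worker = new ReportWorker(data);
  liveWorkers.add(worker);
  return worker;
}

function stopAll() {
  for (const worker of liveWorkers) worker.stop();
  liveWorkers.clear();
}

// Hook into graceful shutdown so SIGTERM releases everything.
process.once('SIGTERM', stopAll);
```

Calling stopAll() when a drain completes, not just on SIGTERM, keeps steady-state heap flat between deploys.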

A Realistic Debugging Scenario

Consider a mid-sized Express API handling a few hundred requests per second. The service runs fine for six to eight hours, then latency climbs and Kubernetes restarts the pod with OOMKilled. CPU usage never spikes, and error rates are normal until the kill.

The engineer on call takes heap snapshots at startup, after two hours, and after four hours. In the Chrome DevTools comparison view, one object type stands out: ServerResponse objects from Node’s http module, growing by tens of thousands per hour. Following the retainer chain, each response is held by a closure inside a listener registered on a shared EventEmitter used for a Server-Sent Events endpoint.

The fix is the pattern from earlier in this post: remove the listener on req.on('close'). After deployment, RSS stabilizes around 300 MB instead of drifting toward the 1 GB container limit. Importantly, the team also adds an alert on heapUsed trending upward for more than two hours, so the next leak surfaces before a pager goes off.

This scenario illustrates the general shape of a Node.js memory leak investigation: confirm the trend, diff snapshots, follow retainers, fix the pattern, and add monitoring so the next one is cheaper to catch.
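The trend alert the team added can be approximated in-process as a sketch. In practice this belongs in your APM; the window size and the strict "every sample higher" rule here are assumptions for illustration:

```javascript
// Keep a rolling window of heapUsed samples and flag a sustained climb.
// Real alerting should tolerate noise (e.g. fit a slope instead).
const WINDOW = 12; // e.g. 12 samples taken every 10 minutes = 2 hours
const samples = [];

function recordSample(heapUsed) {
  samples.push(heapUsed);
  if (samples.length > WINDOW) samples.shift();
}

function isTrendingUp() {
  if (samples.length < WINDOW) return false;
  // True only when every sample in the window exceeds the previous one.
  return samples.every((v, i) => i === 0 || v > samples[i - 1]);
}

// In a service, feed it from a timer:
// setInterval(() => recordSample(process.memoryUsage().heapUsed), 600_000);
```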

Preventing Memory Leaks Before They Ship

Detection matters, but prevention is cheaper. Therefore, make these checks part of the normal development workflow:

  • Review every long-lived cache during code review. If it has no size limit or TTL, flag it.
  • Audit every EventEmitter.on() call. Confirm there is a matching off() or that the emitter lifetime matches the listener’s intended lifetime.
  • Run load tests with memory assertions. Tools like autocannon combined with process.memoryUsage() sampling catch leaks that unit tests miss.
  • Set container memory limits lower than you think you need. A 512 MB limit forces leaks to surface in staging instead of production.
  • Use heap snapshot diffing in CI for services that handle large state. Even a weekly scheduled snapshot diff can catch regressions.
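The load-test bullet above can be sketched as a leak budget check. This is a sketch, not a complete harness: the budget is an assumption, the workload stands in for a sustained load phase, and forced GC requires running with node --expose-gc:

```javascript
// Returns heapUsed after a forced GC when available, so transient
// garbage does not pollute the measurement.
function heapAfterGc() {
  if (global.gc) global.gc(); // global.gc only exists under node --expose-gc
  return process.memoryUsage().heapUsed;
}

// Runs a workload and throws if retained heap grows past a budget.
// Wrap this around the sustained-load phase of a test run.
function runWithLeakBudget(workload, budgetBytes) {
  const before = heapAfterGc();
  workload();
  const after = heapAfterGc();
  const growth = after - before;
  if (growth > budgetBytes) {
    throw new Error(`heap grew by ${growth} bytes (budget ${budgetBytes})`);
  }
  return growth;
}
```

A failing budget in CI turns a slow-motion production OOM into a red build.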

For services that spawn CPU-intensive work, be especially careful with references crossing thread boundaries. See Node.js Worker Threads: Handling CPU-Intensive Tasks for patterns that keep worker lifetimes explicit.

When to Use Heap Snapshots

  • You have confirmed that RSS grows over time and does not plateau
  • You can reproduce the growth under load, locally or in staging
  • You need to identify which object type or retainer chain is at fault
  • You are investigating a regression after a specific deploy

When NOT to Chase a Suspected Memory Leak

  • Memory climbs during a traffic spike and then falls back to baseline afterward
  • The process is short-lived (serverless function, CLI tool) and never runs long enough to leak meaningfully
  • You see high RSS but heapUsed is stable — this is usually native memory or buffer pooling, not a JS leak
  • The service is memory-bound by legitimate caches or data structures that a profile confirms as intended
  • A deploy just happened and the new version has not run long enough to show a trend

In these cases, start with capacity planning or concurrency tuning instead. For serverless specifically, cold start memory behavior differs from long-running processes — see Serverless Node.js on AWS Lambda: Patterns and Pitfalls.

Common Mistakes with Memory Leak Debugging

  • Taking a single heap snapshot instead of comparing two over time — a snapshot by itself rarely tells you anything
  • Confusing heapUsed with RSS; native buffers and shared library memory live outside the V8 heap
  • Blaming the framework before checking application-level caches and listeners
  • Forgetting that --max-old-space-size only raises the ceiling; it does not fix the leak
  • Using global.gc() in production code to “force cleanup” — this hides the symptom and adds latency
  • Rolling restarts as a permanent fix; the leak comes back after every deploy
  • Profiling with --inspect on a production node without load balancer drain, causing real user impact

Conclusion

Most Node.js memory leaks come from a small set of patterns: unbounded caches, forgotten event listeners, timers holding references, and closures capturing more than they need. Detecting them is a disciplined workflow — trace garbage collection, graph RSS, diff heap snapshots, and follow retainer chains — not a mysterious art. Start by auditing your largest caches and every EventEmitter.on() call in long-lived code paths; that single review catches most Node.js memory leaks before they reach production.

Next, pair this with deeper profiling across your stack. Read Profiling CPU and Memory Usage in Python, Node.js, and Java Apps for cross-runtime techniques, and explore Node.js Clustering: Multi-Core Performance for how per-worker memory behavior changes when you scale horizontally on a single box. If your leaks are tied to stream handling, Using Node.js Streams for Efficient File Processing covers the backpressure patterns that prevent buffer growth in the first place.
