Node.js Memory, Garbage Collection & Production Failures

How Node.js memory really works: the V8 heap, garbage collection, memory leaks, OOM crashes, and diagnosing it all in production on EC2 and ECS.

By Naga Sai Rao · June 23, 2026 · 41 min read

Most Node.js apps run fine for months and then, one day, a service starts slowing down, its memory creeps up, and eventually it crashes with JavaScript heap out of memory. Nothing in the code "looks" wrong. This is the territory that separates someone who can write Node from someone who can run it in production: understanding how V8 stores your objects, how it cleans them up, what the memory limits actually are, and how the handful of classic failures show up and get diagnosed. This is the companion to the core Node.js guide, and it goes deep on exactly that. Plain-English first, then how it actually works under the hood, then the real-world failures and how to find them.

How to read this guide. Every idea is built from the ground up: what the thing is, how it works mechanically, then what goes wrong and how you would diagnose it on a real server. Analogies anchor the abstract parts. Where it matters for interviews, an "Interview answer" gives you phrasing you can use directly. The goal is not to memorize numbers but to understand the machine well enough that production surprises make sense. Current as of Node.js 24 LTS, 2026.

Where Your Data Lives: Stack and Heap

When your program runs, V8 (the engine inside Node) stores data in two very different places, and knowing which is which explains a lot of behavior.

The stack holds simple, fixed-size things: numbers, booleans, and the references (pointers) to bigger objects. It works like a stack of plates: every function call pushes a new frame on top, and when the function returns, its frame pops off and its local variables vanish automatically. It is tiny, extremely fast, and self-cleaning.

The heap holds everything with a variable or unknown size: objects, arrays, strings, functions, closures. These do not disappear when a function returns; they stay until the garbage collector decides nothing can reach them anymore. The heap is where almost all interesting memory behavior, and almost all memory problems, happen.

function example() {
  let count = 10;                  // the number lives on the STACK
  let user = { name: "Alice" };    // the object lives on the HEAP;
                                   // `user` (a reference to it) lives on the stack
}
// When example() returns: `count` and the `user` reference pop off the stack.
// The { name: "Alice" } object stays on the heap until GC proves nothing points to it.

Analogy. The stack is your desk: small, right in front of you, instantly cleared when you finish a task. The heap is the warehouse out back: it holds everything large, things stay there after you walk away, and someone (the garbage collector) periodically walks the aisles throwing out boxes that nobody has a delivery slip for anymore.

The practical payoff: pass-by-value vs pass-by-reference

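This split is not academic. It is exactly why mutating an object inside a function changes the caller's object, but reassigning a number does not. JavaScript always passes a copy of whatever the variable holds. For primitives (numbers, booleans, strings), that copy behaves like an independent value; V8 actually keeps strings on the heap, but they are immutable, so a shared string can never change underneath you. For objects and arrays, what the variable holds is a pointer to the one shared object on the heap, and copying the pointer still points at the same box. (That is also why reassigning obj to a brand-new object inside a function does not affect the caller: only the function's copy of the pointer changes.)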

function tweak(num, obj) {
  num = 99;        // reassigns this function's OWN copy of the number
  obj.name = "Bob"; // follows the pointer and mutates the SHARED heap object
}

let n = 1;
let user = { name: "Alice" };
tweak(n, user);
console.log(n);         // 1   -> the primitive was copied, caller unaffected
console.log(user.name); // "Bob" -> the object is shared, caller sees the change

The same mechanism explains why two variables can secretly be the same object:

let a = { count: 0 };
let b = a;            // b copies the REFERENCE, not the object

   stack            heap
  ┌──────┐        ┌─────────────┐
  │  a ──┼───────>│ { count: 0 }│
  │  b ──┼───────>│             │   both point at ONE object
  └──────┘        └─────────────┘

b.count = 5;
console.log(a.count); // 5  -> a and b are the same heap object

Beginner trap: "I copied the array, so the original is safe." const copy = original copies the reference, not the data, so mutating copy mutates original. To actually duplicate, you need a shallow copy ([...original], { ...original }) or a deep copy (structuredClone(original)) for nested data. Confusing a reference copy with a value copy is behind a huge share of "why did my other variable change?" bugs.
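A quick way to see all three kinds of copy side by side is a small nested object. This is a minimal sketch; the object and its fields are made up for illustration, and the point is that the spread copy duplicates only the top level, so its nested meta object is still shared with the original.

```javascript
const original = { tags: ["a"], meta: { owner: "Alice" } };

const sameRef = original;               // reference copy: the SAME heap object
const deep = structuredClone(original); // deep copy: nested objects duplicated too
const shallow = { ...original };        // new top-level object, nested objects still shared

shallow.meta.owner = "Bob"; // follows the shared `meta` pointer into original's data
deep.tags.push("b");        // touches only the deep copy's own array

console.log(sameRef === original);  // true
console.log(original.meta.owner);   // Bob
console.log(original.tags.length);  // 1
console.log(deep.meta.owner);       // Alice
```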

Why the stack is small and the heap is large

The stack has a strict, small size limit (a few hundred KB to about 1 MB by default, tunable with --stack-size), because each function call must reserve a frame and the runtime needs that allocation to be instant and predictable. The heap is far larger and grows on demand. This is why a deeply recursive function overflows the stack (too many frames) while a giant array fills the heap (too many objects): same word "too much memory," two entirely different regions.

Beginner trap: "stack overflow" vs "out of memory" are different failures. A RangeError: Maximum call stack size exceeded means you pushed too many function frames onto the small stack, almost always from infinite or very deep recursion. A JavaScript heap out of memory means the heap filled up, usually from holding onto too many objects. They sound similar but have completely different causes and fixes: the first is a control-flow problem (fix the recursion, or convert it to a loop or an explicit queue), the second is a memory-retention problem (find what you are holding onto).
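To make the difference concrete, here is a sketch (the linked-list helpers are hypothetical, written only for this demo) that walks a million-node list two ways. The recursive version needs one stack frame per node and overflows the default stack; the loop uses a single frame and finishes, even though the list itself sits comfortably on the heap.

```javascript
// Build a singly linked list 1 -> 2 -> ... -> length on the heap.
function buildList(length) {
  let head = null;
  for (let i = length; i > 0; i--) head = { value: i, next: head };
  return head;
}

// Recursive walk: one stack frame per node (V8 does not eliminate tail calls).
function sumRecursive(node) {
  return node === null ? 0 : node.value + sumRecursive(node.next);
}

// Iterative walk: a single frame, however long the list is.
function sumIterative(node) {
  let total = 0;
  for (let cur = node; cur !== null; cur = cur.next) total += cur.value;
  return total;
}

const list = buildList(1_000_000);

try {
  sumRecursive(list);
} catch (err) {
  console.log(err instanceof RangeError, err.message); // true Maximum call stack size exceeded
}
console.log(sumIterative(list)); // 500000500000
```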

How Garbage Collection Actually Works

JavaScript does not make you free memory by hand the way C does. Instead V8 runs a garbage collector (GC): it periodically finds objects that can no longer be reached by your running program and reclaims their memory. The whole system rests on one observation about real programs, called the generational hypothesis: most objects die young. A request handler creates dozens of temporary objects that are garbage milliseconds later, while a few things (your config, your cache, your database pool) live for the entire process.

V8 leans into this by splitting the heap into two generations and collecting them differently.

New Space (the young generation)

New objects are born here. It is small, and it is collected very frequently with a fast algorithm called Scavenge. Scavenge divides New Space in half: objects are allocated into one half, and when it fills, the collector copies the survivors into the other half and wipes the first half wholesale. Because most young objects are already dead by collection time, there are usually few survivors to copy, so this is cheap and quick. An object that survives a couple of these rounds is considered "tenured" and gets promoted to Old Space.

Analogy. New Space is the kitchen counter during cooking. You churn through scraps constantly, and every few minutes you sweep the whole counter clean, keeping only the few things still in use. It stays fast precisely because almost everything on the counter is already trash by the time you sweep.

Old Space (the old generation)

Objects that survived long enough live here. This region is larger and collected far less often, using Mark-Sweep-Compact: the collector marks every object still reachable from your program, sweeps away everything unmarked, and occasionally compacts the survivors together to avoid fragmentation. This is more expensive than Scavenge, which is why V8 tries hard to keep short-lived objects from ever reaching Old Space.

Analogy. Old Space is the warehouse. You do not inventory it every few minutes; that would be far too slow. You do a big, thorough audit occasionally: walk every aisle, tag what is still claimed, haul out everything untagged, and slide the remaining boxes together so there are no awkward gaps.
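You can look at these regions on a live process with v8.getHeapSpaceStatistics(). The sketch below just prints each space and its usage; the exact set of space names varies between V8 versions, but new_space (the young generation) and old_space (the old generation) are the two described above.

```javascript
const v8 = require("node:v8");

const toMB = (bytes) => (bytes / 1024 / 1024).toFixed(1);

// One entry per heap space; the list of names differs a little across V8 versions.
for (const space of v8.getHeapSpaceStatistics()) {
  console.log(
    `${space.space_name.padEnd(26)} used ${toMB(space.space_used_size)} MB` +
      ` of ${toMB(space.space_size)} MB`
  );
}
```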

What "reachable" means

The collector starts from a set of roots (global objects, the current call stack, and similar) and follows every reference outward. Anything it can reach is alive; anything it cannot reach is garbage. This is the crucial mental model for leaks: an object is kept alive as long as something still references it, even if your program will never actually use it again. A leak in Node is almost never "GC failed to run"; it is "you are still unintentionally referencing things you are done with."

Interview answer: "How does garbage collection work in Node.js?" V8 uses a generational, mark-and-sweep collector based on the idea that most objects die young. New objects go into a small New Space collected frequently with a fast copying algorithm called Scavenge, which keeps only the survivors. Objects that live long enough are promoted to a larger Old Space, collected less often with Mark-Sweep-Compact, which marks everything reachable from the roots, sweeps away the rest, and compacts to reduce fragmentation. An object is reclaimed only when nothing references it anymore, so memory leaks happen when code unintentionally keeps references to objects it no longer needs.

Why GC matters for performance: stop-the-world pauses

Here is the part that turns into a production issue. Some GC work, especially major collections in Old Space, requires pausing your JavaScript while it runs, because the collector cannot safely move objects around while your code is also touching them. These are "stop-the-world" pauses. Modern V8 (its collector is called Orinoco) does a lot of the work concurrently and incrementally on background threads to keep pauses small, but they are never zero. On a busy server, a long major GC pause shows up as a latency spike: most requests are fast, but the unlucky ones that land during a pause are slow, which is why GC trouble usually appears in your p99 latency, not your average. (p99, the 99th percentile, is the response time that 99 percent of requests come in under; it captures the slow unlucky 1 percent that an average hides.)
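One way to see those pauses from inside the process is a PerformanceObserver subscribed to "gc" entries from node:perf_hooks. The sketch below logs each collection's kind and duration; the allocation loop at the end exists only to provoke a few minor collections, and in a real service you would aggregate the durations into a metric rather than logging every event.

```javascript
const { PerformanceObserver, constants } = require("node:perf_hooks");

// Readable labels for the numeric GC kinds Node reports in entry.detail.kind.
const GC_KINDS = {
  [constants.NODE_PERFORMANCE_GC_MINOR]: "minor (Scavenge)",
  [constants.NODE_PERFORMANCE_GC_MAJOR]: "major (Mark-Sweep-Compact)",
  [constants.NODE_PERFORMANCE_GC_INCREMENTAL]: "incremental marking",
  [constants.NODE_PERFORMANCE_GC_WEAKCB]: "weak callbacks",
};

// Log every GC event with how long it took, in milliseconds.
const gcObserver = new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    const kind = GC_KINDS[entry.detail?.kind] ?? "other";
    console.log(`GC ${kind}: ${entry.duration.toFixed(2)} ms`);
  }
});
gcObserver.observe({ entryTypes: ["gc"] });

// Churn short-lived objects so a few minor collections happen.
let allocated = 0;
for (let round = 0; round < 50; round++) {
  const batch = Array.from({ length: 100_000 }, (_, i) => ({ i }));
  allocated += batch.length; // `batch` is garbage as soon as this round ends
}
```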

Beginner trap. "Garbage collection is automatic, so I never have to think about memory." Automatic collection frees you from manually freeing memory, but it does not free you from managing references. If you hold references too long you leak; if you churn huge numbers of objects you make GC work harder and cause pauses. Automatic does not mean free.

The Memory Limit: max-old-space-size and the Heap Ceiling

V8 does not let the heap grow without bound. Old Space in particular has a ceiling, and when a major collection cannot free enough room to stay under it, the process dies with the familiar FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory (sometimes worded as Ineffective mark-compacts near heap limit).

The default ceiling depends on the Node version and the machine, and the exact number is not something to memorize, because you can always ask V8 directly. (heap_size_limit covers the whole V8 heap, so it comes out slightly above the old-space setting alone.)

const v8 = require("node:v8");
const limitGB = v8.getHeapStatistics().heap_size_limit / 1024 ** 3;
console.log(`V8 heap size limit: ${limitGB.toFixed(2)} GB`);

You raise the ceiling with the --max-old-space-size flag, in megabytes:

# Allow up to ~4 GB of old-space heap
node --max-old-space-size=4096 server.js

# Commonly set via env var so it applies to npm scripts and tooling
NODE_OPTIONS="--max-old-space-size=4096" npm run build

The single most important point in this guide. Raising --max-old-space-size does not fix a memory leak. If your code keeps accumulating references, a bigger heap only means the process takes longer to fill up before it crashes with the exact same error. Increasing the limit is the right move only when your workload genuinely needs more memory at once (large data processing, big builds). When memory climbs steadily under steady load, that is a leak, and the bigger heap just delays the inevitable while making each GC pause longer. Diagnose first; resize second, and only if the data says so.

The container trap

This one bites almost everyone who deploys to Kubernetes or Docker. Node has sized its default heap from available memory since version 12, and current releases read the container's memory limit (its cgroup limit) when there is one. The commonly quoted rule of thumb, when you do not set the flag yourself, is roughly 50% of the container's memory up to about 4 GiB, leveling off near a 2 GB heap beyond that; the exact figures vary by Node and V8 version, so check heap_size_limit on your own runtime rather than trusting the rule. The trap appears when these two limits disagree.

If your container is capped at 512 MB but you launch Node with --max-old-space-size=2048, you have told V8 it may use 2 GB of heap inside a box that the orchestrator will kill at 512 MB. V8 happily grows the heap, the container blows past its cgroup limit, and the kernel's OOM killer terminates the process before V8's own limit is ever reached. The confusing symptom: your app dies with a generic OOMKilled (exit code 137) and no nice V8 heap-limit error, because Node never got the chance to report one.

Analogy. The cgroup limit is the weight rating of an elevator; --max-old-space-size is how much you personally decide to load onto the cart you push into it. If you load the cart heavier than the elevator's rating, it does not matter that your cart could hold more; the elevator's safety system stops everything. Always keep your heap setting comfortably under the container's memory limit, leaving headroom for the stack, buffers, and non-heap memory.

Interview answer: "Why does my Node container get OOMKilled even though the app seems fine?" Almost always because the V8 heap limit and the container memory limit are out of sync. Node sizes its heap from the cgroup limit by default, but if --max-old-space-size (or NODE_OPTIONS) sets a heap larger than the container allows, V8 will grow past the container's cap and the kernel's OOM killer ends the process with exit code 137, before V8 reports its own heap error. The fix is to set the heap limit below the container limit with headroom for non-heap memory, or to leave it unset and let Node's container awareness size it.
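A startup sanity check can catch the mismatch before the orchestrator does. This is a sketch under stated assumptions: it reads the cgroup v2 file /sys/fs/cgroup/memory.max (falling back to the cgroup v1 path), treats a missing file as "no limit", and uses an illustrative 75 percent headroom threshold that you should tune for your workload.

```javascript
const fs = require("node:fs");
const v8 = require("node:v8");

// cgroup v2 exposes the limit in memory.max ("max" means unlimited);
// cgroup v1 used memory/memory.limit_in_bytes.
const CGROUP_LIMIT_FILES = [
  "/sys/fs/cgroup/memory.max",
  "/sys/fs/cgroup/memory/memory.limit_in_bytes",
];

function parseLimit(text) {
  const trimmed = text.trim();
  return trimmed === "max" ? Infinity : Number(trimmed);
}

function containerLimitBytes() {
  for (const file of CGROUP_LIMIT_FILES) {
    try {
      return parseLimit(fs.readFileSync(file, "utf8"));
    } catch {
      // File missing: a different cgroup version, or not Linux at all.
    }
  }
  return Infinity; // no limit found
}

const HEADROOM = 0.75; // assumption: keep the heap under ~75% of the container cap
const heapLimit = v8.getHeapStatistics().heap_size_limit;
const containerLimit = containerLimitBytes();

if (heapLimit > containerLimit * HEADROOM) {
  console.warn(
    `Heap limit ${Math.round(heapLimit / 2 ** 20)} MB is too close to the ` +
      `container limit ${Math.round(containerLimit / 2 ** 20)} MB; risk of OOMKilled (137).`
  );
}
```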

Reading process.memoryUsage()

Before you can diagnose anything, you need to read Node's own memory report. process.memoryUsage() returns an object whose fields each mean something specific, and confusing them sends people down wrong paths.

console.log(process.memoryUsage());
// {
//   rss: 215_482_368,        // total memory the OS gave the process
//   heapTotal: 138_412_032,  // heap V8 has reserved
//   heapUsed: 119_530_104,   // heap actually in use by your objects
//   external: 8_220_310,     // memory for C++ objects bound to JS (e.g. Buffers)
//   arrayBuffers: 1_540_096  // subset of external: ArrayBuffer/Buffer memory
// }

What each one tells you:

rss (Resident Set Size) is the total physical memory the operating system has handed your process, including the heap, the stack, and Node's own C++ machinery. This is the number your container limit is actually measured against, so it is what gets you OOMKilled.
heapTotal is how much heap V8 has reserved from the OS so far. It grows as needed.
heapUsed is how much of that heap your live JavaScript objects actually occupy. This is the number to watch over time for leaks: if it climbs steadily and never comes back down under steady load, you are leaking.
external is memory used by C++ objects tied to your JavaScript, most commonly Buffers and other binary data. A leak here will not show in heapUsed but will still grow rss.
arrayBuffers is the slice of external specifically for ArrayBuffer and Buffer allocations.

Beginner trap: watching rss to find a JavaScript leak. rss is noisy: it includes non-heap memory, it rarely shrinks even after objects are freed (the OS often lets a process keep memory it might reuse), and it is affected by buffers and native code. For a JavaScript object leak, watch heapUsed across time under stable load. Use rss to understand total footprint and container pressure, not to pinpoint a leak.

Node Memory on a Real Server: EC2, the OS, and Cluster Workers

The container trap is really one instance of a bigger truth: your Node process uses more memory than its V8 heap, and the host kills you based on the bigger number, not the heap. Seeing this clearly on a raw virtual machine like an EC2 instance makes the whole picture click.

Picture four boxes: the OS and your process share the instance, and the V8 heap sits inside the process:

┌─────────────────────────────────────────────────────────┐
│ EC2 instance RAM (e.g. t3.medium = 4 GB total)          │
│  ┌──────────────────────────────────────────────────┐   │
│  │ OS, kernel, system agents, filesystem page cache │   │
│  └──────────────────────────────────────────────────┘   │
│  ┌──────────────────────────────────────────────────┐   │
│  │ Your Node PROCESS  (this whole box = RSS)        │   │
│  │  ┌────────────────────────────────────────────┐  │   │
│  │  │ V8 Heap: your JS objects (New + Old Space) │  │   │
│  │  │ capped by --max-old-space-size             │  │   │
│  │  └────────────────────────────────────────────┘  │   │
│  │ + Buffers / ArrayBuffers (external, NOT in heap) │   │
│  │ + stack + compiled code + native addons + libuv  │   │
│  └──────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────┘

The V8 heap is just one box inside the process. The number the operating system actually sees, and the number that gets you killed, is RSS (the whole process box). And the instance RAM is shared: the OS, the kernel, your logging and monitoring agents, and the filesystem page cache all take a slice, so not all of a 4 GB box is yours. Realistically maybe 3 to 3.5 GB is usable by your app.

Why heapUsed can look fine while the box dies

The fields from the previous section map directly onto that diagram. heapUsed is only the innermost box. external and arrayBuffers (Buffers, file and network data, streamed uploads) live outside the V8 heap in C++ memory, so they are not governed by --max-old-space-size, yet they still count toward rss and therefore toward instance RAM. This is how a service with a perfectly flat heapUsed still exhausts its instance: a Buffer-heavy workload (image processing, large uploads, streaming) piled up hundreds of megabytes of external memory that the heap limit never watched.

Analogy. The V8 heap is the trunk of your car, and --max-old-space-size is a rule about how full the trunk may get. But the car's total weight (RSS) also includes passengers, fuel, and roof cargo (Buffers, native memory, code). The bridge's weight limit (instance RAM) is checked against the whole car, not just the trunk. You can obey the trunk rule perfectly and still be too heavy for the bridge.

How a raw EC2 instance differs from a container

On a raw instance with Node running directly (via systemd, pm2, or just node server.js), there is no container limit to read, so Node sizes its default heap from the whole instance's RAM (the same roughly-50% rule, up to the cap, covered above). The difference from a container is what happens at the limit. A container has a cgroup cap scoped to it, and exceeding it gets that container OOM-killed with exit code 137. On a raw instance there is usually no per-process cap; instead, when the whole instance runs low on RAM, the Linux kernel's OOM killer wakes up and terminates whatever process it judges worst (typically the one using the most memory), which is often your Node process and sometimes something else entirely, possibly after the box has already started swapping and slowing down.

The cluster-worker multiplier

This is the EC2 mistake that surprises people most. To use all the cores on an instance, teams run multiple Node processes with cluster or pm2 in cluster mode. But each worker is a full process with its own heap, so running workers multiplies memory usage. Four workers on a 4-core, 4 GB instance, each defaulting to a roughly 2 GB old-space heap, have together been authorized to use about 8 GB of heap on a 4 GB box. They will not all fill at once, but under load they can collectively blow past the instance RAM and trigger the kernel OOM killer. You must divide the memory budget across workers rather than give each the full-instance default.

# 4 GB instance, 4 cluster workers: budget ~700 MB heap EACH, not the 2 GB default
NODE_OPTIONS="--max-old-space-size=700" pm2 start server.js -i 4

A concrete walkthrough

Take a t3.small (2 GB RAM) running one Node API:

The instance boots; the OS and agents consume about 400 MB, leaving roughly 1.6 GB usable.
Node starts. With no container limit to read, it sizes from the instance's 2 GB and defaults the old-space heap near 1 GB.
The app's live objects settle at heapUsed around 300 MB, and rss sits around 450 MB (heap plus code plus stacks). Healthy.
Traffic spikes with large file uploads, each buffering a 20 MB file. Thirty concurrent uploads is about 600 MB of external Buffer memory. heapUsed barely moves because that is not heap, but rss jumps toward 1.1 GB.
Together with the roughly 400 MB the OS and agents use, the instance is now at about 1.5 GB of its 2 GB, and a further burst of uploads pushes the process past the roughly 1.6 GB left for it. The kernel OOM killer fires and kills Node, with no V8 heap out of memory error in the logs, because the heap was never the problem: the process RSS outgrew the instance RAM.

Observing it on the box

# Per-process resident memory (rss is in KB); pgrep -d, joins multiple PIDs with commas
ps -o pid,rss,comm -p "$(pgrep -d, -f "node server.js")"

# Whole-instance memory picture
free -m            # total / used / free / available
top                # watch the RES column for your node process

# Was a process OOM-killed by the kernel? Check the kernel log (may need sudo):
dmesg | grep -i "killed process"

And from inside Node, log the breakdown so you can see which region is growing:

setInterval(() => {
  const m = process.memoryUsage();
  const mb = (n) => (n / 1024 / 1024).toFixed(0);
  console.log(
    `rss=${mb(m.rss)}MB heapUsed=${mb(m.heapUsed)}MB ` +
    `external=${mb(m.external)}MB arrayBuffers=${mb(m.arrayBuffers)}MB`
  );
}, 10000);

If rss climbs while heapUsed stays flat, look at external and arrayBuffers (Buffers, streams). If heapUsed climbs too, it is a heap leak (the four patterns in the next section). If everything inside Node is flat but the box still runs out, something else on the instance is eating the RAM.

The practical rules for sizing Node on a VM

Size the heap below usable instance RAM, not total RAM. Leave headroom for the OS, agents, buffers, and native memory. A rough single-process starting point is --max-old-space-size around 60 to 75 percent of (instance RAM minus OS overhead).
Divide the budget across cluster workers. With N workers, each gets roughly one Nth of the app's memory budget, not the full-instance default.
Watch RSS against instance RAM, because that comparison is what decides whether the kernel OOM killer fires, not the heap number.
Give buffer-heavy work extra headroom. Streaming, uploads, and image processing grow external memory invisibly to the heap metrics.
Right-size the instance or scale out. If RSS legitimately needs more than the box offers, a bigger instance or more instances behind a load balancer is the answer, not just raising the heap flag, which never fixes a real leak anyway.
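The first two rules are simple arithmetic, sketched below. The 400 MB OS reserve and the 0.7 heap fraction are assumptions picked from the ranges in this section, and os.totalmem() reports the VM's RAM rather than a container limit, so this helper is meant for a raw instance.

```javascript
const os = require("node:os");

// Hypothetical budget helper: reserve RAM for the OS and agents, give the heap
// a fraction of what is left, then split that across the workers.
function perWorkerHeapMB(totalMB, workers, { osReserveMB = 400, heapFraction = 0.7 } = {}) {
  const usableMB = Math.max(totalMB - osReserveMB, 0);
  return Math.floor((usableMB * heapFraction) / workers);
}

const workers =
  typeof os.availableParallelism === "function" ? os.availableParallelism() : os.cpus().length;
const totalMB = Math.floor(os.totalmem() / 1024 / 1024);
const heapMB = perWorkerHeapMB(totalMB, workers);

// Feed the result into NODE_OPTIONS, pm2, or cluster.setupPrimary({ execArgv }).
console.log(`NODE_OPTIONS="--max-old-space-size=${heapMB}" pm2 start server.js -i ${workers}`);
```

If os.totalmem() reports 4096 MB and there are 4 workers, these defaults give 646 MB per worker, in the same range as the ~700 MB used in the pm2 example earlier.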

Interview answer: "How does a Node app use the memory on an EC2 instance, and why might it get killed?" A Node process's total memory (its RSS) is the V8 heap plus external memory like Buffers, plus stack, compiled code, native addons, and thread-pool stacks. The instance's RAM is shared with the OS and agents, so only part is usable. The kernel kills based on RSS against available instance RAM, not against the V8 heap limit, so an app can have a flat heapUsed and still be OOM-killed because Buffer-heavy work grew external memory, or because several cluster workers each took a full default heap and collectively exceeded the box. The fixes are to size the heap below usable RAM, divide that budget across workers, watch RSS rather than just the heap, and scale the instance when the workload genuinely needs it.

The Classic Memory Leaks

A memory leak in Node is not the collector malfunctioning. It is your code holding references to things it is finished with, so those things are still "reachable" and can never be collected. Four patterns cause the overwhelming majority of real leaks.

1. Module-level collections that only grow

A Map, array, or object declared at module scope lives for the entire process. If you keep adding to it and never remove, it grows forever.

// ❌ Leak: every request adds an entry that is never removed.
const cache = new Map();
app.get("/user/:id", async (req, res) => {
  const user = await db.getUser(req.params.id);
  cache.set(req.params.id, user); // grows without bound, forever
  res.json(user);
});

// ✅ Bound it: cap the size, or use a real cache with eviction.
//    LRU (Least Recently Used) drops the entry untouched for the longest;
//    TTL (Time To Live) drops entries after a fixed age.
//    The simplest bound is a size cap with FIFO (oldest-first) eviction:
const MAX_ENTRIES = 10_000;
const cache = new Map();
function remember(key, value) {
  if (!cache.has(key) && cache.size >= MAX_ENTRIES) {
    // A Map iterates in insertion order, so its first key is the oldest entry.
    cache.delete(cache.keys().next().value);
  }
  cache.set(key, value);
}

The naive-cache trap. A plain Map used as a cache with no eviction policy is probably the most common Node leak in the wild. A cache must have a bound: a maximum size, a time-to-live, or both. "Cache forever" is just "leak slowly."
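If you cannot pull in a maintained package such as lru-cache from npm, a bounded cache is small enough to sketch on top of Map, whose iteration order is insertion order. This illustrative class combines both bounds from above: a size cap with least-recently-used eviction, and a time-to-live checked on read. The injectable now clock is there only to make it testable.

```javascript
// A minimal LRU + TTL cache sketch built on Map's insertion order.
class BoundedCache {
  constructor({ maxEntries = 10_000, ttlMs = 60_000, now = Date.now } = {}) {
    this.maxEntries = maxEntries;
    this.ttlMs = ttlMs;
    this.now = now;       // injectable clock
    this.map = new Map(); // key -> { value, expiresAt }
  }

  get(key) {
    const entry = this.map.get(key);
    if (entry === undefined) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.map.delete(key); // expired: drop it on read
      return undefined;
    }
    // Re-insert to mark it most recently used (moves it to the end of the Map).
    this.map.delete(key);
    this.map.set(key, entry);
    return entry.value;
  }

  set(key, value) {
    this.map.delete(key); // refresh its position if it already exists
    this.map.set(key, { value, expiresAt: this.now() + this.ttlMs });
    if (this.map.size > this.maxEntries) {
      // The first key in iteration order is the least recently used.
      this.map.delete(this.map.keys().next().value);
    }
  }

  get size() {
    return this.map.size;
  }
}
```

Expired entries are only dropped lazily on read, but the size cap still bounds the worst case, which is what stops the leak.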

2. Listeners and subscriptions you never remove

Every .on() adds a listener that holds a reference to its callback (and everything that callback closes over). Add them per request without removing them and they pile up.

// ❌ Leak: a new listener every request, never removed.
app.get("/stream", (req, res) => {
  emitter.on("data", (chunk) => res.write(chunk)); // accumulates forever
});

// ✅ Remove it when the request ends.
app.get("/stream", (req, res) => {
  const onData = (chunk) => res.write(chunk);
  emitter.on("data", onData);
  res.on("close", () => emitter.off("data", onData)); // clean up
});

Node's MaxListenersExceededWarning (by default it fires when a single event name on one emitter gets its 11th listener) is usually a real leak warning, not noise to silence by raising the limit.
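The warning and the fix can be reproduced in isolation. In this sketch, 20 simulated requests each add a listener to a plain EventEmitter; the leaky loop trips MaxListenersExceededWarning and leaves 20 listeners behind, while the loop that removes its listener (as the res.on("close", ...) cleanup does in a real handler) ends with none.

```javascript
const { EventEmitter } = require("node:events");

// Surface the warning Node emits once an event passes the default limit of 10.
process.on("warning", (warning) => {
  if (warning.name === "MaxListenersExceededWarning") {
    console.log("Leak suspect:", warning.message);
  }
});

// Leaky: 20 "requests" subscribe and never unsubscribe.
const emitter = new EventEmitter();
for (let i = 0; i < 20; i++) {
  emitter.on("data", () => {});
}
console.log("leaky listeners:", emitter.listenerCount("data")); // 20

// Fixed: each "request" removes its listener when it finishes.
const clean = new EventEmitter();
for (let i = 0; i < 20; i++) {
  const onData = () => {};
  clean.on("data", onData);
  clean.off("data", onData); // in a server this runs inside res.on("close", ...)
}
console.log("clean listeners:", clean.listenerCount("data")); // 0
```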

3. Timers that are never cleared

setInterval keeps its callback, and everything that callback references, alive for as long as the interval runs. Start intervals tied to a connection or object without clearing them and you leak.

// ❌ Leak: the interval (and everything `bigData` references) lives forever.
function startPolling(bigData) {
  setInterval(() => check(bigData), 1000); // never cleared
}

// ✅ Keep the handle and clear it when done.
function startPolling(bigData) {
  const id = setInterval(() => check(bigData), 1000);
  return () => clearInterval(id); // caller stops it when finished
}
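The same rule applies when the timer belongs to a longer-lived object. This sketch uses a hypothetical PollingConnection class whose close() both clears the interval and drops its reference to the large payload, so nothing keeps bigData reachable once the connection is done.

```javascript
// Hypothetical connection that polls while it is open and cleans up on close().
class PollingConnection {
  constructor(bigData, intervalMs = 1000) {
    this.bigData = bigData;
    this.checks = 0;
    // Keep the handle so close() can stop the interval later.
    this.timer = setInterval(() => this.check(), intervalMs);
  }

  check() {
    this.checks++; // stand-in for real work that reads this.bigData
  }

  close() {
    clearInterval(this.timer); // the runtime drops its reference to the callback...
    this.timer = null;
    this.bigData = null;       // ...and we drop ours to the large payload
  }
}

const conn = new PollingConnection(new Array(1_000_000).fill(0), 10);

// Simulate the connection closing after ~50 ms.
setTimeout(() => {
  conn.close();
  console.log(`checks before close: ${conn.checks}`);
}, 50);
```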

4. Closures that capture more than you think

A closure keeps alive every variable it references from its enclosing scope. A long-lived closure that captures a large object pins that object in memory even if it only uses one small field of it.

// ❌ The handler closes over the entire `hugePayload` just to read one field.
function register(hugePayload) {
  emitter.on("tick", () => log(hugePayload.id)); // pins all of hugePayload
}

// ✅ Capture only what you need.
function register(hugePayload) {
  const id = hugePayload.id;            // extract the small piece
  emitter.on("tick", () => log(id));    // huge payload can now be collected
}

Interview answer: "What are the common causes of memory leaks in Node.js?" The big four are unbounded module-level collections (a Map or array used as a cache with no eviction), event listeners and subscriptions added repeatedly and never removed, timers (setInterval) that are never cleared, and long-lived closures that capture large objects. They share one root cause: code keeps a reference to data it is finished with, so the garbage collector cannot reclaim it because the data is still reachable. The fix is always to drop the reference: bound the cache, remove the listener, clear the timer, or capture only the small piece you need.

The "let it be collected" tools: WeakMap, WeakRef, AbortController

These three exist specifically to avoid the leaks above by not holding things alive longer than needed.

A WeakMap (and WeakSet) holds its keys weakly: if the only thing referencing a key object is the WeakMap, the garbage collector is still free to reclaim it, and the entry disappears automatically. This makes a WeakMap perfect for attaching metadata to objects you do not own the lifecycle of, because you never have to remember to delete entries.

// Cache derived data keyed by an object, without pinning that object alive.
const parsedCache = new WeakMap();

function getParsed(reqObject) {
  if (parsedCache.has(reqObject)) return parsedCache.get(reqObject);
  const parsed = expensiveParse(reqObject);
  parsedCache.set(reqObject, parsed);
  return parsed; // when reqObject is GC'd, its cache entry vanishes on its own
}

A WeakRef lets you reference an object without keeping it alive, for advanced caching where you want "use it if it still exists, otherwise rebuild." It is a sharp tool used rarely; the honest interview answer is that you reach for WeakMap often and WeakRef almost never.
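For completeness, here is what that WeakRef pattern looks like in a minimal sketch; getReport and its builder callback are hypothetical. deref() returns the object while something else still holds it strongly and undefined once the collector has reclaimed it, so the function must always be ready to rebuild.

```javascript
// Hypothetical cache slot: hold the last built report weakly, rebuild if collected.
let reportRef = null;
let builds = 0;

function getReport(buildReport) {
  const cached = reportRef?.deref(); // undefined if GC reclaimed the object
  if (cached !== undefined) return cached;
  builds++;
  const fresh = buildReport();
  reportRef = new WeakRef(fresh); // does NOT keep `fresh` alive on its own
  return fresh;
}

const first = getReport(() => ({ rows: new Array(1000).fill(0) }));
const second = getReport(() => ({ rows: [] }));
console.log(first === second, builds); // true 1 (`first` still holds the report strongly)
```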

An AbortController is the modern way to cancel in-flight async work (a fetch, a stream, a timer) so it does not linger and leak. You pass its signal into the operation and call abort() to stop it, which is the clean fix for "the user navigated away but the request and its callbacks are still pending."

const controller = new AbortController();

// The fetch is cancellable; aborting frees it and rejects the promise.
fetch("https://api.example.com/slow", { signal: controller.signal })
  .then(res => res.json())
  .catch(err => {
    if (err.name === "AbortError") return; // expected on cancel
    throw err;
  });

// Cancel it (e.g. on request close, timeout, or component teardown):
controller.abort();
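fetch is not the only API that takes a signal; many Node core APIs do too, including the promise-based timers in node:timers/promises. In this sketch, slowTask is a hypothetical stand-in for any slow operation: aborting clears its pending 5-second timer and rejects with an AbortError, so nothing lingers after the caller gives up.

```javascript
const { setTimeout: sleep } = require("node:timers/promises");

// Hypothetical slow operation that honors an AbortSignal.
async function slowTask(signal) {
  await sleep(5_000, undefined, { signal }); // the pending timer is cleared on abort
  return "done";
}

async function main() {
  const controller = new AbortController();
  const pending = slowTask(controller.signal);
  controller.abort(); // e.g. the client disconnected
  try {
    return await pending;
  } catch (err) {
    return err.name; // "AbortError" when cancelled
  }
}

const outcome = main();
outcome.then((result) => console.log(result)); // AbortError
```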

Beginner trap: a plain Map cache keyed by objects leaks; a WeakMap does not. If you key a regular Map by request or user objects and never delete entries, those objects can never be collected, because the Map holds them strongly forever. Switching to a WeakMap lets them go the moment nothing else needs them. Use a WeakMap whenever the key's lifetime should decide the entry's lifetime.

Finding a Leak: Heap Snapshots

When heapUsed climbs steadily and you need to know what is accumulating, you take heap snapshots: full pictures of every object on the heap, which you compare over time to see what is growing.

The workflow most people use:

# Start the app with the inspector open
node --inspect server.js
# Then open chrome://inspect in Chrome, click "inspect", go to the Memory tab,
# and take heap snapshots. The key technique is the COMPARISON:
#   1. Take snapshot A after warm-up.
#   2. Exercise the suspected path many times (e.g. hammer an endpoint).
#   3. Take snapshot B.
#   4. Compare B to A and sort by "Delta": what grew is your leak suspect.

You can also capture snapshots programmatically, which is handy on a server you cannot attach a debugger to, and Node can even dump one automatically right before it would crash from the heap limit:

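const v8 = require("node:v8");
// Writes a .heapsnapshot file you can load into Chrome DevTools later.
// It blocks the event loop while writing and can need about twice the heap's size in memory.
const file = v8.writeHeapSnapshot();
console.log(`Heap snapshot written to ${file}`);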

# Dump a snapshot automatically when the process approaches the heap limit,
# so you can inspect what filled it up right before the OOM crash.
# The number caps how many snapshots it will write (here, at most 2).
node --heapsnapshot-near-heap-limit=2 server.js
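On a Linux server you can also capture snapshots on demand, without a debugger, by wiring v8.writeHeapSnapshot() to a signal. This sketch listens for SIGUSR2 (SIGUSR1 is reserved for starting Node's inspector) and writes into the temp directory; the file naming is an arbitrary choice. Node's built-in --heapsnapshot-signal flag offers the same behavior without code.

```javascript
const os = require("node:os");
const path = require("node:path");
const v8 = require("node:v8");

// Write a heap snapshot into the temp directory and return the file path.
function dumpHeap() {
  const file = path.join(os.tmpdir(), `heap-${process.pid}-${Date.now()}.heapsnapshot`);
  return v8.writeHeapSnapshot(file); // returns the path it wrote
}

// `kill -USR2 <pid>` now captures a snapshot from a running server.
process.on("SIGUSR2", () => {
  console.log(`heap snapshot written to ${dumpHeap()}`);
});
```

Copy the resulting file off the box and load it into the DevTools Memory tab like any other snapshot.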

Analogy. A single heap snapshot is a photograph of a messy room; you cannot tell what is accumulating from one photo. Two snapshots taken before and after some activity are a "spot the difference" pair: whatever is bigger in the second photo is what your code is piling up. The comparison is the whole technique; a lone snapshot rarely tells you much.

Other tools worth naming. --trace-gc prints a line for every garbage collection so you can see how often and how long GC runs (helpful for spotting GC thrash). --prof writes a V8 tick log for finding CPU hotspots, which node --prof-process turns into a summary report. The clinic suite (clinic doctor, clinic heapprofiler, clinic flame) automates much of this and produces readable reports, and is a common answer to "what tools do you use to diagnose Node performance?"

A leak hunt, start to finish

Knowing the tools exist is different from knowing the loop. Here is the whole investigation as you would actually run it, so the pieces connect:

Notice. Your dashboard (or the in-process logger from earlier) shows heapUsed climbing slowly across hours and never dropping back under steady traffic. Every so often the process crashes with heap out of memory and is restarted by its supervisor. That steady upward slope, not a spike, is the signature of a leak rather than a load burst.
Confirm it is the heap. Watch heapUsed specifically. If heapUsed is flat but rss climbs, the growth is outside the JavaScript heap: usually external Buffer memory, sometimes native addon memory. That is not a classic object leak, so you would look at streams and Buffers instead. Here, heapUsed itself climbs, so it is a retained-object leak.
Capture a baseline. Let the app warm up and reach steady state, then take heap snapshot A (in Chrome DevTools via --inspect, or with v8.writeHeapSnapshot()).
Reproduce the growth. Drive the suspected path hard: replay a few thousand requests to the endpoint you suspect, so whatever is accumulating accumulates a lot. The leak needs to be big in the next snapshot to stand out.
Capture and compare. Take heap snapshot B, then load it in DevTools and switch the view to Comparison against A, sorted by the size delta. The object type that grew by thousands of instances is your suspect: maybe User objects, or Array, or closures from a specific function.
Find what is holding it. Select an instance of the leaking object and read its Retainers (the "retaining path"), which traces the chain of references keeping it alive all the way back to a GC root. That path points straight at the culprit: a module-level Map it was added to, an emitter still listening, an interval still running.
Fix the reference and verify. Drop the reference (bound the cache, remove the listener, clear the timer), redeploy, and watch heapUsed flatten across the same load. A flat line under sustained traffic is the proof the leak is gone.
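The "bound the cache" fix in the last step can be as small as the class below. This is a minimal sketch under stated assumptions: BoundedCache is a hypothetical name, and eviction is least-recently-used. In production, you might reach for an established package such as lru-cache instead:

```javascript
// A minimal bounded cache: once it holds maxEntries, adding a new key evicts
// the least recently used one, so the cache has a hard size ceiling.
// A Map iterates in insertion order, so its first key is always the oldest.
class BoundedCache {
  constructor(maxEntries) {
    this.maxEntries = maxEntries;
    this.entries = new Map();
  }

  get(key) {
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key);
    // Re-insert so this key becomes the most recently used.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key, value) {
    this.entries.delete(key); // no-op if the key is new
    this.entries.set(key, value);
    if (this.entries.size > this.maxEntries) {
      const oldestKey = this.entries.keys().next().value;
      this.entries.delete(oldestKey);
    }
  }

  get size() {
    return this.entries.size;
  }
}

// Usage: a module-level cache that can no longer grow without limit.
const userCache = new BoundedCache(3);
for (const id of [1, 2, 3, 4, 5]) userCache.set(id, { id });
console.log(userCache.size); // 3
console.log(userCache.get(1)); // undefined (evicted)
```

The design relies on the fact that a Map remembers insertion order. Deleting and re-inserting a key on every read keeps the least recently used entry at the front, where eviction finds it in constant time.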

The mental shortcut. A heap snapshot answers "what is piling up," and the retaining path answers "who is holding it." Almost every leak hunt is those two questions in sequence. The answer to the second is almost always one of the four classic causes: an unbounded cache, a listener never removed, a timer never cleared, or a closure pinning a large object.

Event Loop Lag: The Other Way Node "Hangs"

Not every production problem is memory. The other big one is blocking the event loop. Because your JavaScript runs on one thread, any synchronous work that takes a long time stops everything: no other requests are served, no timers fire, no I/O callbacks run. The app is not crashed; it is frozen, and health checks may start failing as if it were down.

The usual culprits are CPU-bound work with no chance to yield: a huge synchronous loop, JSON.parse or JSON.stringify on a very large payload, synchronous crypto like crypto.pbkdf2Sync, reading a large file with readFileSync in a handler, or a catastrophically backtracking regular expression on hostile input.
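The effect is easy to reproduce. This minimal sketch uses a busy-wait loop as a stand-in for any CPU-bound synchronous work; blockFor and timerLatenessWhileBlocked are illustrative names. A timer due in 10 ms cannot fire until the loop finishes:

```javascript
// Busy-wait for `ms` milliseconds: a stand-in for CPU-bound synchronous work.
function blockFor(ms) {
  const start = Date.now();
  while (Date.now() - start < ms) {
    // spin: nothing else on this thread can run meanwhile
  }
}

// Schedule a timer, then block; resolve with how late the timer actually ran.
function timerLatenessWhileBlocked(blockMs, timerMs) {
  return new Promise((resolve) => {
    const scheduledAt = Date.now();
    setTimeout(() => resolve(Date.now() - scheduledAt - timerMs), timerMs);
    blockFor(blockMs); // the timer is due after timerMs, but cannot fire until this returns
  });
}

timerLatenessWhileBlocked(200, 10).then((lateMs) => {
  console.log(`timer asked for 10ms but ran ${lateMs}ms late`);
});
```

The timer's callback cannot run until the 200 ms loop returns, so the reported lateness is at least about 190 ms, however fast the machine is. Every request waiting on that thread experiences the same delay.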

You measure it by checking how late timers fire compared to when they were scheduled; that lateness is your event-loop lag:

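const { monitorEventLoopDelay } = require("node:perf_hooks");

// Samples event-loop delay every 20 ms into a histogram (values in nanoseconds).
const h = monitorEventLoopDelay({ resolution: 20 });
h.enable();

setInterval(() => {
  const ms = (ns) => (ns / 1e6).toFixed(1); // nanoseconds -> milliseconds
  // Rising mean, p99, or max values mean something is blocking the loop.
  console.log(`loop delay mean=${ms(h.mean)}ms p99=${ms(h.percentile(99))}ms max=${ms(h.max)}ms`);
  h.reset(); // start a fresh window for the next report
}, 5000);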

The fix is architectural, not a flag. When the loop is blocked by CPU work, the answer is to get that work off the main thread: move it to a worker_thread, break it into chunks that yield with setImmediate between them, push it to a separate service, or replace the algorithm (for example, stream and parse large JSON instead of JSON.parse-ing it whole). You cannot "tune" your way out of blocking; you have to stop blocking.
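Here is a minimal sketch of the chunking approach. It assumes the work is an array of independent items; processInChunks is a hypothetical helper. It awaits the promise-based setImmediate from node:timers/promises between chunks:

```javascript
const { setImmediate: yieldToEventLoop } = require("node:timers/promises");

// Process a large array in fixed-size chunks. Awaiting setImmediate between
// chunks returns control to the event loop, so timers, I/O callbacks, and
// other requests get a turn instead of waiting for the whole job.
async function processInChunks(items, chunkSize, work) {
  const results = [];
  for (let start = 0; start < items.length; start += chunkSize) {
    const end = Math.min(start + chunkSize, items.length);
    for (let i = start; i < end; i++) {
      results.push(work(items[i]));
    }
    await yieldToEventLoop();
  }
  return results;
}

// Usage: square 100,000 numbers, yielding every 1,000 items.
const numbers = Array.from({ length: 100_000 }, (_, i) => i);
processInChunks(numbers, 1_000, (n) => n * n).then((squares) => {
  console.log(squares.length); // 100000
});
```

Chunking does not make the job itself faster. Total CPU time stays the same or grows slightly; it only lets other work interleave. The chunk size of 1,000 is an arbitrary assumption, so tune it so each chunk runs for a few milliseconds at most.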

Interview answer: "How do you detect and fix a blocked event loop?" Detect it by measuring event-loop delay, either with perf_hooks' monitorEventLoopDelay or an APM tool (Application Performance Monitoring: a service like Datadog or New Relic that automatically tracks an app's runtime health and timing); steadily rising delay under load means synchronous work is hogging the thread. Find the culprit with a CPU profile (--prof or clinic flame), which highlights the long synchronous function. Fix it by moving CPU-bound work off the main thread with worker_threads, chunking long loops so they yield to the loop, replacing blocking calls with async ones, or offloading to another service. The single thread must stay free to keep the app responsive.
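The worker_threads option can be sketched in a few lines. This assumes a toy CPU-bound job (summing integers) in place of your real hot function; sumInWorker is a hypothetical name. The worker code is inlined with eval: true only to keep the sketch in one file:

```javascript
const { Worker } = require("node:worker_threads");

// Run a CPU-heavy loop on a separate thread so the main event loop stays free.
// Real code usually points new Worker() at a separate script file instead of eval.
function sumInWorker(limit) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(
      `
      const { parentPort, workerData } = require("node:worker_threads");
      let sum = 0;
      for (let i = 0; i < workerData; i++) sum += i; // the blocking work, off the main thread
      parentPort.postMessage(sum);
      `,
      { eval: true, workerData: limit }
    );
    worker.once("message", resolve);
    worker.once("error", reject);
    worker.once("exit", (code) => {
      if (code !== 0) reject(new Error(`worker stopped with exit code ${code}`));
    });
  });
}

// Usage: the main thread stays responsive while the worker computes.
sumInWorker(1_000_000).then((sum) => console.log(sum)); // 499999500000
```

Starting a thread has a real cost. For many small jobs, production code typically keeps a pool of long-lived workers rather than spawning one per call.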

Other Production Failures Worth Knowing

A few more failures round out what interviewers (and real on-call shifts) throw at you.

Unhandled promise rejections. A rejected promise that nothing handles (no .catch, and no await inside a try/catch) triggers unhandledRejection. Since Node 15, the default is to treat this as fatal and crash the process, which is correct: an unhandled rejection means an error path you never accounted for. The fix is to handle rejections at their source, not to silence the warning globally.
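Here is a minimal sketch of handling a rejection at its source. It assumes a hypothetical loadUser call that fails with a hypothetical NotFoundError, inside a hypothetical route-style handler:

```javascript
// Hypothetical error type and data-access call that always fails.
class NotFoundError extends Error {}

async function loadUser(id) {
  throw new NotFoundError(`user ${id} not found`);
}

// Risky: calling loadUser(42) with no await or .catch leaves the rejection
// unhandled, and by default Node then crashes the process.

// Better: handle the rejection where the failing call is made.
async function getUserResponse(id) {
  try {
    return { status: 200, body: await loadUser(id) };
  } catch (err) {
    if (err instanceof NotFoundError) {
      return { status: 404, body: { error: err.message } };
    }
    throw err; // unexpected: let the caller's error handling deal with it
  }
}

getUserResponse(42).then((res) => console.log(res.status)); // 404
```

Only the expected failure becomes a response. Anything else is rethrown, so a genuinely unexpected error still surfaces instead of being silently swallowed.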

Uncaught exceptions. A thrown error that no try/catch caught fires uncaughtException and, by default, crashes. The right posture is to log, perform fast cleanup, and exit, letting your process manager restart a clean instance. Treating uncaughtException as a "swallow everything and continue" handler is dangerous because the process may be in a corrupted state.

process.on("uncaughtException", (err) => {
  console.error("fatal: uncaught exception", err); // record what happened (swap in your real logger)
  // optionally flush logs / close critical resources quickly
  process.exit(1); // then exit; let the supervisor restart a fresh process
});

File descriptor and connection exhaustion. Every open socket, file handle, and database connection consumes a finite operating-system resource. Leak them (open connections in a loop without closing, never returning pooled connections, unbounded outbound requests) and you eventually hit EMFILE: too many open files or exhaust your database pool, at which point new work stalls or errors even though CPU and memory look fine. The fixes are bounded connection pools, always releasing resources in a finally, and limiting outbound concurrency.
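Limiting outbound concurrency does not require a library. This is a minimal sketch of a concurrency limiter that also shows the release-in-finally rule. createLimiter is a hypothetical helper; packages such as p-limit offer the same idea with more polish:

```javascript
const { setTimeout: sleep } = require("node:timers/promises");

// Run async tasks with at most `limit` in flight at once. Each slot is released
// in a finally block, so a failing task can never leak its slot.
function createLimiter(limit) {
  let active = 0;
  const waiting = [];

  return async function run(task) {
    if (active >= limit) {
      // Wait until a finishing task hands over its slot.
      await new Promise((resolve) => waiting.push(resolve));
    } else {
      active++;
    }
    try {
      return await task();
    } finally {
      const next = waiting.shift();
      if (next) next(); // hand the slot straight to the next waiter
      else active--;
    }
  };
}

// Usage: 10 jobs, never more than 3 running at the same moment.
let inFlight = 0;
let peak = 0;
const limited = createLimiter(3);
const jobs = Array.from({ length: 10 }, (_, i) =>
  limited(async () => {
    inFlight++;
    peak = Math.max(peak, inFlight);
    await sleep(10); // stand-in for an outbound HTTP call or DB query
    inFlight--;
    return i;
  })
);
Promise.all(jobs).then((results) => console.log(results.length, "jobs done, peak concurrency", peak));
```

When a task finishes and someone is waiting, its slot passes directly to the next waiter instead of being decremented and re-acquired. This way a newly arriving call can never squeeze in and push the count above the limit.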

Beginner trap: assuming a crash-restart loop is "handled." Letting a process manager restart after a fatal error is correct, but if the underlying cause is a leak or a poison input, the fresh process hits the same wall and you get a crash loop: the service flaps up and down, dropping requests each cycle. Restart is a safety net for the unexpected, not a substitute for fixing a reproducible failure. Watch your restart count as a signal.

Monitoring in Production: CloudWatch and Beyond

Everything so far has been local diagnosis: attach a debugger, read process.memoryUsage(), take a snapshot. In production you cannot babysit one process; you need metrics flowing off every instance so you can see trouble building and get paged before the crash. On AWS that means CloudWatch, and there is one gotcha that catches almost everyone first.

The "no memory metric on a bare EC2 instance" gotcha

For a plain EC2 instance, CloudWatch shows metrics like CPU utilization, network, and disk I/O by default, but not memory usage or disk space. This is not an oversight: those built-in metrics come from the hypervisor, the software layer underneath your instance that carves one physical server into many virtual machines. The hypervisor can see hardware-level activity going into your instance (CPU cycles, network bytes) but cannot see inside the operating system to know how much RAM is actually used versus held as disk cache. Memory and disk usage are OS-level facts, so the hypervisor simply does not have them. (This is the bare EC2 story specifically. On ECS, memory utilization is provided for you, covered in the next subsection.)

The fix is the unified CloudWatch agent, a process you install on the instance that reads the OS memory subsystem (on Linux, /proc/meminfo) and pushes those numbers to CloudWatch as custom metrics under the CWAgent namespace. The setup is three steps: give the instance an IAM role with the CloudWatchAgentServerPolicy, install the agent, and point it at a config file listing the metrics you want.

A minimal agent config that reports memory and root-disk usage every 60 seconds. The file must be valid JSON, so explanatory comments belong outside it:

json

{
  "metrics": {
    "append_dimensions": { "InstanceId": "${aws:InstanceId}" },
    "metrics_collected": {
      "mem": { "measurement": ["mem_used_percent"], "metrics_collection_interval": 60 },
      "disk": { "measurement": ["used_percent"], "resources": ["/"], "metrics_collection_interval": 60 }
    }
  }
}

After the agent starts, mem_used_percent appears in CloudWatch under the CWAgent namespace, keyed by your InstanceId. The crucial point connecting this to everything above: mem_used_percent is instance-wide memory pressure. It is the memory in use by every process on the box, measured against the instance's RAM, which is the same pressure that decides whether the kernel OOM killer fires. It is the right metric to alarm on for "the box is about to run out," but it will not tell you whether the cause is the V8 heap, Buffers, or another process. For that you need app-level metrics.

On ECS, memory utilization is built in (the EC2 gotcha does not apply)

The "you must install an agent" rule is specific to a bare EC2 instance. If your Node app runs on Amazon ECS (the container service), CloudWatch gives you CPU and memory utilization automatically, with no CloudWatch agent. The reason is that ECS already runs the ECS container agent inside the box. Once a minute, it measures the CPU and memory each running task is using and reports them to CloudWatch in the AWS/ECS namespace, aggregated per service and per cluster (per-task numbers need Container Insights). There is already an in-OS agent doing the measuring, so the visibility problem is solved for you.

One subtlety worth knowing: ECS reports memory utilization as a percentage of the limit you declared in the task definition, not as a percentage of the physical machine. That is actually the number you want for catching an OOM kill, because an ECS task is killed when it hits its task-definition memory limit, so a MemoryUtilization climbing toward 100% is a climb toward that kill. As with any memory alarm, watch the Maximum statistic rather than the average, since a task that averages 60% but peaks at 95% is one burst away from being killed.
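For a per-process view of the same ratio, you can compare rss against the memory limit the process actually runs under. This is a rough sketch, assuming a single Node process per container. process.constrainedMemory() reports the cgroup memory limit on recent Node versions, though it may be marked experimental depending on your version. The code falls back to total RAM when that call is missing or reports no usable limit. The result only approximates ECS's figure, which covers every container in the task:

```javascript
const os = require("node:os");

// The memory ceiling this process runs under: the cgroup limit if Node can read one,
// otherwise the machine's total RAM.
function memoryLimitBytes() {
  const total = os.totalmem();
  const constrained =
    typeof process.constrainedMemory === "function" ? process.constrainedMemory() : 0;
  // Treat 0/undefined (no known limit) or a value above physical RAM as "unconstrained".
  return constrained > 0 && constrained < total ? constrained : total;
}

// Rough in-process analogue of ECS MemoryUtilization: rss as a percentage of the limit.
function rssUtilizationPercent() {
  return (process.memoryUsage().rss / memoryLimitBytes()) * 100;
}

console.log(`rss is ${rssUtilizationPercent().toFixed(1)}% of the memory limit`);
```

Published alongside the platform metric, this number helps you tell whether your Node process or something else in the task is consuming the limit.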

The behavior differs slightly by launch type (where the container actually runs):

Where your Node app runs	Memory utilization by default?	Why, and what to add
Bare EC2 instance	No	No in-OS agent; the hypervisor cannot see OS memory. Install the CloudWatch agent (`mem_used_percent`, `CWAgent` namespace).
ECS on Fargate (serverless containers)	Yes, automatic	The ECS agent reports CPU and memory against the task-definition limit (`AWS/ECS` namespace). Nothing to install.
ECS on EC2 (containers on your own instances)	Yes, at service level	The ECS container agent reports task memory against the task-definition limit, aggregated per service. But the underlying EC2 instance's own memory still needs the CloudWatch agent if you want it.

For deeper, per-container detail rather than service averages, you enable Container Insights, a paid CloudWatch feature that publishes task- and container-level metrics (like MemoryUtilized) to the ECS/ContainerInsights namespace, with ready-made dashboards. It is the ECS equivalent of "I need to see inside each task, not just the service average."

Interview-ready summary. "AWS gives you memory metrics" is true for ECS and false for bare EC2, and that catches people. On EC2 you install the CloudWatch agent because the hypervisor cannot see OS memory; on ECS (Fargate or EC2 launch type) the ECS container agent already reports memory against the task-definition limit, so it is there by default. Either way, you still publish app-level metrics (heap, event-loop lag) from inside Node for cause-level insight, because the platform metric only tells you the box or task is full, not why.

Pushing Node's own numbers as custom metrics

To see inside the process (heap usage, event-loop lag), publish your own metrics from the app with the CloudWatch PutMetricData API. This turns the process.memoryUsage() fields and the event-loop delay from local curiosities into dashboardable, alarmable time series.

const { CloudWatchClient, PutMetricDataCommand } = require("@aws-sdk/client-cloudwatch");
const { monitorEventLoopDelay } = require("node:perf_hooks");

const cw = new CloudWatchClient({});
const loop = monitorEventLoopDelay();
loop.enable();

setInterval(async () => {
  const m = process.memoryUsage();
  const mb = (n) => n / 1024 / 1024;
  try {
    await cw.send(new PutMetricDataCommand({
      Namespace: "MyApp/Node",
      // Batch related metrics into ONE call to cut cost and avoid throttling.
      MetricData: [
        { MetricName: "HeapUsedMB", Value: mb(m.heapUsed), Unit: "Megabytes" },
        { MetricName: "RssMB", Value: mb(m.rss), Unit: "Megabytes" },
        { MetricName: "ExternalMB", Value: mb(m.external), Unit: "Megabytes" },
        // The histogram records nanoseconds; divide by 1e6 for milliseconds.
        { MetricName: "EventLoopLagMs", Value: loop.mean / 1e6, Unit: "Milliseconds" },
      ],
    }));
  } catch (err) {
    // A failed publish (throttling, network, IAM) must not become an
    // unhandled rejection, which would crash the process.
    console.error("PutMetricData failed:", err);
  } finally {
    loop.reset(); // start a fresh lag window for the next interval
  }
}, 60_000);

Cost note. CloudWatch bills custom metrics by the number of distinct metrics and by PutMetricData calls, so batch related metrics into a single call (as above) and pick a sensible interval like 60 seconds rather than every second. An alternative that avoids PutMetricData request charges is the embedded metric format (EMF). You write specially structured JSON to your logs, and CloudWatch extracts metrics from it; you pay for log ingestion instead, and the extracted metrics still count as custom metrics.
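Here is a minimal sketch of an EMF log line built by hand. It assumes that stdout already reaches CloudWatch Logs (for example, through the awslogs log driver on ECS); on EC2, the CloudWatch agent must be configured to collect it. emfLine is a hypothetical helper, and the Service value "orders-api" is a placeholder. The official aws-embedded-metrics package for Node wraps the same format:

```javascript
// Build one Embedded Metric Format (EMF) log line. When this JSON lands in
// CloudWatch Logs, CloudWatch extracts the fields listed under _aws as metrics.
function emfLine(namespace, dimensions, metrics) {
  return JSON.stringify({
    _aws: {
      Timestamp: Date.now(),
      CloudWatchMetrics: [
        {
          Namespace: namespace,
          Dimensions: [Object.keys(dimensions)],
          Metrics: Object.entries(metrics).map(([name, { unit }]) => ({ Name: name, Unit: unit })),
        },
      ],
    },
    // Dimension values and metric values live at the top level of the same object.
    ...dimensions,
    ...Object.fromEntries(Object.entries(metrics).map(([name, { value }]) => [name, value])),
  });
}

// Usage: one line per interval replaces one PutMetricData call.
const m = process.memoryUsage();
console.log(
  emfLine("MyApp/Node", { Service: "orders-api" }, {
    HeapUsedMB: { value: m.heapUsed / 1024 / 1024, unit: "Megabytes" },
    RssMB: { value: m.rss / 1024 / 1024, unit: "Megabytes" },
  })
);
```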

Alarms and the OOM signature in logs

Metrics are only useful if something watches them. Set CloudWatch alarms so a human gets paged while there is still time to act, not after the crash:

mem_used_percent high for several minutes (say above 85 percent) catches the instance approaching its RAM ceiling before the OOM killer fires.
HeapUsedMB trending up across hours is your leak alarm; pair it with a sustained-slope condition rather than a single spike.
EventLoopLagMs above a threshold catches the event loop blocking before health checks start failing.
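The first of these rules might look like this as code. This is a sketch under stated assumptions. The instance ID and SNS topic ARN are placeholders, and the 85 percent and five-minute values mirror the rule of thumb above and should be tuned. The function only builds the parameters for PutMetricAlarmCommand from @aws-sdk/client-cloudwatch; it does not call AWS:

```javascript
// Parameters for a CloudWatch alarm on the agent's mem_used_percent metric:
// fire when the Maximum stays above 85% for 5 consecutive 1-minute periods.
function memoryAlarmParams(instanceId, snsTopicArn) {
  return {
    AlarmName: `high-memory-${instanceId}`,
    Namespace: "CWAgent",
    MetricName: "mem_used_percent",
    Dimensions: [{ Name: "InstanceId", Value: instanceId }],
    Statistic: "Maximum", // peaks, not averages, are what get a process killed
    Period: 60, // matches the agent's 60-second collection interval
    EvaluationPeriods: 5,
    Threshold: 85,
    ComparisonOperator: "GreaterThanThreshold",
    AlarmActions: [snsTopicArn], // e.g. an SNS topic that pages on-call
  };
}

console.log(memoryAlarmParams("i-0123456789abcdef0", "arn:aws:sns:us-east-1:123456789012:oncall"));
```

To create the alarm, pass the object to new PutMetricAlarmCommand(...) and send it with a CloudWatchClient, as in the PutMetricData example. The Dimensions must match exactly what the agent publishes. With append_dimensions set to InstanceId, as in the config above, that should be InstanceId alone. If the alarm sits in INSUFFICIENT_DATA, check the metric's dimensions in the console.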

When a process does get OOM-killed, the evidence lives in logs, not metrics. On EC2 the kernel records it (dmesg | grep -i "killed process"). On ECS, the task-stopped reason shows OutOfMemoryError and the container exits with code 137 (128 plus 9, the number of the SIGKILL signal used to kill it). Ship the instance logs to CloudWatch Logs so this is searchable after the fact, and so a metric filter can turn "Killed process" lines into an alarmable metric.

Where APM tools fit. APM stands for Application Performance Monitoring: a category of tools that attach to your running app and continuously record how it behaves, like how long each request takes, how often errors happen, how busy the event loop is, and how memory trends over time, then show it all on dashboards with alerting. Think of CloudWatch as monitoring the machine (CPU, RAM, disk) and an APM as monitoring the application running on it (routes, queries, code-level timing). Raw CloudWatch gives you metrics, logs, and alarms, but reading a leak's retaining path or a flame graph (a chart showing which functions consumed the most CPU time) is painful in it. APM tools (Datadog, New Relic, and Grafana are popular ones; OpenTelemetry is an open, vendor-neutral standard for the same job) add automatic Node instrumentation: per-route latency, event-loop lag, garbage-collection pauses, heap trends, and distributed traces (following a single request as it hops across multiple services), usually with far nicer leak and CPU views. The common production setup is CloudWatch for infrastructure and alarms plus an APM for deep application insight. For interviews, knowing why you need the agent (the hypervisor cannot see OS memory) and which metric maps to which failure matters more than any specific vendor.

Interview answer: "How do you monitor a Node service's memory in production on AWS?" It depends on where it runs. On a bare EC2 instance, the default CloudWatch metrics include CPU and network but not memory, because those come from the hypervisor, which cannot see inside the OS, so you install the unified CloudWatch agent to push OS-level memory (mem_used_percent, CWAgent namespace) and alarm on it. On ECS (Fargate or the EC2 launch type) memory utilization is provided automatically in the AWS/ECS namespace, measured against the task-definition limit, because the ECS container agent already reports it, and Container Insights adds per-task detail. In all cases you also publish custom metrics from the app, like heapUsed, rss, and event-loop lag, via PutMetricData or the embedded metric format, and alarm on a sustained heap climb (a leak) or rising loop lag (blocking). For OOM events you rely on logs, since a killed process leaves a dmesg/exit-137 signature rather than a clean metric. Many teams add an APM tool on top for richer heap and trace views.

Putting It Together: A Diagnostic Playbook

When a Node service misbehaves in production, the symptom usually points to the category:

Memory grows steadily and never drops, then crashes with heap out of memory. That is a leak. Confirm by watching heapUsed over time, then compare two heap snapshots to find what is accumulating, and look first at caches, listeners, timers, and closures.
The process dies with OOMKilled / exit code 137 and no V8 error. That is the container trap: the heap limit and the container limit are out of sync, or total rss (heap plus buffers plus native) exceeds the cgroup cap. Align --max-old-space-size below the container limit with headroom.
Latency is fine on average but terrible at p99, with periodic spikes. Suspect GC pauses (check --trace-gc) or intermittent event-loop blocking (measure with monitorEventLoopDelay).
The whole service freezes and health checks fail, but it has not crashed. The event loop is blocked by synchronous CPU work. Profile to find it, then move it off the main thread.
Requests start failing with EMFILE or pool-timeout errors while CPU and memory look healthy. Resource exhaustion: leaked file descriptors or connections. Bound the pools and release in finally.

Interview answer: "How would you debug a Node service that is slowly using more memory until it crashes?" First confirm it is actually a heap leak by watching process.memoryUsage().heapUsed over time under steady load; a steady climb that never recedes confirms it. Then take a heap snapshot after warm-up, exercise the app, take a second snapshot, and compare them sorted by growth to identify which objects are accumulating. The cause is almost always a retained reference: an unbounded cache, listeners or timers never cleaned up, or a closure pinning a large object. Fix the reference rather than raising --max-old-space-size, because a bigger heap only delays the same crash. If it is a container, also verify the heap limit sits safely under the container memory limit to avoid an OOM kill.


Naga Sai Rao

Some things fade fast. Some last. Learn the ones that last.




Where Your Data Lives: Stack and Heap

When your program runs, V8 (the engine inside Node) stores data in two very different places, and knowing which is which explains a lot of behavior.

function example() {
  let count = 10;                  // the number lives on the STACK
  let user = { name: "Alice" };    // the object lives on the HEAP;
                                   // `user` (a reference to it) lives on the stack
}
// When example() returns: `count` and the `user` reference pop off the stack.
// The { name: "Alice" } object stays on the heap until GC proves nothing points to it.

Analogy. The stack is your desk: small, right in front of you, instantly cleared when you finish a task. The heap is the warehouse out back: it holds everything large, things stay there after you walk away, and someone (the garbage collector) periodically walks the aisles throwing out boxes that nobody has a delivery slip for anymore.

The practical payoff: pass-by-value vs pass-by-reference

function tweak(num, obj) {
  num = 99;        // reassigns this function's OWN copy of the number
  obj.name = "Bob"; // follows the pointer and mutates the SHARED heap object
}

let n = 1;
let user = { name: "Alice" };
tweak(n, user);
console.log(n);         // 1   -> the primitive was copied, caller unaffected
console.log(user.name); // "Bob" -> the object is shared, caller sees the change

The same mechanism explains why two variables can secretly be the same object:

text

let a = { count: 0 };
let b = a;            // b copies the REFERENCE, not the object

   stack            heap
  ┌──────┐        ┌─────────────┐
  │  a ──┼───────>│ { count: 0 }│
  │  b ──┼───────>│             │   both point at ONE object
  └──────┘        └─────────────┘

b.count = 5;
console.log(a.count); // 5  -> a and b are the same heap object

Beginner trap: "I copied the array, so the original is safe." const copy = original copies the reference, not the data, so mutating copy mutates original. To actually duplicate, you need a shallow copy ([...original], { ...original }) or a deep copy (structuredClone(original)) for nested data. Confusing a reference copy with a value copy is behind a huge share of "why did my other variable change?" bugs.

Why the stack is small and the heap is large

Beginner trap: "stack overflow" vs "out of memory" are different failures. A RangeError: Maximum call stack size exceeded means you pushed too many function frames onto the small stack, almost always from infinite or very deep recursion. A JavaScript heap out of memory means the heap filled up, usually from holding onto too many objects. They sound similar but have completely different causes and fixes: the first is a control-flow problem (fix the recursion, or convert it to a loop or an explicit queue), the second is a memory-retention problem (find what you are holding onto).

How Garbage Collection Actually Works

V8 leans into this by splitting the heap into two generations and collecting them differently.

New Space (the young generation)

Analogy. New Space is the kitchen counter during cooking. You churn through scraps constantly, and every few minutes you sweep the whole counter clean, keeping only the few things still in use. It stays fast precisely because almost everything on the counter is already trash by the time you sweep.

Old Space (the old generation)

Analogy. Old Space is the warehouse. You do not inventory it every few minutes; that would be far too slow. You do a big, thorough audit occasionally: walk every aisle, tag what is still claimed, haul out everything untagged, and slide the remaining boxes together so there are no awkward gaps.

What "reachable" means

Interview answer: "How does garbage collection work in Node.js?" V8 uses a generational, mark-and-sweep collector based on the idea that most objects die young. New objects go into a small New Space collected frequently with a fast copying algorithm called Scavenge, which keeps only the survivors. Objects that live long enough are promoted to a larger Old Space, collected less often with Mark-Sweep-Compact, which marks everything reachable from the roots, sweeps away the rest, and compacts to reduce fragmentation. An object is reclaimed only when nothing references it anymore, so memory leaks happen when code unintentionally keeps references to objects it no longer needs.

Why GC matters for performance: stop-the-world pauses

Beginner trap. "Garbage collection is automatic, so I never have to think about memory." Automatic collection frees you from manually freeing memory, but it does not free you from managing references. If you hold references too long you leak; if you churn huge numbers of objects you make GC work harder and cause pauses. Automatic does not mean free.

The Memory Limit: max-old-space-size and the Heap Ceiling

The default ceiling depends on the Node version and the machine, and the exact number is not something to memorize, because you can always ask V8 directly:

const v8 = require("node:v8");
const limitGB = v8.getHeapStatistics().heap_size_limit / 1024 ** 3;
console.log(`Old space heap limit: ${limitGB.toFixed(2)} GB`);

You raise the ceiling with the --max-old-space-size flag, in megabytes:

bash

# Allow up to ~4 GB of old-space heap
node --max-old-space-size=4096 server.js

# Commonly set via env var so it applies to npm scripts and tooling
NODE_OPTIONS="--max-old-space-size=4096" npm run build

The single most important point in this guide. Raising --max-old-space-size does not fix a memory leak. If your code keeps accumulating references, a bigger heap only means the process takes longer to fill up before it crashes with the exact same error. Increasing the limit is the right move only when your workload genuinely needs more memory at once (large data processing, big builds). When memory climbs steadily under steady load, that is a leak, and the bigger heap just delays the inevitable while making each GC pause longer. Diagnose first; resize second, and only if the data says so.

The container trap

Analogy. The cgroup limit is the weight rating of an elevator; --max-old-space-size is how much you personally decide to load onto the cart you push into it. If you load the cart heavier than the elevator's rating, it does not matter that your cart could hold more; the elevator's safety system stops everything. Always keep your heap setting comfortably under the container's memory limit, leaving headroom for the stack, buffers, and non-heap memory.

Interview answer: "Why does my Node container get OOMKilled even though the app seems fine?" Almost always because the V8 heap limit and the container memory limit are out of sync. Node sizes its heap from the cgroup limit by default, but if --max-old-space-size (or NODE_OPTIONS) sets a heap larger than the container allows, V8 will grow past the container's cap and the kernel's OOM killer ends the process with exit code 137, before V8 reports its own heap error. The fix is to set the heap limit below the container limit with headroom for non-heap memory, or to leave it unset and let Node's container awareness size it.

Reading process.memoryUsage()

console.log(process.memoryUsage());
// {
//   rss: 215_482_368,        // total memory the OS gave the process
//   heapTotal: 138_412_032,  // heap V8 has reserved
//   heapUsed: 119_530_104,   // heap actually in use by your objects
//   external: 8_220_310,     // memory for C++ objects bound to JS (e.g. Buffers)
//   arrayBuffers: 1_540_096  // subset of external: ArrayBuffer/Buffer memory
// }

What each one tells you:

rss (Resident Set Size) is the total physical memory the operating system has handed your process, including the heap, the stack, and Node's own C++ machinery. This is the number your container limit is actually measured against, so it is what gets you OOMKilled.
heapTotal is how much heap V8 has reserved from the OS so far. It grows as needed.
heapUsed is how much of that heap your live JavaScript objects actually occupy. This is the number to watch over time for leaks: if it climbs steadily and never comes back down under steady load, you are leaking.
external is memory used by C++ objects tied to your JavaScript, most commonly Buffers and other binary data. A leak here will not show in heapUsed but will still grow rss.
arrayBuffers is the slice of external specifically for ArrayBuffer and Buffer allocations.

Beginner trap: watching rss to find a JavaScript leak. rss is noisy: it includes non-heap memory, it rarely shrinks even after objects are freed (the OS often lets a process keep memory it might reuse), and it is affected by buffers and native code. For a JavaScript object leak, watch heapUsed across time under stable load. Use rss to understand total footprint and container pressure, not to pinpoint a leak.

Node Memory on a Real Server: EC2, the OS, and Cluster Workers

Picture four nested boxes, each sitting inside the next:

text

┌─────────────────────────────────────────────────────────┐
│ EC2 instance RAM (e.g. t3.medium = 4 GB total)            │
│  ┌──────────────────────────────────────────────────┐    │
│  │ OS, kernel, system agents, filesystem page cache  │    │
│  └──────────────────────────────────────────────────┘    │
│  ┌──────────────────────────────────────────────────┐    │
│  │ Your Node PROCESS  (this whole box = RSS)          │   │
│  │  ┌────────────────────────────────────────────┐   │   │
│  │  │ V8 Heap: your JS objects (New + Old Space)  │   │   │
│  │  │ capped by --max-old-space-size              │   │   │
│  │  └────────────────────────────────────────────┘   │   │
│  │  + Buffers / ArrayBuffers (external, NOT in heap)  │   │
│  │  + stack + compiled code + native addons + libuv  │   │
│  └──────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────┘

Why heapUsed can look fine while the box dies

Analogy. The V8 heap is the trunk of your car, and --max-old-space-size is a rule about how full the trunk may get. But the car's total weight (RSS) also includes passengers, fuel, and roof cargo (Buffers, native memory, code). The bridge's weight limit (instance RAM) is checked against the whole car, not just the trunk. You can obey the trunk rule perfectly and still be too heavy for the bridge.

How a raw EC2 instance differs from a container

The cluster-worker multiplier

bash

# 4 GB instance, 4 cluster workers: budget ~700 MB heap EACH, not the 2 GB default
NODE_OPTIONS="--max-old-space-size=700" pm2 start server.js -i 4

A concrete walkthrough

Take a t3.small (2 GB RAM) running one Node API:

The instance boots; the OS and agents consume about 400 MB, leaving roughly 1.6 GB usable.
Node starts. Container-awareness sees 2 GB and defaults the old-space heap near 1 GB.
The app's live objects settle at heapUsed around 300 MB, and rss sits around 450 MB (heap plus code plus stacks). Healthy.
Traffic spikes with large file uploads, each buffering a 20 MB file. Thirty concurrent uploads is about 600 MB of external Buffer memory. heapUsed barely moves because that is not heap, but rss jumps toward 1.1 GB.
Add the 400 MB of OS overhead and the box nears its 1.6 GB usable ceiling. The kernel OOM killer fires and kills Node, with no V8 heap out of memory error in the logs, because the heap was never the problem. The process RSS outgrew the instance RAM.

Observing it on the box

bash

# Per-process resident memory (rss is in KB); -d, comma-joins the PIDs if several match
ps -o pid,rss,comm -p "$(pgrep -d, -f 'node server.js')"

# Whole-instance memory picture
free -m            # total / used / free / available
top                # watch the RES column for your node process

# Was a process OOM-killed by the kernel? Check the kernel log (often needs sudo):
sudo dmesg | grep -i "killed process"

And from inside Node, log the breakdown so you can see which region is growing:

setInterval(() => {
  const m = process.memoryUsage();
  const mb = (n) => (n / 1024 / 1024).toFixed(0);
  console.log(
    `rss=${mb(m.rss)}MB heapUsed=${mb(m.heapUsed)}MB ` +
    `external=${mb(m.external)}MB arrayBuffers=${mb(m.arrayBuffers)}MB`
  );
}, 10000);

The practical rules for sizing Node on a VM

Size the heap below usable instance RAM, not total RAM. Leave headroom for the OS, agents, buffers, and native memory. A rough single-process starting point is --max-old-space-size around 60 to 75 percent of (instance RAM minus OS overhead).
Divide the budget across cluster workers. With N workers, each gets roughly one Nth of the app's memory budget, not the full-instance default.
Watch RSS against instance RAM, because that comparison is what decides whether the kernel OOM killer fires, not the heap number.
Give buffer-heavy work extra headroom. Streaming, uploads, and image processing grow external memory invisibly to the heap metrics.
Right-size the instance or scale out. If RSS legitimately needs more than the box offers, a bigger instance or more instances behind a load balancer is the answer, not just raising the heap flag, which never fixes a real leak anyway.
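The sizing rules above reduce to simple arithmetic. Here is a minimal sketch of that calculation; `heapBudgetPerWorkerMB` is a hypothetical helper, and the 400 MB OS overhead and the 0.75 fraction are illustrative assumptions taken from the walkthrough and the 60 to 75 percent rule of thumb, not measured values.

```javascript
// Hypothetical helper: turns the rule of thumb into a --max-old-space-size value.
// heap per worker = fraction * (instance RAM - OS overhead) / number of workers
function heapBudgetPerWorkerMB({ instanceRamMB, osOverheadMB, workers = 1, fraction = 0.7 }) {
  const usableMB = instanceRamMB - osOverheadMB; // what is left for your processes
  if (usableMB <= 0) throw new RangeError("OS overhead exceeds instance RAM");
  if (!Number.isInteger(workers) || workers < 1) throw new RangeError("workers must be >= 1");
  // The remaining (1 - fraction) share is headroom for Buffers, code, stacks, native memory.
  return Math.floor((usableMB * fraction) / workers);
}

// Build the flag string you would put in NODE_OPTIONS.
function maxOldSpaceFlag(mb) {
  return `--max-old-space-size=${mb}`;
}

// 4 GB instance, ~400 MB for the OS and agents, 4 cluster workers, 75% fraction:
const perWorker = heapBudgetPerWorkerMB({
  instanceRamMB: 4096,
  osOverheadMB: 400,
  workers: 4,
  fraction: 0.75,
});
console.log(maxOldSpaceFlag(perWorker)); // --max-old-space-size=693
```

That lands close to the ~700 MB per worker used in the pm2 example. The fraction you leave unassigned is what absorbs Buffer-heavy bursts like the upload spike in the walkthrough.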

Interview answer: "How does a Node app use the memory on an EC2 instance, and why might it get killed?" A Node process's total memory (its RSS) is the V8 heap plus external memory like Buffers, plus stack, compiled code, native addons, and thread-pool stacks. The instance's RAM is shared with the OS and agents, so only part is usable. The kernel kills based on RSS against available instance RAM, not against the V8 heap limit, so an app can have a flat heapUsed and still be OOM-killed because Buffer-heavy work grew external memory, or because several cluster workers each took a full default heap and collectively exceeded the box. The fixes are to size the heap below usable RAM, divide that budget across workers, watch RSS rather than just the heap, and scale the instance when the workload genuinely needs it.

The Classic Memory Leaks

1. Module-level collections that only grow

A Map, array, or object declared at module scope lives for the entire process. If you keep adding to it and never remove, it grows forever.

// ❌ Leak: every request adds an entry that is never removed.
const cache = new Map();
app.get("/user/:id", async (req, res) => {
  const user = await db.getUser(req.params.id);
  cache.set(req.params.id, user); // grows without bound, forever
  res.json(user);
});

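// ✅ Bound it: cap the size, or use a real cache with eviction.
//    LRU (Least Recently Used) drops the entry untouched for the longest;
//    TTL (Time To Live) drops entries after a fixed age.
//    The simple cap below evicts the OLDEST INSERT (FIFO), a cheap stand-in for LRU.
const MAX_ENTRIES = 10_000;
const cache = new Map();
function remember(key, value) {
  cache.delete(key); // re-setting a key moves it to the "newest" end
  if (cache.size >= MAX_ENTRIES) {
    // Map iterates in insertion order, so the first key is the oldest entry.
    cache.delete(cache.keys().next().value);
  }
  cache.set(key, value);
}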

The naive-cache trap. A plain Map used as a cache with no eviction policy is probably the most common Node leak in the wild. A cache must have a bound: a maximum size, a time-to-live, or both. "Cache forever" is just "leak slowly."
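For a cache that really evicts by recency and age, a minimal sketch combining LRU and TTL on top of a `Map` might look like this. `BoundedCache` and its defaults are hypothetical; in production you would more likely reach for a maintained cache library. It relies on `Map` preserving insertion order, so deleting and re-inserting a key on every read moves it to the most-recently-used end.

```javascript
// Sketch: LRU eviction by entry count, plus a per-entry TTL.
class BoundedCache {
  constructor({ maxEntries = 10_000, ttlMs = 60_000, now = Date.now } = {}) {
    this.maxEntries = maxEntries;
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock, handy for tests
    this.map = new Map(); // key -> { value, expiresAt }; iteration order = recency order
  }

  get(key) {
    const entry = this.map.get(key);
    if (entry === undefined) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.map.delete(key); // expired: drop it so the value can be collected
      return undefined;
    }
    // Re-insert to mark it as most recently used.
    this.map.delete(key);
    this.map.set(key, entry);
    return entry.value;
  }

  set(key, value) {
    this.map.delete(key);
    if (this.map.size >= this.maxEntries) {
      // The first key in iteration order is the least recently used.
      this.map.delete(this.map.keys().next().value);
    }
    this.map.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  get size() {
    return this.map.size;
  }
}
```

Both bounds matter: the count cap limits worst-case memory, and the TTL keeps rarely used but never-evicted entries from lingering for the life of the process.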

2. Listeners and subscriptions you never remove

Every .on() adds a listener that holds a reference to its callback (and everything that callback closes over). Add them per request without removing them and they pile up.

// ❌ Leak: a new listener every request, never removed.
app.get("/stream", (req, res) => {
  emitter.on("data", (chunk) => res.write(chunk)); // accumulates forever
});

// ✅ Remove it when the request ends.
app.get("/stream", (req, res) => {
  const onData = (chunk) => res.write(chunk);
  emitter.on("data", onData);
  res.on("close", () => emitter.off("data", onData)); // clean up
});

Node's MaxListenersExceededWarning fires when more than 10 listeners (the default limit) are added for the same event on one emitter, so it appears at the 11th. It is usually a real leak warning, not noise to silence by raising the limit.
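You can watch the accumulation directly with `listenerCount`. This sketch simulates the two routes above without an HTTP server; `countListenersAfter` is a hypothetical helper, and the immediate `off` call stands in for the request closing.

```javascript
const { EventEmitter } = require("node:events");

// Simulates `requests` per-request handlers that each subscribe to "data".
// cleanup=false mirrors the ❌ route above; cleanup=true mirrors the ✅ one,
// with the request "closing" immediately after it subscribes.
function countListenersAfter(requests, { cleanup }) {
  const emitter = new EventEmitter(); // fresh emitter per run
  for (let i = 0; i < requests; i++) {
    const onData = (chunk) => chunk; // stand-in for (chunk) => res.write(chunk)
    emitter.on("data", onData);
    if (cleanup) emitter.off("data", onData); // what res.on("close", ...) would do
  }
  return emitter.listenerCount("data");
}

console.log(countListenersAfter(50, { cleanup: false })); // 50 (plus a MaxListenersExceededWarning on stderr)
console.log(countListenersAfter(50, { cleanup: true }));  // 0
```

Logging `listenerCount` for your long-lived emitters under load is a cheap way to confirm this kind of leak before reaching for heap snapshots.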

3. Timers that are never cleared

setInterval keeps its callback, and everything that callback references, alive for as long as the interval runs. Start intervals tied to a connection or object without clearing them and you leak.

// ❌ Leak: the interval (and everything `bigData` references) lives forever.
function startPolling(bigData) {
  setInterval(() => check(bigData), 1000); // never cleared
}

// ✅ Keep the handle and clear it when done.
function startPolling(bigData) {
  const id = setInterval(() => check(bigData), 1000);
  return () => clearInterval(id); // caller stops it when finished
}
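If whatever owns the timer already has a lifecycle signal, an AbortSignal can do the clearing for you. A minimal sketch; `pollUntilAborted` is a hypothetical helper, and the usage lines use placeholder names (`socket`, `check`, `bigData`).

```javascript
// Hypothetical helper: an interval whose lifetime is tied to an AbortSignal.
function pollUntilAborted(check, intervalMs, signal) {
  if (signal.aborted) return; // owner already gone: never start
  const id = setInterval(check, intervalMs);
  // abort() dispatches this listener synchronously, clearing the timer so the
  // callback (and everything it closes over) becomes collectable.
  signal.addEventListener("abort", () => clearInterval(id), { once: true });
}

// Usage sketch (`socket`, `check`, and `bigData` are placeholders):
//   const ac = new AbortController();
//   pollUntilAborted(() => check(bigData), 1000, ac.signal);
//   socket.on("close", () => ac.abort());
```

One AbortController can cancel several timers, listeners, and fetches for the same connection at once, which keeps teardown in one place.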

4. Closures that capture more than you think

A closure keeps alive every variable it references from its enclosing scope. A long-lived closure that captures a large object pins that object in memory even if it only uses one small field of it.

// ❌ The handler closes over the entire `hugePayload` just to read one field.
function register(hugePayload) {
  emitter.on("tick", () => log(hugePayload.id)); // pins all of hugePayload
}

// ✅ Capture only what you need.
function register(hugePayload) {
  const id = hugePayload.id;            // extract the small piece
  emitter.on("tick", () => log(id));    // huge payload can now be collected
}

Interview answer: "What are the common causes of memory leaks in Node.js?" The big four are unbounded module-level collections (a Map or array used as a cache with no eviction), event listeners and subscriptions added repeatedly and never removed, timers (setInterval) that are never cleared, and long-lived closures that capture large objects. They share one root cause: code keeps a reference to data it is finished with, so the garbage collector cannot reclaim it because the data is still reachable. The fix is always to drop the reference: bound the cache, remove the listener, clear the timer, or capture only the small piece you need.

The "let it be collected" tools: WeakMap, WeakRef, AbortController

These three exist specifically to avoid the leaks above by not holding things alive longer than needed.

// Cache derived data keyed by an object, without pinning that object alive.
const parsedCache = new WeakMap();

function getParsed(reqObject) {
  if (parsedCache.has(reqObject)) return parsedCache.get(reqObject);
  const parsed = expensiveParse(reqObject);
  parsedCache.set(reqObject, parsed);
  return parsed; // when reqObject is GC'd, its cache entry vanishes on its own
}

const controller = new AbortController();

// The fetch is cancellable; aborting frees it and rejects the promise.
fetch("https://api.example.com/slow", { signal: controller.signal })
  .then(res => res.json())
  .catch(err => {
    if (err.name === "AbortError") return; // expected on cancel
    throw err;
  });

// Cancel it (e.g. on request close, timeout, or component teardown):
controller.abort();

Beginner trap: a plain Map cache keyed by objects leaks; a WeakMap does not. If you key a regular Map by request or user objects and never delete entries, those objects can never be collected, because the Map holds them strongly forever. Switching to a WeakMap lets them go the moment nothing else needs them. Use a WeakMap whenever the key's lifetime should decide the entry's lifetime.
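The heading also names WeakRef, which covers the opposite case: you want to reuse a value while something else keeps it alive, but never be the reason it stays alive. A minimal sketch of a value-weak cache follows. It assumes object values (a WeakRef cannot hold numbers or strings), and `WeakValueCache` is a hypothetical name. When, or whether, the garbage collector clears a WeakRef or runs a FinalizationRegistry callback is not guaranteed, so callers must handle `get` returning `undefined` at any time.

```javascript
// Holds only WeakRefs, so cached values can still be garbage collected.
class WeakValueCache {
  constructor() {
    this.refs = new Map(); // key -> WeakRef(value)
    // Prune map entries whose values were collected (timing is up to the GC).
    this.registry = new FinalizationRegistry((key) => {
      const ref = this.refs.get(key);
      // The key may have been re-set to a new live value; only delete dead refs.
      if (ref && ref.deref() === undefined) this.refs.delete(key);
    });
  }

  get(key) {
    return this.refs.get(key)?.deref(); // undefined if never set or already collected
  }

  set(key, value) {
    this.refs.set(key, new WeakRef(value)); // throws TypeError for primitives
    this.registry.register(value, key);
  }
}
```

Reach for this only when a WeakMap does not fit (for example, string keys with object values); the non-deterministic timing makes WeakRef harder to reason about than a bounded cache.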

Finding a Leak: Heap Snapshots

The workflow most people use:

bash

# Start the app with the inspector open
node --inspect server.js
# Then open chrome://inspect in Chrome, click "inspect", go to the Memory tab,
# and take heap snapshots. The key technique is the COMPARISON:
#   1. Take snapshot A after warm-up.
#   2. Exercise the suspected path many times (e.g. hammer an endpoint).
#   3. Take snapshot B.
#   4. Compare B to A and sort by "Delta": what grew is your leak suspect.

You can also capture snapshots programmatically, which is handy on a server you cannot attach a debugger to, and Node can even dump one automatically right before it would crash from the heap limit:

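const v8 = require("node:v8");
// Writes a .heapsnapshot file (in the current working directory by default) and
// returns its filename; load it into Chrome DevTools later. It is synchronous, so
// it blocks the event loop while it writes, and it needs extra memory on top of
// the heap it is capturing.
v8.writeHeapSnapshot();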

bash

# Dump a snapshot automatically when the process approaches the heap limit,
# so you can inspect what filled it up right before the OOM crash.
node --heapsnapshot-near-heap-limit=2 server.js
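On a server you cannot attach DevTools to, you can wire the programmatic call to a signal. A minimal sketch follows; the output directory, the file naming, and the choice of SIGUSR2 are assumptions. Node reserves SIGUSR1 for the inspector, and some tools (nodemon, for example) use SIGUSR2 for restarts, so pick a signal nothing else claims. Node's built-in `--heapsnapshot-signal=SIGUSR2` flag gives you the same trigger without code.

```javascript
const v8 = require("node:v8");
const os = require("node:os");
const path = require("node:path");

// Write a snapshot to a known directory and return the path that was written.
function captureHeapSnapshot(dir = os.tmpdir()) {
  const file = path.join(dir, `heap-${process.pid}-${Date.now()}.heapsnapshot`);
  return v8.writeHeapSnapshot(file); // synchronous: the event loop pauses while it writes
}

// `kill -USR2 <pid>` on the box triggers a snapshot without a debugger attached.
process.on("SIGUSR2", () => {
  const file = captureHeapSnapshot();
  console.log(`heap snapshot written to ${file}`);
});
```

Send the signal once after warm-up and again after exercising the suspect path, and you have the A/B pair the comparison technique needs.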

Analogy. A single heap snapshot is a photograph of a messy room; you cannot tell what is accumulating from one photo. Two snapshots taken before and after some activity are a "spot the difference" pair: whatever is bigger in the second photo is what your code is piling up. The comparison is the whole technique; a lone snapshot rarely tells you much.

Other tools worth naming. --trace-gc prints a line for every garbage collection so you can see how often and how long GC runs (helpful for spotting GC thrash). --prof produces a V8 profile for CPU hotspots. The clinic suite (clinic doctor, clinic heapprofiler, clinic flame) automates much of this and produces readable reports, and is a common answer to "what tools do you use to diagnose Node performance?"

A leak hunt, start to finish

Knowing the tools exist is different from knowing the loop. Here is the whole investigation as you would actually run it, so the pieces connect:

Notice. Your dashboard (or the in-process logger from earlier) shows heapUsed climbing slowly across hours and never dropping back under steady traffic, then the service restarts itself every so often with heap out of memory. That steady upward slope, not a spike, is the signature of a leak rather than a load burst.
Confirm it is the heap. Watch heapUsed specifically. If heapUsed is flat but rss climbs, it is external/Buffer memory, not a classic object leak, and you would look at streams and Buffers instead. Here, heapUsed itself climbs, so it is a retained-object leak.
Capture a baseline. Let the app warm up and reach steady state, then take heap snapshot A (in Chrome DevTools via --inspect, or with v8.writeHeapSnapshot()).
Reproduce the growth. Drive the suspected path hard: replay a few thousand requests to the endpoint you suspect, so whatever is accumulating accumulates a lot. The leak needs to be big in the next snapshot to stand out.
Capture and compare. Take heap snapshot B, then load it in DevTools and switch the view to Comparison against A, sorted by the size delta. The object type that grew by thousands of instances is your suspect: maybe User objects, or Array, or closures from a specific function.
Find what is holding it. Select an instance of the leaking object and read its Retainers (the "retaining path"), which traces the chain of references keeping it alive all the way back to a GC root. That path points straight at the culprit: a module-level Map it was added to, an emitter still listening, an interval still running.
Fix the reference and verify. Drop the reference (bound the cache, remove the listener, clear the timer), redeploy, and watch heapUsed flatten across the same load. A flat line under sustained traffic is the proof the leak is gone.

The mental shortcut. A heap snapshot answers "what is piling up," and the retaining path answers "who is holding it." Almost every leak hunt is those two questions in sequence, and the answer to the second is usually one of the four classic causes.

Event Loop Lag: The Other Way Node "Hangs"

You measure it by checking how late timers fire compared to when they were scheduled; that lateness is your event-loop lag:

const { monitorEventLoopDelay } = require("node:perf_hooks");

const h = monitorEventLoopDelay({ resolution: 20 });
h.enable();

setInterval(() => {
  // mean and max delay in milliseconds; rising numbers mean the loop is blocking
  console.log(`loop delay mean=${(h.mean / 1e6).toFixed(1)}ms max=${(h.max / 1e6).toFixed(1)}ms`);
  h.reset();
}, 5000);

The fix is architectural, not a flag. When the loop is blocked by CPU work, the answer is to get that work off the main thread: move it to a worker_thread, break it into chunks that yield with setImmediate between them, push it to a separate service, or replace the algorithm (for example, stream and parse large JSON instead of JSON.parse-ing it whole). You cannot "tune" your way out of blocking; you have to stop blocking.
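Chunking is the easiest of those fixes to show in a few lines. A minimal sketch, assuming the work splits into independent slices: summing an array stands in for any long synchronous loop, and `sumInChunks` is a hypothetical name. The 10,000-item chunk size is an arbitrary assumption; tune it so each slice stays well under your latency budget.

```javascript
const { setImmediate: yieldToEventLoop } = require("node:timers/promises");

// Sums a large array in slices, yielding to the event loop between slices so
// pending I/O callbacks, timers, and new requests get a turn.
async function sumInChunks(values, chunkSize = 10_000) {
  let total = 0;
  for (let start = 0; start < values.length; start += chunkSize) {
    const end = Math.min(start + chunkSize, values.length);
    for (let i = start; i < end; i++) total += values[i];
    await yieldToEventLoop(); // let the loop run other work before the next slice
  }
  return total;
}

// Usage: sumInChunks(bigArray).then((total) => console.log(total));
```

The total CPU time is unchanged; it is only interleaved with other work. If the job is genuinely heavy, a worker_thread is the better fix, because it moves that CPU time off the main thread entirely.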

Interview answer: "How do you detect and fix a blocked event loop?" Detect it by measuring event-loop delay, either with perf_hooks' monitorEventLoopDelay or an APM tool (Application Performance Monitoring: a service like Datadog or New Relic that automatically tracks an app's runtime health and timing); steadily rising delay under load means synchronous work is hogging the thread. Find the culprit with a CPU profile (--prof or clinic flame), which highlights the long synchronous function. Fix it by moving CPU-bound work off the main thread with worker_threads, chunking long loops so they yield to the loop, replacing blocking calls with async ones, or offloading to another service. The single thread must stay free to keep the app responsive.

Other Production Failures Worth Knowing

A few more failures round out what interviewers (and real on-call shifts) throw at you.

process.on("uncaughtException", (err) => {
  logger.fatal(err);     // record what happened
  // optionally flush logs / close critical resources quickly
  process.exit(1);       // then exit; let the supervisor restart a fresh process
});
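That handler can be hardened so a stuck cleanup step never keeps a broken process alive. A minimal sketch, assuming a hypothetical `flush` function for your logger: Node's documentation recommends doing only synchronous cleanup in this handler, so treat the async flush as best-effort and bound it with a hard deadline. `exit` is injectable only so the behavior can be tested.

```javascript
// Hypothetical factory: log, attempt a short flush, then exit no matter what.
function createFatalHandler({ log, flush, exit = (code) => process.exit(code), timeoutMs = 3000 }) {
  let handling = false;
  return async function onFatal(err) {
    if (handling) return; // a second fatal error during shutdown: already exiting
    handling = true;
    log(err);
    // Hard deadline: a hung flush must not keep a broken process alive.
    const deadline = setTimeout(() => exit(1), timeoutMs);
    try {
      await flush();
    } catch (flushErr) {
      log(flushErr); // a failed flush still ends in exit
    } finally {
      clearTimeout(deadline);
      exit(1); // let the supervisor (pm2, systemd, ECS) start a fresh process
    }
  };
}

process.on(
  "uncaughtException",
  createFatalHandler({
    log: (err) => console.error(err), // stand-in for logger.fatal
    flush: async () => {}, // hypothetical: flush log buffers, close critical resources
  })
);
```

Keep `timeoutMs` well below your orchestrator's own kill timeout, so the process exits on its own terms rather than being killed mid-flush.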

Beginner trap: assuming a crash-restart loop is "handled." Letting a process manager restart after a fatal error is correct, but if the underlying cause is a leak or a poison input, the fresh process hits the same wall and you get a crash loop: the service flaps up and down, dropping requests each cycle. Restart is a safety net for the unexpected, not a substitute for fixing a reproducible failure. Watch your restart count as a signal.

Monitoring in Production: CloudWatch and Beyond

The "no memory metric on a bare EC2 instance" gotcha

A minimal CloudWatch agent config that reports memory and root-disk usage every 60 seconds. The agent reads this file as plain JSON, so it cannot carry comments:

json

{
  "metrics": {
    "append_dimensions": { "InstanceId": "${aws:InstanceId}" },
    "metrics_collected": {
      "mem": { "measurement": ["mem_used_percent"], "metrics_collection_interval": 60 },
      "disk": { "measurement": ["used_percent"], "resources": ["/"], "metrics_collection_interval": 60 }
    }
  }
}

On ECS, memory utilization is built in (the EC2 gotcha does not apply)

The behavior differs slightly by launch type (where the container actually runs):

Where your Node app runs	Memory utilization by default?	Why, and what to add
Bare EC2 instance	No	No in-OS agent; the hypervisor cannot see OS memory. Install the CloudWatch agent (`mem_used_percent`, `CWAgent` namespace).
ECS on Fargate (serverless containers)	Yes, automatic	The ECS agent reports CPU and memory against the task-definition limit (`AWS/ECS` namespace). Nothing to install.
ECS on EC2 (containers on your own instances)	Yes, at cluster and service level	The ECS container agent reports memory utilization for the cluster and its services (per-task detail needs Container Insights). The underlying EC2 instance's own memory still needs the CloudWatch agent if you want it.

Interview-ready summary. "AWS gives you memory metrics" is true for ECS and false for bare EC2, and that catches people. On EC2 you install the CloudWatch agent because the hypervisor cannot see OS memory; on ECS (Fargate or EC2 launch type) the ECS container agent already reports memory against the task-definition limit, so it is there by default. Either way, you still publish app-level metrics (heap, event-loop lag) from inside Node for cause-level insight, because the platform metric only tells you the box or task is full, not why.

Pushing Node's own numbers as custom metrics

const { CloudWatchClient, PutMetricDataCommand } = require("@aws-sdk/client-cloudwatch");
const { monitorEventLoopDelay } = require("node:perf_hooks");

const cw = new CloudWatchClient({});
const loop = monitorEventLoopDelay();
loop.enable();

setInterval(async () => {
  const m = process.memoryUsage();
  const mb = (n) => n / 1024 / 1024;
  const lagMs = loop.mean / 1e6; // histogram values are in nanoseconds
  loop.reset(); // start a fresh window even if the push below fails
  try {
    await cw.send(new PutMetricDataCommand({
      Namespace: "MyApp/Node",
      // Batch related metrics into ONE call to cut cost and avoid throttling.
      MetricData: [
        { MetricName: "HeapUsedMB", Value: mb(m.heapUsed), Unit: "Megabytes" },
        { MetricName: "RssMB", Value: mb(m.rss), Unit: "Megabytes" },
        { MetricName: "ExternalMB", Value: mb(m.external), Unit: "Megabytes" },
        { MetricName: "EventLoopLagMs", Value: lagMs, Unit: "Milliseconds" },
      ],
    }));
  } catch (err) {
    // A failed metrics push must never crash the app, which an unhandled rejection would.
    console.error("metric push failed:", err.message);
  }
}, 60_000);

Cost note. CloudWatch bills custom metrics by the number of distinct metrics and by PutMetricData calls, so batch related metrics into a single call (as above) and pick a sensible interval like 60 seconds rather than every second. An alternative that avoids per-call cost is the embedded metric format (EMF): you write specially structured JSON to your logs and CloudWatch extracts metrics from it.
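A minimal sketch of the EMF route: log one JSON object per line, whose `_aws` block tells CloudWatch which top-level keys are metrics. The `Service` dimension and its `my-api` value are hypothetical, and the sketch assumes your log pipeline delivers stdout to CloudWatch Logs in a way that enables EMF extraction (that setup differs between Lambda, the CloudWatch agent, and ECS log drivers, so check the EMF documentation for yours). You still pay for log ingestion and for the metrics it creates, just not for PutMetricData calls.

```javascript
// Builds one EMF record. Namespace "MyApp/Node" matches the PutMetricData example;
// the "Service" dimension and its "my-api" value are hypothetical.
function buildEmfRecord({ heapUsedMB, rssMB, eventLoopLagMs }, timestamp = Date.now()) {
  return {
    _aws: {
      Timestamp: timestamp, // epoch milliseconds
      CloudWatchMetrics: [
        {
          Namespace: "MyApp/Node",
          Dimensions: [["Service"]], // each inner array is one dimension set
          Metrics: [
            { Name: "HeapUsedMB", Unit: "Megabytes" },
            { Name: "RssMB", Unit: "Megabytes" },
            { Name: "EventLoopLagMs", Unit: "Milliseconds" },
          ],
        },
      ],
    },
    // Dimension values and metric values live at the top level, under the names above.
    Service: "my-api",
    HeapUsedMB: heapUsedMB,
    RssMB: rssMB,
    EventLoopLagMs: eventLoopLagMs,
  };
}

// One JSON object per line on stdout.
function logEmfMetrics(eventLoopLagMs) {
  const m = process.memoryUsage();
  const mb = (n) => n / 1024 / 1024;
  const record = buildEmfRecord({ heapUsedMB: mb(m.heapUsed), rssMB: mb(m.rss), eventLoopLagMs });
  console.log(JSON.stringify(record));
}
```

You would call `logEmfMetrics(loop.mean / 1e6)` from the same 60-second interval, in place of `cw.send(...)`.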

Alarms and the OOM signature in logs

Metrics are only useful if something watches them. Set CloudWatch alarms so a human gets paged while there is still time to act, not after the crash:

mem_used_percent high for several minutes (say above 85 percent) catches the instance approaching its RAM ceiling before the OOM killer fires.
HeapUsedMB trending up across hours is your leak alarm; pair it with a sustained-slope condition rather than a single spike.
EventLoopLagMs above a threshold catches the event loop blocking before health checks start failing.
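One way to express "a sustained climb, not a spike" is sketched below as in-app JavaScript over a window of evenly spaced `HeapUsedMB` samples. Instead of fitting a slope, it checks whether the heap's floor keeps rising, which a single late spike cannot fake. The function name, the four segments, and the 1 MB threshold are assumptions to tune; this is not a CloudWatch feature.

```javascript
// Hypothetical leak heuristic: split the window into segments and compare each
// segment's MINIMUM. A traffic spike raises a segment's max but not its min,
// while a leak lifts the floor that heapUsed returns to after each GC.
function floorKeepsRising(samples, segments = 4, minRiseMB = 1) {
  const size = Math.floor(samples.length / segments);
  if (size < 2) return false; // too little data to judge
  const offset = samples.length - size * segments; // drop the OLDEST leftovers
  const floors = [];
  for (let s = 0; s < segments; s++) {
    const start = offset + s * size;
    floors.push(Math.min(...samples.slice(start, start + size)));
  }
  // Every segment's floor must beat the previous one by at least minRiseMB.
  return floors.every((floor, i) => i === 0 || floor - floors[i - 1] >= minRiseMB);
}
```

A GC sawtooth whose lows climb (100, 110, 120, 130 MB) trips it; a flat sawtooth that ends in one 900 MB spike does not, because every segment's low stays at 100 MB.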

Where APM tools fit. APM stands for Application Performance Monitoring: a category of tools that attach to your running app and continuously record how it behaves, like how long each request takes, how often errors happen, how busy the event loop is, and how memory trends over time, then show it all on dashboards with alerting. Think of CloudWatch as monitoring the machine (CPU, RAM, disk) and an APM as monitoring the application running on it (routes, queries, code-level timing). Raw CloudWatch gives you metrics, logs, and alarms, but reading a leak's retaining path or a flame graph (a chart showing which functions consumed the most CPU time) is painful in it. APM tools (Datadog, New Relic, and Grafana are popular ones; OpenTelemetry is an open, vendor-neutral standard for the same job) add automatic Node instrumentation: per-route latency, event-loop lag, garbage-collection pauses, heap trends, and distributed traces (following a single request as it hops across multiple services), usually with far nicer leak and CPU views. The common production setup is CloudWatch for infrastructure and alarms plus an APM for deep application insight. For interviews, knowing why you need the agent (the hypervisor cannot see OS memory) and which metric maps to which failure matters more than any specific vendor.

Interview answer: "How do you monitor a Node service's memory in production on AWS?" It depends on where it runs. On a bare EC2 instance, the default CloudWatch metrics include CPU and network but not memory, because those come from the hypervisor, which cannot see inside the OS, so you install the unified CloudWatch agent to push OS-level memory (mem_used_percent, CWAgent namespace) and alarm on it. On ECS (Fargate or the EC2 launch type) memory utilization is provided automatically in the AWS/ECS namespace, measured against the task-definition limit, because the ECS container agent already reports it, and Container Insights adds per-task detail. In all cases you also publish custom metrics from the app, like heapUsed, rss, and event-loop lag, via PutMetricData or the embedded metric format, and alarm on a sustained heap climb (a leak) or rising loop lag (blocking). For OOM events you rely on logs, since a killed process leaves a dmesg/exit-137 signature rather than a clean metric. Many teams add an APM tool on top for richer heap and trace views.

Putting It Together: A Diagnostic Playbook

When a Node service misbehaves in production, the symptom usually points to the category:

Memory grows steadily and never drops, then crashes with heap out of memory. That is a leak. Confirm by watching heapUsed over time, then compare two heap snapshots to find what is accumulating, and look first at caches, listeners, timers, and closures.
The process dies with OOMKilled / exit code 137 and no V8 error. That is the container trap: the heap limit and the container limit are out of sync, or total rss (heap plus buffers plus native) exceeds the cgroup cap. Align --max-old-space-size below the container limit with headroom.
Latency is fine on average but terrible at p99, with periodic spikes. Suspect GC pauses (check --trace-gc) or intermittent event-loop blocking (measure with monitorEventLoopDelay).
The whole service freezes and health checks fail, but it has not crashed. The event loop is blocked by synchronous CPU work. Profile to find it, then move it off the main thread.
Requests start failing with EMFILE or pool-timeout errors while CPU and memory look healthy. Resource exhaustion: leaked file descriptors or connections. Bound the pools and release in finally.
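The "release in finally" rule looks the same for file descriptors and pooled connections. A minimal sketch: `readHeader` uses Node's `fs/promises` FileHandle API, while `withResource` assumes a hypothetical pool object exposing `acquire()` and `release(resource)`; adapt those two calls to your pool library.

```javascript
const fs = require("node:fs/promises");

// Read the first `bytes` bytes of a file, closing the descriptor on every path.
async function readHeader(filePath, bytes = 16) {
  const handle = await fs.open(filePath, "r");
  try {
    const buf = Buffer.alloc(bytes);
    const { bytesRead } = await handle.read(buf, 0, bytes, 0);
    return buf.subarray(0, bytesRead);
  } finally {
    await handle.close(); // runs on success AND on throw, so no descriptor leaks
  }
}

// Same shape for pooled connections (`pool` is a hypothetical interface).
async function withResource(pool, work) {
  const resource = await pool.acquire();
  try {
    return await work(resource); // `await` so release happens only after work settles
  } finally {
    pool.release(resource); // always hand it back, even if work() threw
  }
}
```

The `await` inside `try` is the subtle part: returning the bare promise would run `finally` immediately and hand the connection back while the query is still in flight.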

Interview answer: "How would you debug a Node service that is slowly using more memory until it crashes?" First confirm it is actually a heap leak by watching process.memoryUsage().heapUsed over time under steady load; a steady climb that never recedes confirms it. Then take a heap snapshot after warm-up, exercise the app, take a second snapshot, and compare them sorted by growth to identify which objects are accumulating. The cause is almost always a retained reference: an unbounded cache, listeners or timers never cleaned up, or a closure pinning a large object. Fix the reference rather than raising --max-old-space-size, because a bigger heap only delays the same crash. If it is a container, also verify the heap limit sits safely under the container memory limit to avoid an OOM kill.

aws ecs containers node.js ec2

Naga Sai Rao

Some things fade fast. Some last. Learn the ones that last.



Where Your Data Lives: Stack and Heap

The practical payoff: pass-by-value vs pass-by-reference

Why the stack is small and the heap is large

How Garbage Collection Actually Works

New Space (the young generation)

Old Space (the old generation)

What "reachable" means

Why GC matters for performance: stop-the-world pauses

The Memory Limit: max-old-space-size and the Heap Ceiling

The container trap

Reading process.memoryUsage()

Node Memory on a Real Server: EC2, the OS, and Cluster Workers

Why heapUsed can look fine while the box dies

How a raw EC2 instance differs from a container

The cluster-worker multiplier

A concrete walkthrough

Observing it on the box

The practical rules for sizing Node on a VM

The Classic Memory Leaks

1. Module-level collections that only grow

2. Listeners and subscriptions you never remove

3. Timers that are never cleared

4. Closures that capture more than you think

The "let it be collected" tools: WeakMap, WeakRef, AbortController

Finding a Leak: Heap Snapshots

A leak hunt, start to finish

Event Loop Lag: The Other Way Node "Hangs"

Other Production Failures Worth Knowing

Monitoring in Production: CloudWatch and Beyond

The "no memory metric on a bare EC2 instance" gotcha

On ECS, memory utilization is built in (the EC2 gotcha does not apply)

Pushing Node's own numbers as custom metrics

Alarms and the OOM signature in logs

Putting It Together: A Diagnostic Playbook

Related

NodeJS Fundamentals

React Rendering Strategies, Explained: CSR, SSR, SSG, ISR, and PPR

React Fundamentals
