WebSockets are inherently stateful. Unlike RESTful APIs, where every request is independent, a WebSocket maintains a persistent TCP connection between a specific client and a specific server instance. This stateful nature creates a massive bottleneck when you need to grow. If User A is connected to Server 1 and User B is connected to Server 2, Server 1 has no native way to tell Server 2 to push a message to User B. Without a synchronization layer, your real-time application breaks the moment you add a second server node.
To solve this, you must decouple the connection state from the message broadcasting logic. By using Redis Pub/Sub as a high-speed message bus, you ensure that a message sent to any node is broadcast across the entire cluster. This approach lets you scale your WebSocket fleet horizontally, handling millions of concurrent connections while maintaining a seamless user experience. In this guide, we will implement this architecture using Node.js and Redis 7.x.
TL;DR — To scale WebSockets, use a Redis Pub/Sub adapter to sync messages across server nodes. This turns your independent servers into a unified cluster where any node can reach any client.
The Core Concept: Breaking the Stateful Silo
💡 Analogy: Imagine a hotel where guests are in different rooms. If a guest in Room 101 wants to talk to a guest in Room 505, they can't just shout through the walls. Instead, they call the front desk (Redis). The front desk uses the building's intercom system (Pub/Sub) to announce the message. Even though the guests are in different rooms, the intercom ensures everyone receives the information meant for them.
In a standard single-server setup, the WebSocket server keeps a map of active socket IDs in its local RAM. When you want to emit a message to "Room A", the server iterates through its local map and pushes data down those pipes. However, RAM is not shared between server instances. When you deploy behind a Load Balancer (like Nginx or AWS ALB), your clients are scattered across different hardware. Node A knows nothing about the connections held by Node B.
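To make that "local map" concrete, here is a toy model of the state a single node holds in RAM (the names `rooms`, `join`, and `localTargets` are illustrative, not part of any library):

```javascript
// Toy model of one node's in-memory connection state: socket ids grouped
// by room. This Map lives in process RAM and is NOT shared across nodes.
const rooms = new Map(); // roomName -> Set of socket ids

function join(room, socketId) {
  if (!rooms.has(room)) rooms.set(room, new Set());
  rooms.get(room).add(socketId);
}

function localTargets(room) {
  // A node can only ever see its own connections; another node's
  // clients in the same room are invisible from here.
  return [...(rooms.get(room) ?? new Set())];
}

join("room-a", "sock-1");
join("room-a", "sock-2");
console.log(localTargets("room-a")); // [ 'sock-1', 'sock-2' ]
```

When two nodes each run this code, each sees only its own `rooms` map, which is exactly the silo the Pub/Sub layer breaks.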
Redis Pub/Sub acts as the "Intercom System." When Node A receives a message intended for a specific room or user, it doesn't just check its local memory. It publishes that message to a specific Redis channel. Every other node in your cluster is "subscribed" to that same Redis channel. When they receive the message from Redis, they check their own local memory to see if they have any clients belonging to that room. If they do, they push the message. This ensures global delivery without needing a single massive server.
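The intercom pattern can be sketched manually with the `redis` v4 client. The channel name and envelope shape below are illustrative conventions, not a standard; each node tags its messages with its own id so it can skip the copy Redis echoes back to it:

```javascript
// Minimal sketch of the manual Pub/Sub relay (assumes a Redis reachable
// at redis://localhost:6379 and the "redis" v4 npm package).
const CHANNEL = "broadcast:room-a";

// Tag each message with the publishing node's id: every subscriber,
// including the publisher itself, receives the message back from Redis.
function makeEnvelope(nodeId, payload) {
  return JSON.stringify({ nodeId, payload });
}

function shouldRelay(rawMessage, myNodeId) {
  // Relay only messages that originated on some other node.
  return JSON.parse(rawMessage).nodeId !== myNodeId;
}

async function startNode() {
  const { createClient } = require("redis"); // lazy require: only needed at runtime
  const pub = createClient({ url: "redis://localhost:6379" });
  const sub = pub.duplicate(); // SUBSCRIBE needs its own connection

  await Promise.all([pub.connect(), sub.connect()]);

  // Every node listens on the shared channel...
  await sub.subscribe(CHANNEL, (message) => {
    if (!shouldRelay(message, process.pid)) return;
    const { payload } = JSON.parse(message);
    // ...and would push `payload` to its locally connected sockets here.
    console.log(`node ${process.pid} relaying`, payload);
  });

  // ...and publishes instead of writing only to its local sockets.
  await pub.publish(CHANNEL, makeEnvelope(process.pid, { text: "hello" }));
}

// startNode(); // run on each node once a Redis instance is reachable
```

This is essentially what the Socket.IO adapter covered later automates for you, including room filtering and binary payloads.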
When to Scale Your WebSocket Architecture
Horizontal scaling is not free; it introduces network latency and infrastructure costs. You should consider this architecture when you hit the "Single Node Ceiling." In Node.js, a single process usually struggles to maintain more than 30,000 to 50,000 active WebSocket connections depending on the message frequency and payload size. While you can optimize the Linux kernel to handle more, you eventually run out of CPU cycles to handle the encryption (TLS/SSL) and message framing for that many concurrent pipes.
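The kernel tuning mentioned above usually starts with file-descriptor and socket limits. The values below are illustrative starting points, not universal recommendations:

```conf
# /etc/sysctl.conf — common knobs for many concurrent sockets
fs.file-max = 2097152                      # system-wide open file descriptors
net.core.somaxconn = 65535                 # pending-accept queue length
net.ipv4.ip_local_port_range = 1024 65535  # wider ephemeral port range
```

You also need to raise the per-process limit (`ulimit -n`), since each WebSocket consumes one file descriptor.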
Another critical trigger is high availability. If your entire real-time system lives on one server, a single crash or even a routine rolling deployment takes down real-time features for every user. By scaling horizontally across multiple availability zones, you ensure that if one node fails, only a fraction of your users are disconnected, and those users can immediately reconnect to a healthy node. Keep in mind that plain Redis Pub/Sub is fire-and-forget: messages broadcast while a client is reconnecting are lost unless you add a persistence layer (such as Redis Streams or a message history store) that lets clients catch up.
The Distributed WebSocket Architecture
The architecture relies on three primary layers: the Client Layer, the Application Layer (multiple nodes), and the Coordination Layer (Redis). Below is a visual representation of the data flow when User A (on Node 1) sends a message to User B (on Node 2).
[ Client A ] ----> [ Server Node 1 ]
                          |
                          | (Publish to Redis Channel)
                          v
                  [ Redis Pub/Sub ]
                          |
                          | (Broadcast to all Subscribed Nodes)
                          v
[ Client B ] <---- [ Server Node 2 ]
The data flow follows a predictable path. First, the Load Balancer assigns a client to a node using "Sticky Sessions" (IP hash or cookies). This is vital because the initial WebSocket handshake starts as HTTP. Second, the server node registers the connection. Third, when an event occurs, the server node formats the message and sends it to Redis. Redis then pushes it to all other subscribed nodes. Finally, each node pushes the message to its locally connected clients. During my testing with Node 20.x and Redis 7.2, this overhead added less than 2ms of latency compared to a single-node setup, a negligible cost for near-linear horizontal scalability.
Step-by-Step Implementation with Node.js
We will use the socket.io library and its official @socket.io/redis-adapter. This is the industry standard for implementing this pattern in the Node.js ecosystem because it handles the complex Pub/Sub logic for you automatically.
Step 1: Install Dependencies
Ensure you have a Redis instance running (locally or via Docker). Then, install the required packages in your Node.js project:
npm install socket.io redis @socket.io/redis-adapter
Step 2: Configure the Redis Clients
You need two separate Redis connections: one for publishing and one for subscribing. This is a requirement of the Redis protocol when using Pub/Sub mode.
const { Server } = require("socket.io");
const { createClient } = require("redis");
const { createAdapter } = require("@socket.io/redis-adapter");

async function setupWorker() {
  const pubClient = createClient({ url: "redis://localhost:6379" });
  const subClient = pubClient.duplicate();
  await Promise.all([pubClient.connect(), subClient.connect()]);

  const io = new Server(3000, {
    cors: { origin: "*" }
  });

  // Attach the Redis adapter so broadcasts are relayed through Pub/Sub
  io.adapter(createAdapter(pubClient, subClient));

  io.on("connection", (socket) => {
    console.log(`User connected on worker: ${process.pid}`);
    socket.on("chat message", (msg) => {
      // This will now be broadcast to ALL nodes via Redis
      io.emit("chat message", msg);
    });
  });
}

setupWorker();
Step 3: Load Balancer Configuration
When running multiple instances, your Nginx or HAProxy configuration must support Sticky Sessions. If a client sends a handshake to Node 1 but tries to upgrade the connection on Node 2, the process will fail with a 400 Bad Request error. Ensure your balancer uses the client's IP to keep them on the same node during the initial connection phase.
⚠️ Common Mistake: Forgetting to enable sticky sessions. This results in intermittent "Session ID not found" errors because the WebSocket handshake and the actual connection land on different servers.
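As a sketch, a minimal Nginx setup with IP-based stickiness might look like this (ports, upstream name, and paths are illustrative; `ip_hash` pins each client IP to one upstream, and the `Upgrade`/`Connection` headers let the WebSocket handshake through):

```nginx
upstream socket_nodes {
    ip_hash;                      # sticky: same client IP -> same node
    server 127.0.0.1:3000;
    server 127.0.0.1:3001;
}

server {
    listen 80;

    location /socket.io/ {
        proxy_pass http://socket_nodes;
        proxy_http_version 1.1;                  # required for WebSocket
        proxy_set_header Upgrade $http_upgrade;  # forward the upgrade request
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
    }
}
```

Note that `ip_hash` breaks down when many clients share one IP (corporate NATs); cookie-based stickiness distributes such traffic more evenly.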
Architecture Trade-offs and Limits
While Redis Pub/Sub is powerful, it introduces new failure modes and performance characteristics that you must account for in a production environment.
| Metric | Single Node | Redis Distributed | Impact |
|---|---|---|---|
| Max Connections | ~50k (Limited by RAM) | Theoretical Millions | High Scalability |
| Latency | < 1ms | 2ms - 5ms | Low Impact |
| Complexity | Low | High (Requires Redis) | Ops Overhead |
| Reliability | Single Point of Failure | High (Multi-node) | Better Uptime |
One primary concern is the Redis Throughput. In this architecture, every single message sent in your system is processed by Redis. If you have 10 nodes and you emit a message to "all users," Redis must handle the incoming publish and then 10 outgoing pushes. As you grow to hundreds of nodes, the "Fan-out" effect can saturate the Redis network bandwidth. For extreme scales (billions of messages per day), you may need to move from Redis Pub/Sub to Redis Streams or a specialized tool like NATS.
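The fan-out arithmetic above is worth making explicit: every publish is re-delivered to every subscribed node, so Redis egress grows linearly with cluster size. A back-of-the-envelope helper (all numbers are illustrative assumptions, not benchmarks):

```javascript
// Estimate Redis outbound bandwidth for a broadcast-heavy workload:
// one delivery per subscribed node per published message.
function redisEgressMBps(nodes, messagesPerSec, payloadBytes) {
  return (nodes * messagesPerSec * payloadBytes) / (1024 * 1024);
}

// e.g. 100 nodes, 5,000 msgs/sec, 1 KiB payloads:
const egress = redisEgressMBps(100, 5000, 1024);
console.log(egress.toFixed(1), "MB/s"); // 488.3 MB/s
```

Growing to 1,000 nodes multiplies that egress tenfold at the same message rate, which is why very large clusters shard channels or move off plain Pub/Sub.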
Operational Tips for Production
To ensure this architecture remains stable under load, you should implement strict monitoring and safety guards. First, monitor your Redis memory usage. While Pub/Sub doesn't store messages on disk, the buffers used to manage the subscriptions can grow if a subscriber (a server node) becomes slow. If a node can't keep up with the message volume, the Redis output buffer will grow until it hits the client-output-buffer-limit, at which point Redis will disconnect the node.
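That buffer limit is configurable in redis.conf; the stock setting for the `pubsub` client class is `32mb 8mb 60` (hard limit, soft limit, soft-limit seconds). A sketch of a raised limit for a busy cluster, with illustrative numbers:

```conf
# redis.conf — output-buffer limit applied to Pub/Sub subscriber connections
# format: client-output-buffer-limit <class> <hard> <soft> <soft-seconds>
client-output-buffer-limit pubsub 64mb 16mb 90
```

Raising the limit only buys time for transient slowdowns; a node that is persistently slower than the publish rate will still eventually be disconnected.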
Second, implement Horizontal Autoscaling based on CPU and concurrent connection counts. Most cloud providers allow you to scale based on a metric. For WebSockets, I recommend scaling when CPU usage hits 60% or when a single node hits 70% of its tested connection limit. This gives you a buffer to spin up new instances before the current ones degrade the user experience.
📌 Key Takeaways
- WebSockets are stateful; horizontal scaling requires a shared message bus.
- Redis Pub/Sub is the most common and efficient tool for syncing state across nodes.
- Sticky sessions are mandatory at the Load Balancer level for the handshake to work.
- The Redis Adapter for Socket.io automates most of the Pub/Sub logic.
- Monitor Redis bandwidth and output buffers as your cluster grows.
Frequently Asked Questions
Q. Can I use Redis Pub/Sub for persistent message storage?
A. No. Redis Pub/Sub is a "fire and forget" mechanism. If a client is offline when a message is published, they will never receive it. For persistent messaging or "catch-up" functionality, you should use Redis Streams or a dedicated database to store message history.
Q. Why do I need two Redis clients in my code?
A. Once a Redis client enters "Subscriber" mode (using the SUBSCRIBE command), it can only be used for receiving messages and managing subscriptions. It cannot be used to execute other commands like PUBLISH, GET, or SET. Therefore, a second client is required for sending data.
Q. Does this architecture work with serverless functions like AWS Lambda?
A. Generally, no. WebSockets require long-lived TCP connections, which conflict with the short-lived nature of Lambda functions. To use WebSockets with serverless, you should use a managed service like AWS API Gateway WebSocket API, which handles the persistent connections for you.