How to Gracefully Drain Kubernetes Nodes for WebSocket Traffic

Draining a Kubernetes node typically involves evicting pods so the underlying instance can be patched, upgraded, or retired. For standard REST APIs, this process is straightforward: the pod receives a SIGTERM, finishes its quick request-response cycle, and exits. However, WebSockets and long-lived connections present a unique challenge. If you run kubectl drain without a specific strategy for stateful connections, your users will experience immediate disconnections, potentially leading to data loss or broken UI states. You need a way to signal your application to stop accepting new messages while allowing existing streams to finish naturally.

To achieve a zero-downtime experience for WebSocket-heavy applications, you must align the Kubernetes pod lifecycle with your application’s connection management. This requires fine-tuning the terminationGracePeriodSeconds, implementing preStop hooks, and ensuring your Ingress controller removes the pod from the load-balancing pool before the application starts its shutdown sequence. When configured correctly, you can migrate thousands of active WebSocket users between nodes without them ever realizing a backend shift occurred.

TL;DR — Use a preStop hook to sleep (typically 20–60 seconds) so the Ingress stops routing new traffic before SIGTERM arrives. Set terminationGracePeriodSeconds higher than your longest expected connection duration, or implement application-level logic to "drain" clients by sending a close frame before the SIGTERM kills the process.

Understanding the Kubernetes Pod Termination Lifecycle

💡 Analogy: Think of a Kubernetes pod as a restaurant. kubectl drain is the notice that the restaurant is closing for the night. A standard API request is like a customer ordering a coffee; they finish quickly and leave. A WebSocket connection is like a customer staying for a five-course meal. If you turn off the lights the moment the "Closing" sign goes up, the five-course diners are left in the dark. Graceful draining means stopping new diners at the door while letting the ones inside finish their dessert.

In Kubernetes 1.30, the termination sequence starts when a pod is marked as "Terminating". At that point, the pod is removed from the EndpointSlices of its associated Service, which stops new traffic from reaching it — though this removal takes time to propagate to kube-proxy and Ingress controllers. In parallel, Kubernetes runs the preStop hook if one is defined. Once the hook finishes, the container receives a SIGTERM signal. The application is expected to handle this signal by closing database connections and finishing active tasks. If the application doesn't exit within the terminationGracePeriodSeconds, Kubernetes sends a SIGKILL, which forcefully terminates the process.

The core problem with WebSockets is timing. If your Ingress controller takes 10 seconds to update its routing table, but your application receives the SIGTERM instantly, you might still have new requests hitting a dying pod. Furthermore, WebSockets don't have a "finish" state like HTTP responses. They stay open until one side closes them. Therefore, you must proactively manage how and when these connections are terminated to prevent "connection reset" errors on the client side.

When to Optimize for WebSocket Draining

Optimization is necessary whenever your application relies on stateful, long-lived bidirectional communication. Real-time chat applications (Slack clones), financial trading dashboards, and multiplayer gaming backends are the primary candidates. In these scenarios, a sudden disconnection causes a "reconnect storm" where thousands of clients simultaneously attempt to re-establish sessions, potentially overwhelming your authentication services or database. By gracefully draining, you can spread these reconnections over a longer window.

You should also focus on this during cluster maintenance windows or when using Spot instances (AWS) or Preemptible VMs (GCP). Since Spot instances can be reclaimed with only a two-minute warning, having a robust preStop and termination strategy is the difference between a minor blip and a total service outage. If your WebSocket traffic represents more than 20% of your total ingress, or if your clients maintain sessions for longer than 60 seconds, the default Kubernetes settings are likely insufficient for your needs.

Step-by-Step Implementation for Graceful Drains

Step 1: Define a Long Termination Grace Period

The default terminationGracePeriodSeconds is 30 seconds. For WebSockets, this is often too short. Increase this value in your Deployment manifest to give your connections time to migrate. We recommend a value that covers your 95th percentile session length, or at least 5 to 10 minutes for interactive apps.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: websocket-server
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 300 # 5 minutes
      containers:
      - name: app
        image: my-websocket-app:v1.2.0

Step 2: Add a preStop Hook to Handle Propagation Delay

The preStop hook is a lifecycle event that runs before the SIGTERM is sent. The most common pattern is to "sleep" for a few seconds. This delay gives the Ingress controller (such as NGINX or Envoy) enough time to see that the pod is "Terminating" and remove it from its upstream list. Without it, the pod might receive SIGTERM and stop accepting traffic while the Ingress is still sending new clients its way. Note that the exec form below requires /bin/sh in the container image; Kubernetes 1.30 also offers a built-in sleep action (lifecycle.preStop.sleep.seconds) as a beta feature.

      containers:
      - name: app
        lifecycle:
          preStop:
            exec:
              command: ["/bin/sh", "-c", "sleep 20"]

Step 3: Handle SIGTERM in Your Application Code

When the preStop hook finishes, your app receives SIGTERM. Your code should catch this signal and start its own internal "drain" process. Instead of hard-closing all sockets, start sending "Going Away" frames (WebSocket close code 1001) to clients at a staggered rate. This prevents the aforementioned reconnect storm. In Node.js, it looks like this:

process.on('SIGTERM', () => {
  console.log('Received SIGTERM. Draining WebSockets...');

  // 1. Stop accepting new connections (existing sockets stay open)
  server.close();

  // 2. Close existing connections at a staggered rate to avoid a reconnect storm
  let delay = 0;
  for (const socket of activeSockets) {
    setTimeout(() => {
      socket.close(1001, 'Server is undergoing maintenance'); // 1001 = Going Away
    }, delay);
    delay += 50; // spread closes ~50 ms apart; tune to your connection count
  }
});

Common Pitfalls and How to Avoid Them

⚠️ Common Mistake: Setting a preStop hook sleep duration longer than the terminationGracePeriodSeconds.

If your preStop hook takes 60 seconds but your grace period is only 30 seconds, Kubernetes will kill the pod before the hook even finishes. Always ensure terminationGracePeriodSeconds is at least 30% larger than your preStop delay plus your application's internal cleanup time. Failure to align these numbers results in "unclean" shutdowns where the kernel simply kills the process, leaving orphaned resources or corrupted state in external caches like Redis.
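The alignment rule above can be encoded as a quick sanity check. This is a minimal sketch; the function name is illustrative and the 30% margin is the rule of thumb stated above, not a Kubernetes requirement:

```javascript
// Sketch: verify the grace period leaves headroom for the preStop sleep
// plus the app's own cleanup time, per the ~30% rule of thumb.
function gracePeriodIsSafe(gracePeriodSeconds, preStopSeconds, cleanupSeconds) {
  const required = (preStopSeconds + cleanupSeconds) * 1.3;
  return gracePeriodSeconds >= required;
}

console.log(gracePeriodIsSafe(300, 20, 60)); // 300s grace vs 104s required → true
console.log(gracePeriodIsSafe(30, 60, 0));   // 30s grace vs 78s required → false
```

Running this against your manifests during CI is a cheap way to catch the misconfiguration before it reaches production.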

Another pitfall is ignoring the Ingress controller's configuration. Many NGINX Ingress setups ship with a default worker-shutdown-timeout that is shorter than a long drain window. If your pod is configured to drain over 5 minutes but the Ingress controller kills its proxied connections after 60 seconds, your graceful pod configuration won't matter. Check your Ingress global config for proxy read/write timeouts and shutdown grace periods and match them to your application's requirements. We observed in production that mismatched timeouts caused a 5% error spike during rolling updates.
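For the community ingress-nginx controller, these settings live in its global ConfigMap. The keys below are real ingress-nginx options, but the values (and the ConfigMap name/namespace, which depend on your install) are illustrative assumptions you should align with your own drain window:

```yaml
# ingress-nginx global ConfigMap (name/namespace depend on your installation).
apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
data:
  worker-shutdown-timeout: "300s"  # let in-flight WebSockets finish during NGINX reloads
  proxy-read-timeout: "3600"       # seconds NGINX waits for data before closing the upstream
  proxy-send-timeout: "3600"
```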

Metric-Backed Tips for High-Scale Clusters

To verify your drain strategy is working, monitor the kube_pod_status_phase metric alongside your active WebSocket connection count. During a node drain, you should see the connection count for terminating pods decrease linearly rather than dropping to zero instantly. If the line is vertical, your grace period is too short or your app is ignoring signals. Use Prometheus and Grafana to visualize the overlap between "Terminating" pods and "Running" pods; there should be a healthy crossover period where both exist to handle the transition.
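If you want to be alerted on the "vertical line" failure mode automatically, a Prometheus rule along these lines can flag it. This is a hypothetical sketch: websocket_active_connections is an assumed application-exported gauge, not a standard metric, and the 50%-in-one-minute threshold is illustrative:

```yaml
# Hypothetical PrometheusRule fragment: fire when a pod's WebSocket count
# drops by more than half within one minute (a near-instant drain).
groups:
- name: websocket-drain
  rules:
  - alert: WebSocketHardDrop
    expr: |
      (websocket_active_connections offset 1m) - websocket_active_connections
        > 0.5 * (websocket_active_connections offset 1m)
    labels:
      severity: warning
    annotations:
      summary: "WebSocket connections dropped sharply on {{ $labels.pod }}"
```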

Consider implementing a "Jittered Reconnect" on the client side. Even with a perfect backend drain, if 10,000 clients receive a "close" frame at the same second, they will all hit your Load Balancer simultaneously. Add a random delay (e.g., Math.random() * 5000) to your client-side reconnection logic. This flattens the CPU spike on your ingress controllers and database, ensuring the newly created pods on the fresh nodes can handle the incoming load without triggering an HPA (Horizontal Pod Autoscaler) panic.
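A common refinement of the random delay above is "full jitter" over an exponential backoff, so repeated failures spread out further each attempt. A minimal sketch (the function name and the base/cap values are illustrative, not from any particular library):

```javascript
// Sketch: jittered exponential backoff for client-side reconnects.
// Returns a random delay in [0, min(cap, base * 2^attempt)] milliseconds.
function reconnectDelay(attempt, baseMs = 1000, capMs = 30000) {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt); // exponential, capped
  return Math.random() * ceiling;                         // "full jitter"
}

// Usage sketch: schedule the next attempt after a close event.
// ws.onclose = () => setTimeout(connect, reconnectDelay(attempt++));
```

Full jitter keeps the average reconnect rate roughly constant while eliminating the synchronized thundering herd.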

📌 Key Takeaways
  • Always use a preStop hook with sleep to allow for Endpoint propagation.
  • Set terminationGracePeriodSeconds significantly higher for WebSocket workloads.
  • Application code must catch SIGTERM and stagger client disconnections.
  • Synchronize timeouts between Kubernetes, your Ingress Controller, and the Application.

Frequently Asked Questions

Q. Does kubectl drain stop all traffic immediately?

A. Not exactly. It marks pods for eviction, which triggers the removal of the pod from the Service Endpoints. However, existing TCP connections (like WebSockets) remain established until the pod process exits or the load balancer kills the connection. This is why explicit termination handling is required.

Q. What is the difference between preStop hooks and SIGTERM?

A. A preStop hook runs before the SIGTERM signal is sent to the container. It is useful for blocking the shutdown process to allow external systems (like Load Balancers) to update. Once the hook finishes, the container receives the SIGTERM to begin its internal cleanup.

Q. How can I test my graceful shutdown without a full node drain?

A. You can simulate the process by running kubectl delete pod [pod-name]. This triggers the same termination lifecycle as a node drain. Monitor your application logs and client-side network tab to ensure the connection closes with the expected 1001 status code and follows your defined timing.
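On the client side, you can distinguish a graceful drain from an abrupt drop by inspecting the close code. A small sketch (the WebSocket URL is hypothetical; codes 1000, 1001, and 1006 are defined by the WebSocket protocol):

```javascript
// Codes 1000 (Normal Closure) and 1001 (Going Away) indicate a clean shutdown;
// 1006 (Abnormal Closure) means the connection dropped without a close frame.
function isGracefulClose(code) {
  return code === 1000 || code === 1001;
}

// Browser usage sketch (URL is hypothetical):
// const ws = new WebSocket('wss://example.com/socket');
// ws.onclose = (event) => {
//   console.log(event.code, isGracefulClose(event.code) ? 'graceful' : 'abrupt');
// };
```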
