ECS Fargate Blue-Green Deployments with ALB and CodeDeploy

Standard rolling updates in AWS ECS often suffer from a "gray period" where new and old versions coexist, potentially leading to inconsistent user experiences or dropped connections during target group deregistration. While Fargate simplifies infrastructure management, achieving true zero-downtime releases requires a more sophisticated orchestration layer. Using AWS CodeDeploy to manage Blue-Green deployments allows you to verify new code in a live environment before shifting 100% of production traffic.

This guide demonstrates how to configure an Application Load Balancer (ALB) and ECS Fargate to perform instant traffic cutovers. By the end of this tutorial, you will have a pipeline that supports automated rollbacks if health checks fail during the deployment window.

TL;DR — Deploy two identical Target Groups (Blue and Green). Use CodeDeploy to swap the ALB production listener from Blue to Green while keeping the old environment alive for 5–15 minutes to ensure a safe rollback path.

Understanding the Blue-Green Mechanism

💡 Analogy: Think of a Blue-Green deployment like a high-speed railway switch. Instead of asking passengers to jump from an old train to a new one while moving (Rolling Update), you build a second, identical track (Green) and simply flip a switch to redirect the train to the new line. If the new track is bumpy, you flip the switch back immediately.

In the context of AWS ECS, the "Blue" environment is your current production fleet. The "Green" environment is a fresh set of Fargate tasks running the new container image. CodeDeploy acts as the traffic controller. It doesn't just replace tasks; it modifies the ALB listener rules to point to the Green Target Group. This happens nearly instantaneously, minimizing the duration of mixed-version traffic.

This approach relies on two separate Target Groups and, optionally, two listeners. The production listener (usually on port 443 or 80) carries live traffic, while a test listener (e.g., port 8080) routes to the Green Target Group. This allows your QA team or automated scripts to run smoke tests on the Green environment before the final cutover occurs. If the tests pass, CodeDeploy swaps the production listener to point to the Green targets.
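To make the listener layout concrete, here is a minimal Python sketch that builds the request parameters for a production and a test listener on the same ALB. The helper name, the ARNs, and the default ports are assumptions for illustration; the parameter names follow the boto3 elbv2 create_listener call.

```python
def build_listener_requests(load_balancer_arn, blue_tg_arn, green_tg_arn,
                            prod_port=80, test_port=8080):
    """Return kwargs for two elbv2.create_listener calls (hypothetical helper)."""

    def listener(port, target_group_arn):
        return {
            "LoadBalancerArn": load_balancer_arn,
            "Protocol": "HTTP",
            "Port": port,
            "DefaultActions": [
                {"Type": "forward", "TargetGroupArn": target_group_arn}
            ],
        }

    return {
        # Live traffic starts on Blue.
        "production": listener(prod_port, blue_tg_arn),
        # Smoke tests reach Green through the test listener.
        "test": listener(test_port, green_tg_arn),
    }


def create_listeners(requests):
    """Send both requests; needs boto3 and AWS credentials at call time."""
    import boto3  # imported lazily so the builder above has no dependencies

    elbv2 = boto3.client("elbv2")
    return {name: elbv2.create_listener(**kwargs)
            for name, kwargs in requests.items()}
```

Keeping the request builder separate from the API call makes it easy to inspect or unit-test the listener layout before anything is created.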

When to Choose Blue-Green Over Rolling Updates

Rolling updates are the default in ECS because they require less infrastructure overhead. However, they are not always sufficient for mission-critical applications. You should move to Blue-Green deployments if your application requires strict versioning consistency. For example, if a database schema change is only compatible with the new version of the app, running both versions simultaneously during a rolling update could cause errors.

Another major driver is rollback speed. With a rolling update, rolling back requires a full redeployment of the previous image, which can take several minutes as tasks are provisioned and health checks pass. In a Blue-Green setup, the old tasks stay running and registered with the Blue Target Group until the termination wait time expires. Rolling back is as simple as pointing the ALB listener back at the Blue Target Group, which takes seconds.

Step-by-Step Implementation

Step 1: Configure the Application Load Balancer

You need two Target Groups (e.g., tg-blue and tg-green) in the same VPC as your ECS tasks. Both must use target type ip, because Fargate tasks use awsvpc networking and are registered by IP address. Set the deregistration delay to 30–60 seconds (the default is 300) to avoid long wait times during deployment testing.

# Example AWS CLI command to create the second target group
aws elbv2 create-target-group \
    --name ecs-fargate-green-tg \
    --protocol HTTP \
    --port 80 \
    --vpc-id vpc-xxxxxx \
    --target-type ip \
    --health-check-path /health

Step 2: ECS Service Configuration

When creating your ECS Service, you must select the deployment controller type as CODE_DEPLOY. Note that you cannot change this after the service is created. If you have an existing service using ECS (Rolling Update), you must recreate it. Link the service to your primary production listener and target group.
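As a rough sketch of what this looks like through the API, the Python helper below assembles the parameters for the boto3 ecs create_service call with the CODE_DEPLOY deployment controller. The function name, argument names, and default task count are assumptions for illustration, not values from this guide.

```python
def build_service_request(cluster, service_name, task_definition,
                          blue_tg_arn, container_name, container_port,
                          subnets, security_groups, desired_count=2):
    """Return kwargs for ecs.create_service (hypothetical helper)."""
    return {
        "cluster": cluster,
        "serviceName": service_name,
        "taskDefinition": task_definition,
        "desiredCount": desired_count,
        "launchType": "FARGATE",
        # Hands deployments over to CodeDeploy instead of rolling updates.
        "deploymentController": {"type": "CODE_DEPLOY"},
        # The service starts out attached to the Blue (production) target group.
        "loadBalancers": [{
            "targetGroupArn": blue_tg_arn,
            "containerName": container_name,
            "containerPort": container_port,
        }],
        # Fargate tasks always use awsvpc networking.
        "networkConfiguration": {
            "awsvpcConfiguration": {
                "subnets": list(subnets),
                "securityGroups": list(security_groups),
                "assignPublicIp": "DISABLED",
            }
        },
    }
```

With credentials configured, the result could be passed as boto3.client("ecs").create_service(**request). The assignPublicIp value assumes tasks run in private subnets.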

Step 3: Define the AppSpec File

CodeDeploy requires an appspec.yaml file that names the new task definition and the container and port the load balancer should route to. This file should be included in your deployment artifact or defined in your CI/CD pipeline. Use the following structure for ECS Fargate:

version: 0.0
Resources:
  - TargetService:
      Type: AWS::ECS::Service
      Properties:
        # The suffix after the task family is the numeric revision
        TaskDefinition: "arn:aws:ecs:region:account:task-definition/my-task:2"
        LoadBalancerInfo:
          ContainerName: "web-app"
          ContainerPort: 80
# Hooks is a list with one lifecycle event per entry
Hooks:
  - BeforeAllowTraffic: "LambdaFunctionToValidateNewVersion"
  - AfterAllowTraffic: "LambdaFunctionToCleanUpResources"
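If you would rather generate the AppSpec in a pipeline script than keep a static file, the following Python sketch builds the same structure as JSON and wraps it in the parameters for the boto3 codedeploy create_deployment call. The helper names are hypothetical. Hooks is emitted as a list of single-key mappings, which is the shape the ECS AppSpec reference describes.

```python
import json


def build_appspec(task_definition_arn, container_name, container_port, hooks=None):
    """Return an ECS AppSpec as a dict (hypothetical helper)."""
    spec = {
        "version": 0.0,
        "Resources": [{
            "TargetService": {
                "Type": "AWS::ECS::Service",
                "Properties": {
                    "TaskDefinition": task_definition_arn,
                    "LoadBalancerInfo": {
                        "ContainerName": container_name,
                        "ContainerPort": container_port,
                    },
                },
            }
        }],
    }
    if hooks:
        # One entry per lifecycle event, e.g. {"BeforeAllowTraffic": "fn-name"}.
        spec["Hooks"] = [{event: fn} for event, fn in hooks.items()]
    return spec


def build_deployment_request(application, deployment_group, appspec):
    """Return kwargs for codedeploy.create_deployment with an inline AppSpec."""
    return {
        "applicationName": application,
        "deploymentGroupName": deployment_group,
        "revision": {
            "revisionType": "AppSpecContent",
            "appSpecContent": {"content": json.dumps(appspec)},
        },
    }
```

Passing the AppSpec inline avoids uploading a bundle to S3, at the cost of keeping the file out of the build artifact.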

Common Pitfalls and Solutions

⚠️ Common Mistake: Mismatching the Container Name in AppSpec. CodeDeploy can fail with an unhelpful error such as "Internal Error" if the ContainerName in your appspec.yaml does not exactly match (including case) a container name defined in your Task Definition.
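A small pre-flight check can catch this before a deployment starts. The Python sketch below compares an AppSpec dict against a task definition dict shaped like the "taskDefinition" value returned by the boto3 ecs describe_task_definition call. The function name and the error wording are made up for this example.

```python
def find_container_mismatch(appspec, task_definition):
    """Return an error message if the AppSpec names a container or port
    that the task definition does not define, otherwise None."""
    # Map each container name to the list of ports it exposes.
    containers = {
        c["name"]: [pm.get("containerPort") for pm in c.get("portMappings", [])]
        for c in task_definition.get("containerDefinitions", [])
    }
    for resource in appspec.get("Resources", []):
        info = resource["TargetService"]["Properties"]["LoadBalancerInfo"]
        name, port = info["ContainerName"], info["ContainerPort"]
        if name not in containers:  # the comparison is case-sensitive
            return (f"container {name!r} not found; "
                    f"task definition has {sorted(containers)}")
        if port not in containers[name]:
            return f"port {port} is not mapped on container {name!r}"
    return None
```

Running this in the build stage and failing the build on a non-None result is usually cheaper than debugging a failed deployment.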

One frequent issue involves IAM permissions. The CodeDeploy service role requires specific permissions to interact with ECS and ELB. If you see an error such as "The service role does not have permission to access the resources", ensure your role has the AWSCodeDeployRoleForECS managed policy attached. This policy grants the necessary rights to modify ALB listeners and update ECS services.

Another challenge is Target Group stickiness. If your ALB has session stickiness enabled, the "Green" environment might not receive traffic evenly until the old sessions expire. For a clean cutover, consider lowering the cookie duration or disabling stickiness during the deployment window. Also watch your ECS Service Auto Scaling metrics during the cutover so the new tasks can absorb the sudden shift in traffic.
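One way to script the stickiness change is through target group attributes. This Python sketch assumes load-balancer-generated cookies (the lb_cookie stickiness type) and uses hypothetical helper names. The attribute keys are the ones accepted by the boto3 elbv2 modify_target_group_attributes call.

```python
def stickiness_attributes(enabled, cookie_seconds=None):
    """Build the Attributes list that turns stickiness on or off."""
    attrs = [{"Key": "stickiness.enabled",
              "Value": "true" if enabled else "false"}]
    if enabled and cookie_seconds is not None:
        # Attribute values are always passed as strings.
        attrs.append({"Key": "stickiness.lb_cookie.duration_seconds",
                      "Value": str(cookie_seconds)})
    return attrs


def apply_attributes(elbv2_client, target_group_arn, attributes):
    """Apply the attributes using a boto3 elbv2 client supplied by the caller."""
    return elbv2_client.modify_target_group_attributes(
        TargetGroupArn=target_group_arn,
        Attributes=attributes,
    )
```

For example, stickiness_attributes(True, 60) shortens the cookie to one minute before a release, and stickiness_attributes(False) disables stickiness entirely.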

Optimization Tips for Production

To maximize the effectiveness of Blue-Green deployments, implement a "Baking Time." In CodeDeploy this is the termination wait time: configure the deployment group to wait for 5 to 10 minutes after traffic has shifted to Green before terminating the Blue tasks. This period allows you to monitor CloudWatch Alarms for 5XX errors or latency spikes. If an alarm attached to the deployment group triggers and automatic rollback is enabled, CodeDeploy initiates a rollback.
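The baking time, alarms, and automatic rollback all live on the deployment group. The Python sketch below assembles parameters for the boto3 codedeploy create_deployment_group call. The helper name, argument names, and the 10-minute default are assumptions, and the all-at-once deployment configuration is chosen here only to match the instant cutover described in this guide.

```python
def build_deployment_group_request(application, group_name, service_role_arn,
                                   cluster, service, blue_tg, green_tg,
                                   prod_listener_arn, test_listener_arn=None,
                                   alarm_names=(), bake_minutes=10):
    """Return kwargs for codedeploy.create_deployment_group (hypothetical helper)."""
    pair = {
        "targetGroups": [{"name": blue_tg}, {"name": green_tg}],
        "prodTrafficRoute": {"listenerArns": [prod_listener_arn]},
    }
    if test_listener_arn:
        pair["testTrafficRoute"] = {"listenerArns": [test_listener_arn]}

    request = {
        "applicationName": application,
        "deploymentGroupName": group_name,
        "serviceRoleArn": service_role_arn,
        "deploymentConfigName": "CodeDeployDefault.ECSAllAtOnce",
        "deploymentStyle": {
            "deploymentType": "BLUE_GREEN",
            "deploymentOption": "WITH_TRAFFIC_CONTROL",
        },
        "ecsServices": [{"clusterName": cluster, "serviceName": service}],
        "loadBalancerInfo": {"targetGroupPairInfoList": [pair]},
        "blueGreenDeploymentConfiguration": {
            # Keep Blue alive for the baking period, then terminate it.
            "terminateBlueInstancesOnDeploymentSuccess": {
                "action": "TERMINATE",
                "terminationWaitTimeInMinutes": bake_minutes,
            },
            "deploymentReadyOption": {
                "actionOnTimeout": "CONTINUE_DEPLOYMENT",
                "waitTimeInMinutes": 0,
            },
        },
        "autoRollbackConfiguration": {
            "enabled": True,
            "events": ["DEPLOYMENT_FAILURE", "DEPLOYMENT_STOP_ON_ALARM"],
        },
    }
    if alarm_names:
        # CloudWatch alarms that stop the deployment when they fire.
        request["alarmConfiguration"] = {
            "enabled": True,
            "alarms": [{"name": name} for name in alarm_names],
        }
    return request
```

The DEPLOYMENT_STOP_ON_ALARM event is what ties the alarms to the rollback: without it, a firing alarm stops the deployment but does not shift traffic back.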

Use Lambda Hooks for automated testing. The AfterAllowTestTraffic hook is particularly powerful. It triggers after the Green environment is provisioned and reachable through the test listener, but before production traffic is shifted. You can run a Lambda function that executes integration tests against the test listener port. If the function reports a Failed status back to CodeDeploy, the deployment stops before any real users are impacted.
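A hook function of this kind might look like the Python sketch below. It assumes the hook event carries DeploymentId and LifecycleEventHookExecutionId, and it reports the result through the boto3 codedeploy put_lifecycle_event_hook_execution_status call. The test URL, the helper names, and the single HTTP 200 check are placeholders rather than a recommended test suite.

```python
import urllib.request

# Hypothetical address of the ALB test listener; replace with your own.
TEST_URL = "http://my-alb.example.internal:8080/health"


def run_smoke_test(url, timeout=5):
    """Return True if the URL answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, ValueError):
        # Covers connection errors, timeouts, HTTP errors and malformed URLs.
        return False


def handle_hook(event, codedeploy_client, check=run_smoke_test, url=TEST_URL):
    """Run the check and report Succeeded or Failed back to CodeDeploy."""
    status = "Succeeded" if check(url) else "Failed"
    codedeploy_client.put_lifecycle_event_hook_execution_status(
        deploymentId=event["DeploymentId"],
        lifecycleEventHookExecutionId=event["LifecycleEventHookExecutionId"],
        status=status,
    )
    return status


def lambda_handler(event, context):
    """Lambda entry point; boto3 ships with the Lambda Python runtime."""
    import boto3

    return handle_hook(event, boto3.client("codedeploy"))
```

Injecting the client and the check function keeps the handler testable with a stub, and the function's execution role needs permission to call codedeploy:PutLifecycleEventHookExecutionStatus.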

📌 Key Takeaways

  • Blue-Green deployments eliminate version skew during releases.
  • Fargate requires Target Groups with target-type: ip.
  • CodeDeploy automates the ALB listener swap for zero-downtime cutover.
  • Always include a "Wait Time" to allow for manual or automated verification before Blue termination.

Frequently Asked Questions

Q. Does Blue-Green deployment double my AWS costs?

A. Only temporarily. During the deployment window, you will have two sets of Fargate tasks running simultaneously. However, because Fargate is billed per second and Blue tasks are terminated once the configured termination wait time expires (typically 5–15 minutes after a successful cutover), the cost impact is minimal for most organizations.

Q. How long does a rollback take if the Green environment fails?

A. Rollbacks are nearly instantaneous (usually under 30 seconds). Since the Blue tasks are still running and healthy, CodeDeploy simply instructs the ALB to point its listener back to the Blue Target Group. This is significantly faster than redeploying an old image in a rolling update.
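While the Blue tasks are still running, a rollback can also be triggered by hand. A minimal Python sketch, assuming a boto3 codedeploy client and a known deployment ID (the helper name is hypothetical):

```python
def roll_back(codedeploy_client, deployment_id):
    """Stop an in-progress deployment and ask CodeDeploy to roll it back."""
    return codedeploy_client.stop_deployment(
        deploymentId=deployment_id,
        autoRollbackEnabled=True,
    )
```

This only helps during the deployment window. Once the Blue tasks have been terminated, going back means deploying the previous task definition again.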

Q. Where should the appspec.yaml file be stored?

A. It is typically stored in the root of your source repository. When using AWS CodePipeline, the file is bundled into the output artifact from the Build stage and passed to the Deploy stage, where CodeDeploy reads it to execute the deployment logic.
