🔵 Blue-Green Deployment

Zero downtime updates

The Stage Swap Analogy

A theater production:

  • Blue stage: Current show running
  • Green stage: Next show being prepared

When ready:

  • Open curtains to green stage
  • Audience sees new show instantly
  • If problems? Switch back to blue!

Blue-green deployment works the same way. Two identical environments, instant switch.


How Blue-Green Works

The Setup

Two identical environments:

┌─────────────────┐     ┌─────────────────┐
│      BLUE       │     │     GREEN       │
│  (current)      │     │    (new)        │
│                 │     │                 │
│  ┌───┐ ┌───┐    │     │  ┌───┐ ┌───┐    │
│  │App│ │App│    │     │  │App│ │App│    │
│  └───┘ └───┘    │     │  └───┘ └───┘    │
│  ┌───┐ ┌─────┐  │     │  ┌───┐ ┌─────┐  │
│  │DB │ │Cache│  │     │  │DB │ │Cache│  │
│  └───┘ └─────┘  │     │  └───┘ └─────┘  │
└─────────────────┘     └─────────────────┘
        ↑
  Traffic (live)          (Ready, but idle)

The Switch

Before: All traffic → Blue (current)
  Green (new) is ready, tested

Switch: Change load balancer
  All traffic → Green (new)
  Blue (current) is now idle

Time to switch: Seconds!

The Complete Flow

1. BLUE serves all traffic (current version)
   Users: → Blue

2. Deploy new version to GREEN
   Users: → Blue
   GREEN gets the new version, tested internally

3. Test GREEN thoroughly
   Smoke tests, integration tests
   Make sure it works!

4. SWITCH traffic to GREEN
   Users: → Green
   Done in seconds

5. Monitor GREEN
   Watch metrics, errors, performance

6. Problem? SWITCH back to BLUE
   Instant rollback!
   Users: → Blue (previous version, still running)

7. All good? BLUE becomes next staging
   Deploy the next version to Blue when ready...
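
The seven steps above can be sketched in a few lines of Python. This is a minimal illustration, not a real deployment tool: the Router class stands in for a load balancer, and the release function, the versions dictionary, and the smoke_test and healthy_after_switch callbacks are all hypothetical names invented for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Router:
    """Stand-in for a load balancer: 'live' names the environment receiving traffic."""
    live: str = "blue"
    history: List[str] = field(default_factory=list)

    def switch_to(self, env: str) -> None:
        self.history.append(self.live)
        self.live = env


def release(
    router: Router,
    versions: Dict[str, str],
    new_version: str,
    smoke_test: Callable[[str], bool],
    healthy_after_switch: Callable[[str], bool],
) -> str:
    """Run one blue-green release and return the environment left serving traffic."""
    previous = router.live
    idle = "green" if previous == "blue" else "blue"

    # Step 2: deploy to the idle environment; users still hit the live one.
    versions[idle] = new_version

    # Step 3: test the idle environment before it receives any traffic.
    if not smoke_test(idle):
        return router.live  # never switched, so users are unaffected

    # Step 4: switch all traffic.
    router.switch_to(idle)

    # Steps 5-6: monitor, and switch back if the new environment misbehaves.
    if not healthy_after_switch(idle):
        router.switch_to(previous)

    # Step 7: whichever environment is idle now is the next staging target.
    return router.live
```

Note that rollback is the same operation as the release itself: one more call to switch_to, pointing at the environment that was live before.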

Why Blue-Green?

Instant Rollback

Rolling deployment rollback:
  Gradually roll back each instance
  Takes time

Blue-green rollback:
  Switch traffic back to blue
  Done in seconds!

Zero Downtime

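No long overlap period of mixed versions
Switch is effectively atomic: all new traffic moves at once
In-flight requests on blue should be allowed to finish (connection draining)
Users don't experience a half-updated state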

Full Testing Before Release

Green is 100% production-identical
Test with real configuration
Smoke test with real traffic (optionally)
Confidence before the switch

Blue-Green vs Other Strategies

Strategy     Switch Time   Rollback   Resources     Testing
Blue-Green   Instant       Instant    2x            Pre-production
Rolling      Gradual       Gradual    1x + surge    During rollout
Canary       Gradual       Fast       1x + canary   Percentage
Recreate     Minutes       Slow       1x            Pre-production

When to Use Blue-Green

✓ Need instant rollback capability
✓ Can afford double infrastructure
✓ Prefer atomic switches
✓ Testing in production-identical environment

When NOT to Use

✗ Resource-constrained (2x infrastructure is expensive!)
✗ Breaking database schema changes while both environments share a database
✗ Want gradual rollout with metrics

Database Considerations

The Challenge

Blue uses database schema v1
Green uses database schema v2

Both pointing to same database?
  Schema change breaks Blue!

Solutions

Option 1: Backward-compatible migrations (when possible)
  Add columns, don't remove
  Old code ignores new columns
  New code handles missing columns

Option 2: Separate databases
  Copy data to green's database
  More complex, more storage

Option 3: Feature flags
  New code handles old schema
  Switch flag after migration complete
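
Option 1 can be illustrated with a small sketch using Python's built-in sqlite3 module and an in-memory database. The users table, the email column, and the function names are assumptions made up for this example. The point is only that an additive migration leaves the old (blue) queries working while the new (green) code tolerates rows that predate the new column.

```python
import sqlite3


def make_v1_db() -> sqlite3.Connection:
    """Schema v1, as used by blue (hypothetical table for illustration)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    db.execute("INSERT INTO users (name) VALUES ('ada')")
    return db


def migrate_additive(db: sqlite3.Connection) -> None:
    """Schema v2: only ADD a nullable column; nothing is removed or renamed."""
    db.execute("ALTER TABLE users ADD COLUMN email TEXT")


def blue_read_names(db: sqlite3.Connection) -> list:
    # Old code names its columns explicitly, so it ignores the new one.
    return [row[0] for row in db.execute("SELECT name FROM users ORDER BY id")]


def blue_add_user(db: sqlite3.Connection, name: str) -> None:
    # Old code does not know about email; the column is nullable, so this works.
    db.execute("INSERT INTO users (name) VALUES (?)", (name,))


def green_read_users(db: sqlite3.Connection) -> list:
    # New code treats a missing (NULL) email as "unknown".
    rows = db.execute("SELECT name, email FROM users ORDER BY id")
    return [(name, email or "unknown") for name, email in rows]
```

After migrate_additive runs, both versions can read and write the same database, so switching back to blue remains possible.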

Implementation

Load Balancer Switch

Load balancer points to environment:

Before:
  LB → Blue pool (instances A, B, ...)

After switch:
  LB → Green pool (instances C, D, ...)

All traffic moves instantly.

DNS Switch

DNS points to environment:

Before:
  app.example.com → Blue IP

After switch:
  app.example.com → Green IP

⚠️ DNS propagation can take time! Resolvers and clients cache the old record until its TTL expires.
A load balancer switch is faster.
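
To see why a DNS switch lags, here is a toy model of a caching resolver. It is a simplified sketch, not how any particular resolver is implemented; the CachingResolver class, the records dictionary, and the IP addresses are hypothetical. Time is passed in explicitly so the behaviour is easy to follow.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class CachingResolver:
    """Toy resolver that caches each answer for ttl_seconds."""
    ttl_seconds: float
    cache: Dict[str, Tuple[str, float]] = field(default_factory=dict)

    def resolve(self, records: Dict[str, str], name: str, now: float) -> str:
        cached = self.cache.get(name)
        if cached is not None and now < cached[1]:
            # Still within the TTL: this may be the old (blue) address.
            return cached[0]
        ip = records[name]  # ask the authoritative source
        self.cache[name] = (ip, now + self.ttl_seconds)
        return ip


records = {"app.example.com": "10.0.0.1"}               # Blue IP (made up)
resolver = CachingResolver(ttl_seconds=300)
before = resolver.resolve(records, "app.example.com", now=0)

records["app.example.com"] = "10.0.0.2"                 # switch DNS to Green IP
during = resolver.resolve(records, "app.example.com", now=299)
after = resolver.resolve(records, "app.example.com", now=300)
```

In this model the client keeps reaching blue for up to one full TTL after the record changes, which is why lowering the TTL ahead of a planned switch is a common precaution.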

With Kubernetes

Two deployments:
  app-blue, app-green

Service selector switch:
  Before: selector: version=blue
  After:  selector: version=green

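New requests shift to green within moments of the selector change.
Long-lived connections to blue pods may persist until they close.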
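
The selector switch amounts to a very small patch to the Service object. The sketch below only builds that patch as JSON; it does not talk to a cluster. The selector_patch function is a hypothetical helper, and the version label key is taken from the example above.

```python
import json


def selector_patch(color: str) -> str:
    """Build a JSON patch body that points a Service's selector at one color."""
    if color not in ("blue", "green"):
        raise ValueError(f"unknown environment: {color!r}")
    # Only the 'version' key of spec.selector is set by this patch.
    return json.dumps({"spec": {"selector": {"version": color}}})


switch = selector_patch("green")    # release
rollback = selector_patch("blue")   # rollback is the same patch with the other color
```

A body like this could be applied with kubectl patch service and the -p flag, assuming a Service whose pods carry a version label as in the example; the Service name depends on your setup.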

Verification Steps

Before Switching

1. Deploy to green
2. Run smoke tests on green
3. Compare green metrics to blue
4. Manual spot check of green
5. Get approval if needed
6. Switch!
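
The checklist above can be turned into an automated gate. The following is a rough sketch under stated assumptions: the Metrics fields, the ready_to_switch function, and the threshold values are illustrative choices, not recommended numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metrics:
    error_rate: float        # fraction of failed requests, e.g. 0.002 = 0.2%
    p95_latency_ms: float


def ready_to_switch(
    blue: Metrics,
    green: Metrics,
    smoke_passed: bool,
    approved: bool = True,
    max_error_increase: float = 0.001,   # assumed tolerance
    max_latency_ratio: float = 1.2,      # assumed tolerance
) -> bool:
    """Return True only if every pre-switch check passes."""
    # Steps 2 and 5: smoke tests and (optional) human approval.
    if not smoke_passed or not approved:
        return False
    # Step 3: green must not be meaningfully worse than blue.
    if green.error_rate > blue.error_rate + max_error_increase:
        return False
    if green.p95_latency_ms > blue.p95_latency_ms * max_latency_ratio:
        return False
    return True
```

Comparing green against blue, rather than against fixed limits, keeps the gate meaningful even when the baseline itself drifts.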

After Switching

1. Monitor error rates
2. Monitor latency
3. Watch business metrics
4. Check logs for anomalies
5. Be ready to switch back!

Common Mistakes

1. Forgetting Database Compatibility

Schema migration breaks blue.
Now you CAN'T roll back!

Try to keep changes backward-compatible during the cutover.

2. Not Testing Green Thoroughly

"It worked in staging..."
Green IS production. Test it fully.

3. Deleting Blue Too Soon

"Green is working, delete blue"
Problem found later...
No rollback available!

Keep blue for at least a day.

4. Session Stickiness Issues

User session in blue.
Switch to green.
Session lost!

Use shared session store (Redis).
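
A minimal sketch of the difference, with a plain dictionary standing in for a shared store such as Redis. The Environment class and the session IDs are hypothetical; a real setup would use a network-accessible store with expiry.

```python
from typing import Dict, Optional


class Environment:
    """One deployment color, reading and writing sessions in the store it is given."""

    def __init__(self, name: str, sessions: Dict[str, str]) -> None:
        self.name = name
        self.sessions = sessions

    def login(self, session_id: str, user: str) -> None:
        self.sessions[session_id] = user

    def current_user(self, session_id: str) -> Optional[str]:
        return self.sessions.get(session_id)


# Per-environment stores: the session created on blue is unknown to green.
blue_local = Environment("blue", {})
green_local = Environment("green", {})
blue_local.login("s1", "ada")
lost = green_local.current_user("s1")

# Shared store: both environments see the same session after the switch.
shared_store: Dict[str, str] = {}
blue = Environment("blue", shared_store)
green = Environment("green", shared_store)
blue.login("s1", "ada")
kept = green.current_user("s1")
```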

Cost Considerations

Running duplicate infrastructure = 2x cost

Ways to reduce:
  - Keep idle environment smaller
  - Spin up green just for deploys
  - Use spot/preemptible instances for green

Trade-off: cost vs rollback speed
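
As a rough back-of-the-envelope model of these options, the idle environment's cost can be scaled by its relative size and by the share of time it is running. The monthly_cost function and the numbers below are assumptions for illustration only, with no real pricing behind them.

```python
def monthly_cost(
    live_cost: float,
    idle_size_ratio: float = 1.0,     # idle env size relative to live (1.0 = identical)
    idle_uptime_ratio: float = 1.0,   # share of the month the idle env is running
) -> float:
    """Total cost = live environment + scaled cost of the idle environment."""
    return live_cost * (1 + idle_size_ratio * idle_uptime_ratio)


always_on = monthly_cost(1000.0)                             # full duplicate: 2x
half_size = monthly_cost(1000.0, idle_size_ratio=0.5)        # smaller idle environment
deploy_only = monthly_cost(1000.0, idle_uptime_ratio=0.25)   # green only during deploys
```

The cheaper options are not free: a smaller or stopped idle environment has to be scaled up or started before it can take full traffic, which slows down both the switch and the rollback.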

FAQ

Q: Blue-green vs canary?

Blue-green: all or nothing, instant
Canary: gradual percentage, measured

Canary finds problems earlier (less blast radius). Blue-green has instant rollback.

Q: Which color is production?

Whichever has traffic! Colors alternate.

Q: How long to keep old environment?

At least until you are confident in the new version, typically hours to days.

Q: What about background jobs?

These need explicit handling: drain in-flight jobs before the switch, or run the jobs in both environments.
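
The "drain before switch" option might look like the sketch below. JobRunner and cut_over_jobs are hypothetical names, and the job "work" is simulated by moving items between lists; a real worker would also need timeouts and failure handling.

```python
from collections import deque
from typing import Deque, List


class JobRunner:
    """Toy background worker for one environment."""

    def __init__(self, name: str, accepting: bool = True) -> None:
        self.name = name
        self.accepting = accepting
        self.pending: Deque[str] = deque()
        self.completed: List[str] = []

    def submit(self, job: str) -> bool:
        if not self.accepting:
            return False  # caller should route the job to the other environment
        self.pending.append(job)
        return True

    def drain(self) -> None:
        """Stop taking new jobs, then finish everything already queued."""
        self.accepting = False
        while self.pending:
            self.completed.append(self.pending.popleft())


def cut_over_jobs(old: JobRunner, new: JobRunner) -> None:
    # Finish the old environment's queue first so no job is dropped or run twice.
    old.drain()
    new.accepting = True
```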


Summary

Blue-green deployment maintains two identical environments, switching all traffic instantly for zero-downtime releases.

Key Takeaways:

  • Two environments: one live, one prepared
  • Switch traffic with load balancer or DNS
  • Instant rollback if problems
  • Requires 2x infrastructure
  • Database changes need careful planning
  • Test green thoroughly before switch
  • Keep blue available for rollback

Blue-green can give you very fast rollback.
