The Stage Swap Analogy
Picture a theater with two stages:
- Blue stage: Current show running
- Green stage: Next show being prepared
When ready:
- Open curtains to green stage
- Audience sees new show instantly
- If problems? Switch back to blue!
Blue-green deployment works the same way. Two identical environments, instant switch.
How Blue-Green Works
The Setup
Two identical environments:
┌─────────────────┐     ┌─────────────────┐
│      BLUE       │     │      GREEN      │
│    (current)    │     │      (new)      │
│                 │     │                 │
│  ┌───┐  ┌───┐   │     │  ┌───┐  ┌───┐   │
│  │App│  │App│   │     │  │App│  │App│   │
│  └───┘  └───┘   │     │  └───┘  └───┘   │
│  ┌──┐  ┌─────┐  │     │  ┌──┐  ┌─────┐  │
│  │DB│  │Cache│  │     │  │DB│  │Cache│  │
│  └──┘  └─────┘  │     │  └──┘  └─────┘  │
└─────────────────┘     └─────────────────┘
         ↑
   Traffic (live)         (Ready, but idle)
The Switch
Before: All traffic → Blue (current)
Green (new) is ready, tested
Switch: Change load balancer
All traffic → Green (new)
Blue (current) is now idle
Time to switch: Seconds!
The Complete Flow
1. BLUE serves all traffic (current version)
Users: → Blue
2. Deploy new version to GREEN
Users: → Blue
GREEN gets the new version, tested internally
3. Test GREEN thoroughly
Smoke tests, integration tests
Make sure it works!
4. SWITCH traffic to GREEN
Users: → Green
Done in seconds
5. Monitor GREEN
Watch metrics, errors, performance
6. Problem? SWITCH back to BLUE
Instant rollback!
Users: → Blue (unaffected)
7. All good? BLUE becomes next staging
Deploy the next version to Blue when ready...
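The seven steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a deployment tool: `Router`, `idle_env`, `release`, and the `deploy`, `smoke_test`, and `healthy` callbacks are hypothetical names standing in for your load balancer and your own tooling.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Router:
    """Hypothetical stand-in for a load balancer: remembers which env is live."""
    live: str = "blue"
    history: List[str] = field(default_factory=list)

    def switch_to(self, env: str) -> None:
        # Record the previous live environment, then point traffic at the new one.
        self.history.append(self.live)
        self.live = env


def idle_env(live: str) -> str:
    """Return the environment that is not currently serving traffic."""
    return "green" if live == "blue" else "blue"


def release(
    router: Router,
    deploy: Callable[[str], None],
    smoke_test: Callable[[str], bool],
    healthy: Callable[[str], bool],
) -> str:
    """Run one release and return the environment left serving traffic."""
    previous = router.live
    target = idle_env(previous)

    deploy(target)                   # step 2: new version goes to the idle env
    if not smoke_test(target):       # step 3: test before any user sees it
        return previous              # never switched, users unaffected

    router.switch_to(target)         # step 4: move all traffic
    if not healthy(target):          # step 5: monitor after the switch
        router.switch_to(previous)   # step 6: roll back
    return router.live               # step 7: the other env is the next target
```

Because `release` always targets whichever environment is idle, calling it twice in a row deploys first to green and then to blue, which matches the way the colors alternate.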
Why Blue-Green?
Instant Rollback
Rolling deployment rollback:
Gradually roll back each instance
Takes time
Blue-green rollback:
Switch traffic back to blue
Done in seconds!
Zero Downtime
No overlap period of mixed versions
Switch is atomic: all new requests move at once (in-flight requests finish on blue)
Users don't experience a half-updated state
Full Testing Before Release
Green is 100% production-identical
Test with real configuration
Smoke test with real traffic (optionally)
Confidence before the switch
Blue-Green vs Other Strategies
| Strategy | Switch Time | Rollback | Resources | Testing |
|---|---|---|---|---|
| Blue-Green | Instant | Instant | 2x | Before the switch, on the idle environment |
| Rolling | Gradual | Gradual | 1x + surge | During rollout |
| Canary | Gradual | Fast | 1x + canary | On a percentage of live traffic |
| Recreate | Minutes | Slow | 1x | Pre-production |
When to Use Blue-Green
✓ Need instant rollback capability
✓ Can afford double infrastructure
✓ Prefer atomic switches
✓ Testing in production-identical environment
When NOT to Use
✗ Resource-constrained (2x expensive!)
✗ Breaking database schema changes (when both envs share a DB)
✗ Want gradual rollout with metrics
Database Considerations
The Challenge
Blue uses database schema v1
Green uses database schema v2
Both pointing to same database?
Schema change breaks Blue!
Solutions
Option 1: Backward-compatible migrations (when possible)
Add columns, don't remove
Old code ignores new columns
New code handles missing columns
Option 2: Separate databases
Copy data to green's database
More complex, more storage
Option 3: Feature flags
New code handles old schema
Switch flag after migration complete
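Option 1 can be illustrated with Python's built-in `sqlite3` module. This is a minimal sketch, assuming a hypothetical `users` table and hypothetical `blue_*` and `green_*` functions for the old and new code. The migration only adds a nullable column, so blue keeps working against the same database after it runs.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with the v1 schema (hypothetical table)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    return conn


def migrate_to_v2(conn: sqlite3.Connection) -> None:
    """Additive change only: a new nullable column, nothing removed or renamed."""
    conn.execute("ALTER TABLE users ADD COLUMN email TEXT")


def blue_insert(conn: sqlite3.Connection, name: str) -> None:
    # Old code names its columns explicitly, so the extra column is ignored.
    conn.execute("INSERT INTO users (name) VALUES (?)", (name,))


def blue_read(conn: sqlite3.Connection) -> list:
    return conn.execute("SELECT id, name FROM users ORDER BY id").fetchall()


def green_insert(conn: sqlite3.Connection, name: str, email: str) -> None:
    conn.execute("INSERT INTO users (name, email) VALUES (?, ?)", (name, email))


def green_read(conn: sqlite3.Connection) -> list:
    # New code tolerates rows written by blue, where email is NULL.
    rows = conn.execute("SELECT id, name, email FROM users ORDER BY id").fetchall()
    return [(row_id, name, email or "") for row_id, name, email in rows]
```

The sketch relies on the old code never using `SELECT *` or positional `INSERT` statements. If it does, even an added column can break it.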
Implementation
Load Balancer Switch
Load balancer points to environment:
Before:
LB → Blue pool (instances A, B, ...)
After switch:
LB → Green pool (instances C, D, ...)
All traffic moves instantly.
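As a rough sketch of what the switch means inside the load balancer, the Python below keeps two named pools and a pointer to the active one. `LoadBalancer`, `switch`, and `route` are hypothetical names for illustration. Real load balancers expose this through their own configuration or API.

```python
import itertools
import threading
from typing import Dict, List


class LoadBalancer:
    """Toy round-robin balancer with a switchable active pool."""

    def __init__(self, pools: Dict[str, List[str]], active: str) -> None:
        self._pools = pools
        self._lock = threading.Lock()
        self._activate(active)

    def _activate(self, name: str) -> None:
        # Point at the named pool and restart round-robin from its first instance.
        self.active = name
        self._cycle = itertools.cycle(self._pools[name])

    def switch(self, name: str) -> None:
        """Send all subsequent requests to the named pool."""
        if name not in self._pools:
            raise KeyError(name)
        with self._lock:
            self._activate(name)

    def route(self) -> str:
        """Pick the instance that should handle the next request."""
        with self._lock:
            return next(self._cycle)
```

The lock makes the swap a single step from the point of view of callers: every `route()` call sees either the old pool or the new one, never a mix. Rolling back is the same `switch` call with the other pool name.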
DNS Switch
DNS points to environment:
Before:
app.example.com → Blue IP
After switch:
app.example.com → Green IP
⚠️ A DNS switch is not instant! Resolvers and clients cache the old record until its TTL expires.
A load balancer switch is faster.
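To see why a DNS switch lags, the sketch below models a caching resolver in Python. It is a simplified illustration: `CachingResolver` is a hypothetical class, the `zone` dictionary stands in for the authoritative DNS server, and time is passed in explicitly as seconds so the behavior is easy to follow.

```python
from typing import Dict, Tuple


class CachingResolver:
    """Toy resolver that reuses an answer until its TTL has elapsed."""

    def __init__(self, zone: Dict[str, str], ttl_seconds: int) -> None:
        self._zone = zone            # authoritative records, may change at any time
        self._ttl = ttl_seconds
        self._cache: Dict[str, Tuple[str, int]] = {}  # name -> (ip, expires_at)

    def resolve(self, name: str, now: int) -> str:
        cached = self._cache.get(name)
        if cached is not None and now < cached[1]:
            return cached[0]         # still fresh: the authoritative zone is not consulted
        ip = self._zone[name]        # expired or missing: fetch the current record
        self._cache[name] = (ip, now + self._ttl)
        return ip
```

With a 300-second TTL, a client that resolved the name just before the switch keeps reaching blue for up to five minutes. Lowering the TTL ahead of a planned switch shortens that window, though some clients cache for longer than the TTL says.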
With Kubernetes
Two deployments:
app-blue, app-green
Service selector switch:
Before: selector: version=blue
After: selector: version=green
New connections shift to green within seconds. Long-lived connections to blue pods can persist until they close.
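The selector switch is a small patch to the Service object. The Python below only builds that patch as JSON. It is a sketch that assumes the `version` label shown above. How you apply it (for example with `kubectl patch service`) and the Service name are up to your setup.

```python
import json


def selector_patch(color: str) -> str:
    """Build a patch body that points a Service's selector at one color."""
    if color not in ("blue", "green"):
        raise ValueError(f"unknown color: {color!r}")
    # Only the 'version' key is set; a merge patch leaves other selector keys alone.
    return json.dumps({"spec": {"selector": {"version": color}}})


# One way to apply it, assuming a Service named "app" (hypothetical name):
#   kubectl patch service app -p '<output of selector_patch("green")>'
```

Rolling back is the same patch with `"blue"`, which is why both Deployments should keep running until you trust the new version.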
Verification Steps
Before Switching
1. Deploy to green
2. Run smoke tests on green
3. Compare green metrics to blue
4. Manual spot check of green
5. Get approval if needed
6. Switch!
After Switching
1. Monitor error rates
2. Monitor latency
3. Watch business metrics
4. Check logs for anomalies
5. Be ready to switch back!
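The post-switch checks are easier to act on when the rollback rule is written down ahead of time. The Python sketch below compares green's metrics with a baseline taken from blue. The `Metrics` fields and both default thresholds are illustrative assumptions, not recommended values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metrics:
    error_rate: float       # fraction of failed requests, 0.0 to 1.0
    p95_latency_ms: float   # 95th percentile latency in milliseconds


def should_roll_back(
    baseline: Metrics,
    current: Metrics,
    max_error_increase: float = 0.01,   # assumed threshold
    max_latency_ratio: float = 1.5,     # assumed threshold
) -> bool:
    """Return True when green looks clearly worse than blue did."""
    if current.error_rate - baseline.error_rate > max_error_increase:
        return True
    if baseline.p95_latency_ms > 0:
        if current.p95_latency_ms / baseline.p95_latency_ms > max_latency_ratio:
            return True
    return False
```

Business metrics and log anomalies are harder to reduce to a single number, so a rule like this complements a human watching the dashboards and does not replace one.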
Common Mistakes
1. Forgetting Database Compatibility
Schema migration breaks blue.
Now you CAN'T roll back!
Try to keep changes backward-compatible during the cutover.
2. Not Testing Green Thoroughly
"It worked in staging..."
Green IS production. Test it fully.
3. Deleting Blue Too Soon
"Green is working, delete blue"
Problem found later...
No rollback available!
Keep blue until you trust green: hours at minimum, often a day or more.
4. Session Stickiness Issues
User session in blue.
Switch to green.
Session lost!
Use a shared session store (e.g. Redis).
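The difference between local and shared session state can be shown with a toy example. In this Python sketch a plain dictionary stands in for the shared store (Redis in a real setup), and `AppInstance` is a hypothetical class for illustration.

```python
from typing import MutableMapping, Optional


class AppInstance:
    """Toy application server that keeps sessions in whatever store it is given."""

    def __init__(self, env: str, store: MutableMapping[str, str]) -> None:
        self.env = env
        self._store = store

    def login(self, session_id: str, user: str) -> None:
        self._store[session_id] = user

    def whoami(self, session_id: str) -> Optional[str]:
        # Returns None when this instance's store has never seen the session.
        return self._store.get(session_id)


# Shared store: a session created on blue is visible on green.
shared: MutableMapping[str, str] = {}
blue = AppInstance("blue", shared)
green = AppInstance("green", shared)

# Local store: each environment only knows its own sessions.
isolated_green = AppInstance("green", {})
```

After `blue.login("s1", "ada")`, the call `green.whoami("s1")` finds the user, while `isolated_green.whoami("s1")` returns `None`, which is the lost session described above.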
Cost Considerations
Running duplicate infrastructure = 2x cost
Ways to reduce:
- Keep idle environment smaller
- Spin up green just for deploys
- Use spot/preemptible instances for green
Trade-off: cost vs rollback speed
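The effect of the first two options can be estimated with simple arithmetic. The Python function below is a deliberately simplified model: it assumes cost scales linearly with environment size and uptime, and it ignores spot discounts and shared components. `relative_cost` and its parameters are hypothetical names.

```python
def relative_cost(idle_size_ratio: float, idle_uptime_fraction: float) -> float:
    """Total cost as a multiple of running the live environment alone.

    idle_size_ratio:      size of the idle env relative to live (1.0 = identical)
    idle_uptime_fraction: share of the time the idle env is running (1.0 = always)
    """
    for value in (idle_size_ratio, idle_uptime_fraction):
        if not 0.0 <= value <= 1.0:
            raise ValueError("inputs must be between 0 and 1")
    return 1.0 + idle_size_ratio * idle_uptime_fraction
```

A full-size idle environment that is always on gives `relative_cost(1.0, 1.0)`, which is 2.0. Halving its size gives 1.5, and running a full-size copy a quarter of the time gives 1.25. The saving is paid for at rollback time: a smaller or stopped environment has to scale up before it can take all the traffic.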
FAQ
Q: Blue-green vs canary?
Blue-green: all or nothing, instant
Canary: gradual percentage, measured
Canary exposes problems to a small share of users first (smaller blast radius). Blue-green has instant rollback.
Q: Which color is production?
Whichever has traffic! Colors alternate.
Q: How long to keep old environment?
At least until you are confident in the new version. Typically hours to days.
Q: What about background jobs?
They need handling too: drain in-flight jobs before the switch, or run workers in both environments.
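The drain option can be sketched as follows. This is a minimal Python illustration with a hypothetical `JobRunner` class: it stops accepting new jobs, finishes what is already queued, and only then is it safe to switch.

```python
from collections import deque
from typing import Deque, List


class JobRunner:
    """Toy background worker for one environment."""

    def __init__(self) -> None:
        self.accepting = True
        self.pending: Deque[str] = deque()
        self.done: List[str] = []

    def submit(self, job: str) -> bool:
        """Queue a job; returns False once draining has started."""
        if not self.accepting:
            return False
        self.pending.append(job)
        return True

    def drain(self) -> int:
        """Stop accepting work, finish queued jobs, return how many were finished."""
        self.accepting = False
        finished = 0
        while self.pending:
            self.done.append(self.pending.popleft())  # stand-in for running the job
            finished += 1
        return finished
```

Jobs rejected during the drain have to go somewhere, typically to the new environment's queue. For long-running jobs draining may take too long, which is when running workers in both environments becomes the more practical choice.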
Summary
Blue-green deployment maintains two identical environments, switching all traffic instantly for zero-downtime releases.
Key Takeaways:
- Two environments: one live, one prepared
- Switch traffic with load balancer or DNS
- Instant rollback if problems
- Requires 2x infrastructure
- Database changes need careful planning
- Test green thoroughly before switch
- Keep blue available for rollback
Blue-green can give you very fast rollback.