The Train Car Analogy
Upgrading a running train:
Stop and replace: Stop the entire train, replace all cars at once. Passengers stuck waiting!
Rolling replacement: Replace one car at a time while train runs. Passengers barely notice.
Rolling deployment updates your application gradually, replacing old instances with new ones, one (or a few) at a time.
How Rolling Deployment Works
The Process
Start: several instances of v1
[v1] [v1] [v1] [...] ← Serving traffic
Step: Replace one instance
[v2] [v1] [v1] [...] ← v2 starts, old instance is removed
Step: Wait for health checks, then repeat
[v2] [v2] [v1] [...] ← Rolling...
End: all instances are v2
[v2] [v2] [v2] [...] ← Complete
If configured well, users often don’t notice the update.
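The loop above can be sketched as a tiny simulation. One caveat: a real system waits for health checks between steps, which the sketch only notes in a comment.

```python
def rolling_update(instances: list[str], new_version: str) -> list[list[str]]:
    """Replace instances one at a time, recording each intermediate state."""
    states = [instances.copy()]
    for i in range(len(instances)):
        # In production: wait here until the new instance passes health checks.
        instances[i] = new_version
        states.append(instances.copy())
    return states
```

Running it on three v1 instances reproduces the progression in the diagrams: all v1, then a shrinking mix, then all v2.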
Key Parameters
maxUnavailable: How many instances may be down at once during the update
"Typically keep most instances running"
maxSurge: How many extra instances may run temporarily during the update
"Can briefly run above the desired count"
A sensible starting point:
Keep most instances available
Replace a small number at a time
Allow a small temporary surge if you have capacity
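In Kubernetes terms, that idea maps onto the Deployment's update strategy. A minimal sketch, where the name, replica count, and image tag are illustrative:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1   # at most one instance down at a time
      maxSurge: 1         # allow one temporary extra instance
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: my-app:v2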
Why Rolling Deployments?
Reduced Downtime
Usually some instances stay running:
Users usually don't see error pages
Business continues uninterrupted
Gradual Rollout
Problems detected early:
If v2 crashes, it often affects only a small portion of capacity
Stop the rollout, fix the bug
Better than entire system crashing!
Resource Efficient
Unlike blue-green (which needs double the resources):
Rolling runs on roughly the normal footprint
Plus, at most, a small temporary surge
Rolling vs Other Strategies
| Strategy | Downtime | Resources | Rollback | Risk |
|---|---|---|---|---|
| Rolling | Low | Normal (+surge) | Slower | Medium |
| Blue-Green | Low | Double | Instant | Low |
| Canary | Low | Slight extra | Instant | Lowest |
| Recreate | Yes | Normal | Manual | Highest |
When to Use Rolling
✓ Resource-constrained environments
✓ Stateless applications
✓ Moderate risk tolerance
✓ Default for Kubernetes
When NOT to Use Rolling
✗ Need instant rollback
✗ Breaking database changes
✗ Need to test the new version with all traffic at once
✗ Incompatible versions can't coexist
Health Checks Are Critical
The Problem Without Health Checks
v2 starts but has a bug:
[v2-failing] [v1] [v1] [v1]
System thinks v2 is healthy, keeps rolling:
[v2-failing] [v2-failing] [v2-failing] [v2-failing]
With every instance failing, the service goes down.
With Health Checks
Readiness probe:
"Is this instance ready to receive traffic?"
v2 starts, probe fails:
[v2-failing] [v1] [v1] [v1]
The rollout can stop here, often contained to a single problematic instance.
Investigate, fix, retry.
Types of Health Checks
Readiness: Can this instance serve traffic?
Failed? Remove from load balancer.
Liveness: Is this instance alive?
Failed? Restart it.
Startup: Is this instance still starting?
Gives slow starters time before checks begin.
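In Kubernetes, these three checks map to `readinessProbe`, `livenessProbe`, and `startupProbe` on the container spec. A sketch, where the endpoints, port, and timings are assumptions:

```yaml
containers:
  - name: my-app
    image: my-app:v2
    readinessProbe:     # failed? removed from the load balancer
      httpGet:
        path: /ready
        port: 8080
      periodSeconds: 5
    livenessProbe:      # failed? container is restarted
      httpGet:
        path: /healthz
        port: 8080
      periodSeconds: 10
    startupProbe:       # slow starters get up to 30 x 10s before other probes begin
      httpGet:
        path: /healthz
        port: 8080
      failureThreshold: 30
      periodSeconds: 10
```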
Handling Database Changes
The Challenge
v1 expects: users table with columns A, B
v2 expects: users table with columns A, B, C
During rolling update:
Some instances are v1, some are v2
The database has to work for BOTH versions at once
The Solution: Backward Compatible Changes
Deploy sequence:
- Add column C (nullable) to database
- Deploy v2 (uses C if present)
- Wait for all instances to be v2
- Make C required
- Remove code that handles missing C
Each step must leave the version that is still running fully functional.
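The v2 read path in step 2 can be sketched as defensive code that works whether or not column C exists yet. The column name `signup_source` is a made-up stand-in for C:

```python
# v2 reads the new column defensively while v1 instances and old rows still exist.
def parse_user(row: dict) -> dict:
    return {
        "id": row["id"],        # column A: always present
        "email": row["email"],  # column B: always present
        # column C: may be missing until the migration completes
        "signup_source": row.get("signup_source", "unknown"),
    }
```

Only after every instance runs v2 does it become safe to rely on the column and drop the fallback.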
Rollback Strategies
Automatic Rollback
Monitor for:
- Error rate spike
- Response time increase
- Health check failures
If threshold exceeded:
Stop rolling
Roll back to previous version
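The automatic check can be sketched as a simple threshold test over recent error-rate samples. The window size and 5% threshold here are illustrative, not taken from any particular tool:

```python
def should_roll_back(error_rates: list[float], threshold: float = 0.05) -> bool:
    """Return True if the average of the last few error-rate samples
    exceeds the threshold, signalling the rollout should stop and revert."""
    recent = error_rates[-5:] or [0.0]   # last five samples, if any
    return sum(recent) / len(recent) > threshold
```

Real systems usually combine several such signals (latency, health-check failures) before triggering a rollback.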
Manual Rollback
kubectl rollout undo deployment/my-app
Reverses the rolling process:
[v2] [v2] [v2] [v2]
↓
[v1] [v2] [v2] [v2]
↓
[v1] [v1] [v2] [v2]
↓
[v1] [v1] [v1] [v1]
Practical Tips
Proper Health Checks
Test actual functionality:
- Database connection
- Cache connection
- Critical dependencies
Not just "process is running"
Graceful Shutdown
When terminating old instance:
- Stop accepting new requests
- Finish in-progress requests
- Close connections cleanly
- Exit
Helps reduce dropped requests.
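The sequence above can be sketched in application code: a SIGTERM handler flips a flag so no new requests are accepted, while in-flight requests finish. A minimal single-process sketch, not tied to any framework:

```python
import signal
import threading

class GracefulHandler:
    """Tracks in-flight requests and drains them on SIGTERM."""

    def __init__(self):
        self.accepting = True
        self.in_flight = 0
        self.lock = threading.Lock()
        signal.signal(signal.SIGTERM, self._on_sigterm)

    def _on_sigterm(self, signum, frame):
        self.accepting = False                   # 1. stop accepting new requests

    def handle(self, request: str) -> str:
        if not self.accepting:
            raise RuntimeError("shutting down")  # load balancer retries elsewhere
        with self.lock:
            self.in_flight += 1
        try:
            return f"handled {request}"          # 2. finish in-progress work
        finally:
            with self.lock:
                self.in_flight -= 1

    def drained(self) -> bool:
        return not self.accepting and self.in_flight == 0  # 3./4. safe to exit
```

In Kubernetes, pair this with a `terminationGracePeriodSeconds` long enough for the drain to complete.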
Monitor During Rollout
Watch metrics:
- Error rates
- Response times
- CPU/Memory usage
Be ready to pause or roll back.
Start Slow
Replace one instance, wait, observe.
If good, continue.
If bad, fix before more damage.
Common Mistakes
No Health Checks
Unhealthy instances can look healthy.
Rolling continues with a failing version.
Too Fast Rollout
All instances are replaced before problems can be detected.
That forfeits the whole benefit of a gradual rollout.
Incompatible Versions
v1 and v2 can't talk to the same database.
Split-brain during the transition.
Solution: aim for backward compatibility.
Ignoring Startup Time
New instances can take a while to start.
Health checks start too soon.
Instance killed before it's ready!
Set appropriate startup delays.
FAQ
Q: How slow should rolling be?
Depends on your monitoring. Slow enough to detect issues before all instances updated.
Q: What if old and new versions are incompatible?
Rolling deployment may not be a good fit. Use blue-green, or address compatibility first.
Q: Rolling deployment for stateful apps?
More complex. Consider StatefulSets with ordered updates.
Q: How do I test rolling deployments?
Staging environment. Practice rollback too!
Summary
Rolling deployment gradually replaces old instances with new ones to reduce downtime during updates.
Key Takeaways:
- Replace instances one at a time
- Reduced downtime when configured well
- Health checks are essential
- Ensure version compatibility
- Monitor and be ready to roll back
- Graceful shutdown prevents dropped requests
- Default strategy in Kubernetes
Rolling deployment balances safety and simplicity!