
🔄 Rolling Deployment

Gradual updates one server at a time

The Train Car Analogy

Upgrading a running train:

Stop and replace: Stop the entire train, replace all cars at once. Passengers stuck waiting!

Rolling replacement: Replace one car at a time while train runs. Passengers barely notice.

Rolling deployment updates your application gradually, replacing old instances with new ones, one (or a few) at a time.


How Rolling Deployment Works

The Process

Start: several instances of v1
  [v1] [v1] [v1] [...]   ← Serving traffic

Step: Replace one instance
  [v2] [v1] [v1] [...]   ← v2 starts, old instance is removed

Step: Wait for health checks, then repeat
  [v2] [v2] [v1] [...]   ← Rolling...

End: all instances are v2
  [v2] [v2] [v2] [...]   ← Complete

If configured well, users often don’t notice the update.

Key Parameters

maxUnavailable: The maximum number (or percentage) of instances allowed to be down during the update
  "Keep most instances serving traffic"

maxSurge: The maximum number of extra instances allowed above the desired count during the update
  "Temporarily run a few extra instances if you have capacity"

Example idea:
  Keep most instances available
  Replace a small number at a time
  Allow a small temporary surge if you have capacity
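
In Kubernetes, where rolling is the default strategy, these parameters live on the Deployment spec. A minimal sketch — the name, replica count, and image tag are illustrative, not from any real project:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                # hypothetical name
spec:
  replicas: 4
  selector:
    matchLabels:
      app: my-app
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1       # at most 1 of 4 instances down at any time
      maxSurge: 1             # up to 1 extra instance may run temporarily
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: app
          image: my-app:v2    # hypothetical image tag
```

With these settings, at least 3 instances always serve traffic, and at most 5 ever run at once.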

Why Rolling Deployments?

Reduced Downtime

Usually some instances stay running:
  Users usually don't see error pages
  Business continues uninterrupted

Gradual Rollout

Problems detected early:
  If v2 crashes, it often affects only a small portion of capacity
  Stop the rollout, fix the bug
  Better than entire system crashing!

Resource Efficient

Unlike blue-green (which needs double the resources):
  Rolling runs at roughly normal capacity
  At most a small temporary surge

Rolling vs Other Strategies

Strategy     Downtime   Resources         Rollback   Risk
Rolling      Low        Normal (+surge)   Slower     Medium
Blue-Green   Low        Double            Instant    Low
Canary       Low        Slight extra      Instant    Lowest
Recreate     Yes        Normal            Manual     Highest

When to Use Rolling

✓ Resource-constrained environments
✓ Stateless applications
✓ Moderate risk tolerance
✓ Default for Kubernetes

When NOT to Use Rolling

✗ Need instant rollback
✗ Breaking database changes
✗ Need to test the new version with all traffic at once
✗ Incompatible versions can't coexist

Health Checks Are Critical

The Problem Without Health Checks

v2 starts but has a bug:
  [v2-failing] [v1] [v1] [v1]

System thinks v2 is healthy, keeps rolling:
  [v2-failing] [v2-failing] [v2-failing] [v2-failing]

All capacity now runs the broken version: a full outage.

With Health Checks

Readiness probe:
  "Is this instance ready to receive traffic?"

v2 starts, probe fails:
  [v2-failing] [v1] [v1] [v1]

The rollout pauses: the failing instance never receives traffic,
and the damage is contained to that single instance.
Investigate, fix, retry.

Types of Health Checks

Readiness: Can this instance serve traffic?
  Failed? Remove from load balancer.

Liveness: Is this instance alive?
  Failed? Restart it.

Startup: Is this instance still starting?
  Gives slow starters time before checks begin.
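
In Kubernetes terms, the three checks map to the three probe types on a container. A sketch — the endpoints, port, and timings are hypothetical:

```yaml
containers:
  - name: app
    image: my-app:v2         # illustrative image
    startupProbe:            # gives slow starters time before other checks
      httpGet:
        path: /healthz       # hypothetical endpoint
        port: 8080
      failureThreshold: 30   # up to 30 × 10s = 5 minutes to start
      periodSeconds: 10
    readinessProbe:          # gate for receiving traffic
      httpGet:
        path: /ready         # hypothetical endpoint
        port: 8080
      periodSeconds: 5
    livenessProbe:           # restart the container if this fails
      httpGet:
        path: /healthz
        port: 8080
      periodSeconds: 10
```

The liveness and startup probes share an endpoint here; what matters is that readiness reflects "can serve traffic" while liveness reflects "is still alive".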

Handling Database Changes

The Challenge

v1 expects: users table with columns A, B
v2 expects: users table with columns A, B, C

During rolling update:
  Some instances are v1, some are v2
  Database has to work for BOTH

The Solution: Backward Compatible Changes

Deploy sequence:
 - Add column C (nullable) to database
 - Deploy v2 (uses C if present)
 - Wait for all instances to be v2
 - Make C required
 - Remove code that handles missing C

The rule: every step must keep the version still running beside it working.

Rollback Strategies

Automatic Rollback

Monitor for:
  - Error rate spike
  - Response time increase
  - Health check failures

If threshold exceeded:
  Stop rolling
  Roll back to previous version

Manual Rollback

kubectl rollout undo deployment/my-app

Reverses the rolling process:
  [v2] [v2] [v2] [v2]
  ↓
  [v1] [v2] [v2] [v2]
  ↓
  [v1] [v1] [v2] [v2]
  ↓
  [v1] [v1] [v1] [v1]

Practical Tips

Proper Health Checks

Test actual functionality:
  - Database connection
  - Cache connection
  - Critical dependencies

Not just "process is running"

Graceful Shutdown

When terminating old instance:
  - Stop accepting new requests
  - Finish in-progress requests
  - Close connections cleanly
  - Exit

Helps reduce dropped requests.
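
In Kubernetes, part of this can be expressed in the pod spec: a preStop hook delays the SIGTERM slightly so the load balancer stops sending traffic first, and terminationGracePeriodSeconds bounds how long in-flight requests may take. A sketch with illustrative values:

```yaml
spec:
  terminationGracePeriodSeconds: 60   # total time allowed for shutdown
  containers:
    - name: app
      image: my-app:v2                # illustrative image
      lifecycle:
        preStop:
          exec:
            # Brief pause so the endpoint is removed from the load
            # balancer before SIGTERM arrives. The application itself
            # must still drain in-flight requests on SIGTERM.
            command: ["sleep", "10"]
```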

Monitor During Rollout

Watch metrics:
  - Error rates
  - Response times
  - CPU/Memory usage

Be ready to pause or roll back.

Start Slow

Replace one instance, wait, observe.
If good, continue.
If bad, fix before more damage.

Common Mistakes

No Health Checks

Unhealthy instances can look healthy.
Rolling continues with a failing version.

Too Fast Rollout

All instances are replaced before problems are detected.
The gradual rollout buys you nothing.

Incompatible Versions

v1 and v2 can't talk to same database.
Split-brain during transition.

Solution: Aim for backward compatibility.

Ignoring Startup Time

New instances can take a while to start.
Health checks start too soon.
Instance killed before it's ready!

Set appropriate startup delays.
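
The budget arithmetic matters: the startup probe must allow more time than the worst-case startup. A hypothetical fragment, assuming a Kubernetes startup probe and a 90-second worst case:

```yaml
# If the app can take up to 90s to start, the startup probe budget
# must cover it: failureThreshold × periodSeconds ≥ 90.
startupProbe:
  httpGet:
    path: /healthz        # hypothetical endpoint
    port: 8080
  failureThreshold: 12    # 12 × 10s = 120s budget, > 90s worst case
  periodSeconds: 10
```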

FAQ

Q: How slow should rolling be?

Depends on your monitoring. Slow enough that problems surface before all instances are updated.

Q: What if old and new versions are incompatible?

Rolling deployment may not be a good fit. Use blue-green, or address compatibility first.

Q: Rolling deployment for stateful apps?

More complex. Consider StatefulSets with ordered updates.

Q: How do I test rolling deployments?

Staging environment. Practice rollback too!


Summary

Rolling deployment gradually replaces old instances with new ones to reduce downtime during updates.

Key Takeaways:

  • Replace instances one at a time
  • Reduced downtime when configured well
  • Health checks are essential
  • Ensure version compatibility
  • Monitor and be ready to roll back
  • Graceful shutdown prevents dropped requests
  • Default strategy in Kubernetes

Rolling deployment balances safety and simplicity!
