🕸️ Service Mesh

Traffic control for microservices

The Traffic Control Analogy

City traffic without control:

  • Cars go wherever, no rules
  • Accidents everywhere
  • No way to reroute during construction

City traffic with control:

  • Traffic lights manage flow
  • Speed limits for safety
  • Alternative routes when needed
  • Cameras monitor everything

Service mesh is traffic control for microservices. It manages how services communicate, adding security, observability, and control.


Why Service Mesh?

The Microservices Problem

Nine microservices that talk to each other:

Service A ─→ Service B ─→ Service C
    ↓           ↓           ↓
Service D ─→ Service E ─→ Service F
    ↓           ↓           ↓
Service G ─→ Service H ─→ Service I

Questions:
  - Is Service E slow? For whom?
  - Can Service A trust Service B?
  - What if Service C is overloaded?
  - How do I test a new version of Service E?

Should each service handle this logic itself? Chaos!

Service Mesh Solves This

Move cross-cutting concerns OUT of services:

✓ Encryption between services (mTLS)
✓ Traffic management (routing, retries)
✓ Observability (metrics, tracing)
✓ Policy enforcement (who can call whom)

Services can focus more on business logic.

How It Works

The Sidecar Pattern

Without mesh:
  ┌─────────┐     ┌─────────┐
  │Service A│────→│Service B│
  └─────────┘     └─────────┘

With mesh:
  ┌─────────┐     ┌─────────┐
  │Service A│     │Service B│
  │ ┌─────┐ │     │ ┌─────┐ │
  │ │Proxy│─┼────→┼─│Proxy│ │
  │ └─────┘ │     │ └─────┘ │
  └─────────┘     └─────────┘

Every service has a sidecar proxy.
All traffic goes through the proxy.

The Mesh Architecture

┌─────────────────────────────────────────┐
│           Control Plane                  │
│  (Configuration, Policies, Certs)        │
└────────────────┬────────────────────────┘
                 │ Configures
    ┌────────────┼────────────┐
    │            │            │
    ▼            ▼            ▼
┌────────┐  ┌────────┐  ┌────────┐
│Svc A   │  │Svc B   │  │Svc C   │
│┌──────┐│  │┌──────┐│  │┌──────┐│
││Proxy ││←→││Proxy ││←→││Proxy ││
│└──────┘│  │└──────┘│  │└──────┘│
└────────┘  └────────┘  └────────┘
              Data Plane

Control Plane: Brains (policies, config)
Data Plane: Muscle (proxies moving traffic)
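
To make the split concrete, here is a minimal sketch of a control plane pushing one routing table to several proxies. The `Proxy` and `ControlPlane` classes, the service names, and the address are all hypothetical and exist only for illustration; real meshes use their own configuration protocols.

```python
from dataclasses import dataclass, field


@dataclass
class Proxy:
    """Data plane: one sidecar proxy sitting next to one service."""
    service: str
    routes: dict = field(default_factory=dict)

    def forward(self, destination: str) -> str:
        # The proxy only knows what the control plane has told it.
        if destination not in self.routes:
            raise LookupError(f"{self.service}: no route to {destination}")
        return self.routes[destination]


@dataclass
class ControlPlane:
    """Control plane: holds the desired config and pushes it to proxies."""
    proxies: list = field(default_factory=list)

    def register(self, proxy: Proxy) -> None:
        self.proxies.append(proxy)

    def push_routes(self, routes: dict) -> None:
        # Every registered proxy receives its own copy of the routing table.
        for proxy in self.proxies:
            proxy.routes = dict(routes)


control_plane = ControlPlane()
proxy_a = Proxy("service-a")
proxy_b = Proxy("service-b")
control_plane.register(proxy_a)
control_plane.register(proxy_b)

# Hypothetical address, used only to show config flowing downward.
control_plane.push_routes({"service-c": "10.0.0.3:8080"})
```

The control plane never touches a request itself. It only changes what the proxies know, and the proxies do the forwarding.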

Key Features

1. Mutual TLS (mTLS)

Without mesh:
  Services trust each other blindly
  Network attacker can intercept traffic

With mesh:
  Every connection is encrypted
  Every service proves its identity

A foundation for zero-trust networking, automatically.
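
For a sense of what the proxies take off your hands, here is a rough sketch of mutual TLS set up by hand with Python's standard `ssl` module. The function names and the optional file-path parameters are assumptions for illustration; in a mesh, the control plane issues and rotates the certificates for you.

```python
import ssl


def server_context(cert_file=None, key_file=None, ca_file=None):
    """Build a server-side TLS context that also demands a client certificate."""
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # This is what makes the TLS "mutual": the server rejects clients
    # that cannot present a certificate signed by a trusted CA.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if ca_file:
        ctx.load_verify_locations(cafile=ca_file)
    if cert_file and key_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx


def client_context(cert_file=None, key_file=None, ca_file=None):
    """Build a client-side TLS context that verifies the server and
    can present its own certificate when one is supplied."""
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
    if ca_file:
        ctx.load_verify_locations(cafile=ca_file)
    if cert_file and key_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx
```

Without a mesh, every service would need code like this plus a way to distribute and renew certificates. With a mesh, the sidecar handles it and the service code stays unchanged.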

2. Traffic Management

Canary deployments:
  Send 5% of traffic to new version

A/B testing:
  Route by header, user, etc.

Timeouts & retries:
  Give up on slow requests, retry failed ones automatically

Circuit breaking:
  Stop sending to failing service
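
Header-based routing, as used for A/B testing, can be sketched as a first-match rule list. The `choose_version` function, the `x-beta-user` header, and the version labels are hypothetical; a real mesh expresses such rules in its own configuration format rather than in application code.

```python
def choose_version(headers, rules, default):
    """Return the version of the first rule whose header value matches.

    rules is a list of (header_name, expected_value, version) tuples.
    """
    # HTTP header names are case-insensitive, so compare in lowercase.
    normalized = {name.lower(): value for name, value in headers.items()}
    for header_name, expected_value, version in rules:
        if normalized.get(header_name.lower()) == expected_value:
            return version
    return default


# Hypothetical rule: beta users see v2, everyone else stays on v1.
rules = [("x-beta-user", "true", "v2")]
beta = choose_version({"X-Beta-User": "true"}, rules, default="v1")
regular = choose_version({"Accept": "text/html"}, rules, default="v1")
```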

3. Observability

Metrics:
  Request rate, error rate, latency
  For EVERY service-to-service call

Distributed tracing:
  Follow a request across all services
  Service A → B → C → D (see full path)

Logs:
  Uniform access logs from every proxy
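
The metrics part can be pictured as a wrapper around every call that records the request count, error count, and latency per caller and callee pair. The `CallMetrics` class below is a hypothetical sketch of that bookkeeping, not any mesh's actual implementation.

```python
import time
from collections import defaultdict


class CallMetrics:
    """Request count, error count, and latencies per (source, destination)."""

    def __init__(self):
        self.requests = defaultdict(int)
        self.errors = defaultdict(int)
        self.latency_seconds = defaultdict(list)

    def observe(self, source, destination, call):
        """Run call() and record what happened, whether it succeeds or fails."""
        key = (source, destination)
        start = time.perf_counter()
        self.requests[key] += 1
        try:
            return call()
        except Exception:
            self.errors[key] += 1
            raise
        finally:
            # Latency is recorded for successes and failures alike.
            self.latency_seconds[key].append(time.perf_counter() - start)

    def error_rate(self, source, destination):
        key = (source, destination)
        total = self.requests.get(key, 0)
        return self.errors.get(key, 0) / total if total else 0.0
```

Because the proxy sits on every hop, the same three numbers come out for every service pair without any service having to emit them itself.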

4. Policy Enforcement

Authorization:
  Service A can call Service B
  Service C cannot call Service D

Rate limiting:
  Max 100 requests/second to Service E

Header manipulation:
  Add, remove, modify headers
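
As an illustration of the first two policies, here is a sketch of a deny-by-default allow-list and a token-bucket rate limiter. The names `ALLOWED_CALLS`, `is_allowed`, and `TokenBucket` are hypothetical, and token bucket is just one common way to enforce a requests-per-second limit; the mesh you use may do it differently.

```python
# Authorization: only pairs listed here may talk. Everything else is denied.
ALLOWED_CALLS = {("service-a", "service-b")}


def is_allowed(source, destination, allowed=ALLOWED_CALLS):
    return (source, destination) in allowed


class TokenBucket:
    """Rate limiter: tokens refill at a steady rate, each request spends one."""

    def __init__(self, rate_per_second, burst, now=0.0):
        self.rate = rate_per_second
        self.capacity = burst
        self.tokens = float(burst)
        self.last = now

    def allow(self, now):
        # Refill according to the time elapsed since the last call,
        # never exceeding the bucket's capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# Max 100 requests/second to Service E, as in the example above.
service_e_limit = TokenBucket(rate_per_second=100, burst=100)
```

Time is passed in as `now` so that the limiter is easy to test. A caller would typically pass `time.monotonic()`.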

Mesh             Proxy            Good Fit
Istio            Envoy            Feature-rich
Linkerd          linkerd2-proxy   Simplicity
Consul Connect   Envoy            HashiCorp stack
AWS App Mesh     Envoy            AWS environments

Istio

Most popular, most features
Steeper learning curve
Built on Envoy proxy

Linkerd

Simpler to operate
Lighter weight
Easier to get started

When to Use a Service Mesh

Good Fit

✓ Many microservices (10+)
✓ Multiple teams developing services
✓ Need mTLS without code changes
✓ Complex traffic management needs
✓ Observability gaps

Not Needed

✗ Monolith application
✗ Few services (<5)
✗ Simple communication patterns
✗ Team still learning Kubernetes

Common Patterns

Retries with Backoff

Request fails → Wait a bit → Retry
Retry fails → Wait longer → Retry
Retry fails → Wait even longer → Retry
Max retries → Give up, return error

Configured in mesh, not in code!
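
For comparison, this is roughly what the same retry logic looks like when each service has to implement it in code. It is a minimal sketch: the function names, default values, and jitter range are assumptions, and it retries on any exception, whereas a real mesh retries only on configurable conditions.

```python
import random
import time


def backoff_delays(max_retries, base, factor=2.0, cap=30.0):
    """Delay before each retry: base, base*factor, base*factor^2, ... up to cap."""
    return [min(cap, base * factor ** attempt) for attempt in range(max_retries)]


def call_with_retries(call, max_retries=3, base=0.1, sleep=time.sleep):
    """Try call() once, then retry up to max_retries times with growing waits."""
    delays = backoff_delays(max_retries, base)
    for attempt in range(max_retries + 1):
        try:
            return call()
        except Exception:
            if attempt == max_retries:
                # Max retries reached: give up and surface the error.
                raise
            # Jitter spreads retries out so clients do not retry in lockstep.
            sleep(delays[attempt] * random.uniform(0.5, 1.0))
```

The `sleep` parameter is injectable so the waits can be skipped in tests. Only operations that are safe to repeat should be retried this way.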

Circuit Breaker

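Normal (circuit CLOSED):
  All requests go through
Too many errors:
  Circuit OPENS
  Requests fail fast (don't even try)
After cooldown:
  Circuit goes HALF-OPEN
  A few trial requests go through
  Trial succeeds → circuit CLOSES
  Trial fails → circuit OPENS again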

Prevents cascading failures.
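
The circuit breaker can be sketched as a small state machine. This version includes a half-open state, in which a trial request is let through after the cooldown before the circuit fully closes. The class name, thresholds, and the consecutive-failure rule are assumptions for illustration; real meshes use their own tripping criteria.

```python
class CircuitBreaker:
    """States: closed (normal), open (fail fast), half-open (probing)."""

    def __init__(self, failure_threshold=3, cooldown_seconds=30.0):
        self.failure_threshold = failure_threshold
        self.cooldown_seconds = cooldown_seconds
        self.failures = 0
        self.state = "closed"
        self.opened_at = 0.0

    def allow_request(self, now):
        # After the cooldown, let trial traffic through again.
        if self.state == "open" and now - self.opened_at >= self.cooldown_seconds:
            self.state = "half-open"
        return self.state != "open"

    def record_success(self):
        # Any success resets the failure count and closes the circuit.
        self.failures = 0
        self.state = "closed"

    def record_failure(self, now):
        self.failures += 1
        # A failed trial, or too many consecutive failures, opens the circuit.
        if self.state == "half-open" or self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = now
```

A caller checks `allow_request(now)` before sending, then reports the outcome with `record_success()` or `record_failure(now)`. While the circuit is open, the failing service gets no traffic and has room to recover.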

Traffic Splitting

90% → v1 (stable)
10% → v2 (canary)

Gradually shift:
  80% → v1
  20% → v2
  ...
  0% → v1
  100% → v2
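
A weighted split like the one above can be sketched as picking a point on a line divided in proportion to the weights. The function names and version labels are hypothetical; in a mesh you would set the weights in configuration and the proxies would do the picking.

```python
import random


def pick_version(weights, roll):
    """Map roll, a number in [0, total weight), to a version.

    weights maps each version name to its share, e.g. {"v1": 90, "v2": 10}.
    """
    total = sum(weights.values())
    if not 0 <= roll < total:
        raise ValueError("roll must be in [0, total weight)")
    upper = 0
    for version, weight in weights.items():
        upper += weight
        if roll < upper:
            return version


def route(weights, rng=random):
    """Pick a version at random, in proportion to its weight."""
    total = sum(weights.values())
    return pick_version(weights, rng.random() * total)


canary = {"v1": 90, "v2": 10}
```

Shifting traffic is then only a matter of changing the numbers, for example to `{"v1": 80, "v2": 20}`, without redeploying either version.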

The Cost

Overhead

Every request goes through proxy:
  - Extra latency (small, but exists)
  - Extra memory (sidecar per pod)
  - Extra CPU

For most apps: acceptable
For ultra-low-latency: consider carefully

Complexity

Another system to:
  - Learn
  - Configure
  - Debug
  - Monitor
  - Upgrade

Is the value worth the cost?

Common Mistakes

1. Mesh for 3 Services

Overkill. A simple load balancer works.

2. Ignoring the Learning Curve

Istio is complex.
Budget time to learn it and train the team.

3. Not Using Observability

Mesh provides metrics, traces, logs.
If you don't use them, why add the overhead?

FAQ

Q: Service mesh vs API gateway?

API Gateway: north-south traffic (external → cluster)
Service Mesh: east-west traffic (service → service)

Often used together!

Q: Do I need Kubernetes?

Not technically required, but meshes often pair well with Kubernetes.

Q: Will it slow down my app?

Slightly (proxy overhead). Often small per hop, but measure to be sure.

Q: Istio vs Linkerd?

Istio: more features, more complex
Linkerd: simpler, easier to operate


Summary

Service mesh manages service-to-service communication, providing security, observability, and traffic control.

Key Takeaways:

  • Sidecar proxy intercepts all traffic
  • mTLS for automatic encryption
  • Traffic management: canary, retries, circuit breaking
  • Observability: metrics, tracing, logs
  • Control plane manages, data plane executes
  • Most useful when you run many microservices
  • Has overhead and complexity cost

Service mesh untangles the microservices communication mess!
