📝 Logging

Black box recorder for your app

The Black Box Analogy

Airplanes have black box recorders:

  • Record every engine reading
  • Record every pilot action
  • Record every system status

After a crash, investigators know EXACTLY what happened.

Application logging is your app's black box. When things go wrong, logs tell you what happened, when, and why.


Why Logging Matters

Without Logs

User: "The app crashed yesterday at 2pm"
You:  "Uh... I have no idea what happened" 😰

Boss:  "What caused the outage?"
You:   "We're... investigating" 😬

With Logs

User: "The app crashed yesterday at 2pm"
You:  "Let me check the logs..."

[Dec 24 14:00:01] WARN  Database connection slow (noticeable delay)
[Dec 24 14:00:05] ERROR Database connection failed
[Dec 24 14:00:05] ERROR User 12345 request failed
[Dec 24 14:00:06] FATAL Application shutting down

You: "Database connection failed. Root cause identified!" ✅

Log Levels

The Severity Hierarchy

FATAL   ▲ Most severe
ERROR   │
WARN    │
INFO    │
DEBUG   ▼ Most verbose

When to Use Each

Level   When to Use          Example
DEBUG   Detailed dev info    "Variable X = 42"
INFO    Normal operations    "User logged in"
WARN    Potential issues     "Disk 80% full"
ERROR   Failures             "Payment failed"
FATAL   App can't continue   "Out of memory"

Filtering by Level

Production:  Show INFO and above (hide DEBUG)
Debugging:   Show ALL levels

Set log level per environment.
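
A minimal sketch of per-environment filtering with Python's standard logging module. The LOG_LEVEL variable name and the helper functions are assumptions made for this example. Python calls the two top levels WARNING and CRITICAL, so the table maps this article's WARN and FATAL onto them.

```python
import logging
import os

# Map this article's level names onto Python's logging constants.
LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARN": logging.WARNING,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
    "FATAL": logging.CRITICAL,
    "CRITICAL": logging.CRITICAL,
}


def level_from_env(default: str = "INFO") -> int:
    """Read the threshold from LOG_LEVEL (a name assumed for this sketch)."""
    name = os.environ.get("LOG_LEVEL", default).upper()
    return LEVELS.get(name, logging.INFO)  # unknown names fall back to INFO


def build_logger(name: str = "app") -> logging.Logger:
    """Create a console logger that hides everything below the threshold."""
    logger = logging.getLogger(name)
    logger.setLevel(level_from_env())
    if not logger.handlers:  # avoid stacking handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("[%(asctime)s] %(levelname)-5s %(message)s")
        )
        logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    log = build_logger()
    log.debug("Variable X = 42")  # hidden unless LOG_LEVEL=DEBUG
    log.info("User logged in")
    log.warning("Disk 80% full")
```

Running the script with LOG_LEVEL=DEBUG shows all three messages; with the variable unset, the DEBUG line is filtered out.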

What to Log

Good Log Messages

✓ Include context
  "[user=123] Order placed [order_id=456] [total=$100]"

✓ Include timestamps
  "[Dec 24 14:30:15] User logged in"

✓ Include identifiers
  "Request [id=abc123] completed quickly"

✓ Be specific
  "Payment declined: Card expired"
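
One way to make context a habit is a small helper that appends key=value pairs to every message. The sketch below is illustrative only; with_context is a hypothetical helper, not part of any logging library.

```python
import logging


def with_context(message: str, **fields: object) -> str:
    """Append [key=value] pairs so the message says who and what was involved."""
    context = " ".join(f"[{key}={value}]" for key, value in fields.items())
    return f"{message} {context}".strip()


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="[%(asctime)s] %(message)s")
    log = logging.getLogger("orders")
    log.info(with_context("Order placed", user=123, order_id=456, total="$100"))
    log.warning(with_context("Payment declined: Card expired", user=123))
```

The timestamp comes from the formatter, so the call site only has to supply the identifiers.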

Bad Log Messages

✗ Too vague
  "Something happened"
  "Error occurred"

✗ Missing context
  "User logged in" (which user?!)

✗ Security risks
  "Password: secret123"
  "Credit card: 4242-4242-4242-4242"

Structured Logging

Old Way: Plain Text

Dec 24 14:30:15 ERROR Payment failed for order 123: Card declined

Hard to search, parse, and analyze.

New Way: Structured (JSON)

{
  "timestamp": "Dec 24 14:30:15Z",
  "level": "ERROR",
  "service": "payment",
  "order_id": "123",
  "user_id": "456",
  "error": "Card declined",
  "card_last_four": "4242"
}

Easy to search: "Show all payment errors for user 456"
Easy to aggregate: "Count errors by type"
Easy to alert: "Notify when error rate > 5%"
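
A rough sketch of how JSON output could be produced with the standard logging module. JsonFormatter, build_json_logger, and the "fields" key passed through extra are conventions invented for this example. Dedicated structured-logging libraries exist and may be a better fit in practice.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each record as a single JSON object on one line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
            "level": record.levelname,
            "service": record.name,
            "message": record.getMessage(),
        }
        # Extra fields arrive via extra={"fields": {...}}; the "fields" key
        # is a convention of this sketch, not of the logging module.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload, default=str)


def build_json_logger(service: str) -> logging.Logger:
    """Create a console logger named after the service that emits JSON lines."""
    logger = logging.getLogger(service)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    log = build_json_logger("payment")
    log.error(
        "Card declined",
        extra={"fields": {"order_id": "123", "user_id": "456",
                          "card_last_four": "4242"}},
    )
```

Because every event is one JSON object per line, a log system can index fields such as user_id or level directly instead of parsing free text.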


Log Aggregation

The Problem at Scale

20 servers running your app.
Each writing to local files.

"Find the error from user 123" → Check 20 servers?!

The Solution

All logs → Central System → One Search Interface

          ┌─────────┐
Server 1 ─┤         │
Server 2 ─┤ Central │─→ Search
Server 3 ─┤  Logs   │─→ Dashboards
Server N ─┤         │─→ Alerts
          └─────────┘

Tool           Type
ELK Stack      Elasticsearch + Logstash + Kibana
Datadog        Cloud-based all-in-one
Splunk         Enterprise log analysis
Grafana Loki   Prometheus-like for logs
CloudWatch     AWS native

Log Retention

How Long to Keep?

Debugging:    Last few days (high volume)
Operations:   Last few weeks to a few months
Compliance:   Years (legal requirements)
Security:     Often a year or more

Balance: storage cost vs investigation needs
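
For logs kept as local files, a retention window can be enforced with a small cleanup job. The sketch below is one possible approach; prune_old_logs, its parameters, and the "*.log*" pattern are assumptions for illustration, and hosted log systems usually expose retention as a setting instead.

```python
import time
from pathlib import Path


def prune_old_logs(directory, max_age_days, pattern="*.log*", now=None):
    """Delete matching files whose last modification is older than the window.

    Returns the list of removed paths so the caller can log what was deleted.
    """
    current = now if now is not None else time.time()
    cutoff = current - max_age_days * 86400  # 86400 seconds per day
    removed = []
    for path in sorted(Path(directory).glob(pattern)):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed
```

A scheduled job could call this with a short window for debug logs and a longer one for operational logs.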

Log Rotation

Logs grow fast!

Without rotation:
  Disk fills up → App crashes

With rotation:
  app.log → app.log.1 → app.log.2
  Delete the oldest file once a size or count limit is reached
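
Python's standard library ships a handler for size-based rotation. The sketch below shows one way to wire it up; build_rotating_logger and its default limits are assumptions chosen for the example.

```python
import logging
from logging.handlers import RotatingFileHandler


def build_rotating_logger(
    path: str, max_bytes: int = 1_000_000, backups: int = 2
) -> logging.Logger:
    """Log to `path`, rolling over to path.1, path.2, ... when it grows too big.

    Once `backups` old files exist, the oldest is dropped on the next rollover.
    """
    logger = logging.getLogger(f"rotating.{path}")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    if not logger.handlers:
        handler = RotatingFileHandler(path, maxBytes=max_bytes, backupCount=backups)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

With backups=2, at most three files exist at any time: app.log, app.log.1, and app.log.2. The standard library also provides TimedRotatingFileHandler for rotation by time rather than size.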

Common Patterns

Request ID Tracing

Attach a unique ID to each request:

[req_id=abc123] Processing order
[req_id=abc123] Checking inventory
[req_id=abc123] Processing payment
[req_id=abc123] Order complete

Now you can follow one request through all components!
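
A sketch of how the request ID could be attached automatically instead of being typed into every message. It uses a context variable plus a logging filter; request_id_var, RequestIdFilter, build_request_logger, and handle_order are hypothetical names for this example.

```python
import contextvars
import logging
import uuid

# Holds the ID of the request currently being handled ("-" outside a request).
request_id_var = contextvars.ContextVar("request_id", default="-")


class RequestIdFilter(logging.Filter):
    """Copy the current request ID onto each record so the formatter can print it."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.req_id = request_id_var.get()
        return True


def build_request_logger(name: str, stream=None) -> logging.Logger:
    """Create a logger whose lines are prefixed with [req_id=...]."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    if not logger.handlers:
        handler = logging.StreamHandler(stream)  # None means stderr
        handler.addFilter(RequestIdFilter())
        handler.setFormatter(logging.Formatter("[req_id=%(req_id)s] %(message)s"))
        logger.addHandler(handler)
    return logger


def handle_order(logger: logging.Logger, request_id=None) -> None:
    """Simulate one request; every log line inside shares the same ID."""
    token = request_id_var.set(request_id or uuid.uuid4().hex[:8])
    try:
        logger.info("Processing order")
        logger.info("Checking inventory")
        logger.info("Processing payment")
        logger.info("Order complete")
    finally:
        request_id_var.reset(token)  # do not leak the ID into the next request


if __name__ == "__main__":
    handle_order(build_request_logger("orders"), request_id="abc123")
```

A context variable keeps the ID separate per thread and per asyncio task, so concurrent requests do not overwrite each other's IDs.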

Correlation Across Services

Request flows: API → Order Service → Payment Service

Include the same correlation ID in every service's logs:
[corr_id=xyz789] [service=api] Request received
[corr_id=xyz789] [service=order] Creating order
[corr_id=xyz789] [service=payment] Charging card

Trace the full journey!
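
A simplified sketch of how a correlation ID might travel between services: each service reuses the ID from the incoming headers, or creates one if it is the first hop, and forwards it on outgoing calls. The X-Correlation-ID header name is a common convention assumed here, and the three service functions are stand-ins for real network calls.

```python
import uuid

CORRELATION_HEADER = "X-Correlation-ID"  # assumed header name for this sketch


def correlation_id_from(headers: dict) -> str:
    """Reuse the caller's ID if present, otherwise start a new one."""
    for key, value in headers.items():
        if key.lower() == CORRELATION_HEADER.lower() and value:
            return value
    return uuid.uuid4().hex


def outgoing_headers(corr_id: str, extra=None) -> dict:
    """Build headers for a downstream call that carry the same ID."""
    headers = dict(extra or {})
    headers[CORRELATION_HEADER] = corr_id
    return headers


def log_line(corr_id: str, service: str, message: str) -> str:
    return f"[corr_id={corr_id}] [service={service}] {message}"


# The functions below simulate API -> Order Service -> Payment Service.
def payment_service(headers: dict) -> list:
    corr_id = correlation_id_from(headers)
    return [log_line(corr_id, "payment", "Charging card")]


def order_service(headers: dict) -> list:
    corr_id = correlation_id_from(headers)
    lines = [log_line(corr_id, "order", "Creating order")]
    return lines + payment_service(outgoing_headers(corr_id))


def api(headers: dict) -> list:
    corr_id = correlation_id_from(headers)
    lines = [log_line(corr_id, "api", "Request received")]
    return lines + order_service(outgoing_headers(corr_id))


if __name__ == "__main__":
    for line in api({"X-Correlation-ID": "xyz789"}):
        print(line)
```

Searching the central log system for the one ID then returns the lines from all three services.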

Performance Considerations

Logging Overhead

Every log call:
  - Format the message
  - Write to disk/network

Too much logging = performance impact.

Practical Tips

✓ Use appropriate log levels
  Don't leave DEBUG enabled in production by default

✓ Async logging
  Don't block requests waiting for log writes

✓ Sample verbose logs
  Log 1% of DEBUG messages in production
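
The last two tips can be combined using the standard library's queue-based handlers: the request thread only puts the record on a queue, and a background thread does the slow write. This is a sketch under assumptions; SamplingFilter and build_async_logger are hypothetical helpers, and the 1% default mirrors the tip above.

```python
import logging
import logging.handlers
import queue
import random


class SamplingFilter(logging.Filter):
    """Keep every record above DEBUG, but only a fraction of DEBUG records."""

    def __init__(self, rate: float, rng=None):
        super().__init__()
        self.rate = rate
        self.rng = rng or random.Random()

    def filter(self, record: logging.LogRecord) -> bool:
        if record.levelno > logging.DEBUG:
            return True
        return self.rng.random() < self.rate


def build_async_logger(name, target_handler, debug_sample_rate=0.01):
    """Return (logger, listener); call listener.stop() at shutdown to flush."""
    log_queue = queue.Queue(-1)  # unbounded queue
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.propagate = False

    # The calling thread only enqueues the record, which is cheap.
    queue_handler = logging.handlers.QueueHandler(log_queue)
    queue_handler.addFilter(SamplingFilter(debug_sample_rate))
    logger.addHandler(queue_handler)

    # A background thread drains the queue and performs the actual write.
    listener = logging.handlers.QueueListener(log_queue, target_handler)
    listener.start()
    return logger, listener


if __name__ == "__main__":
    log, listener = build_async_logger("app", logging.StreamHandler())
    for i in range(1000):
        log.debug("cache lookup %d", i)  # roughly 1% of these are kept
    log.info("Order complete")
    listener.stop()
```

An unbounded queue trades memory for never blocking; a bounded queue would need a policy for what to drop when it fills up.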

Common Mistakes

1. Logging Too Much

Terabytes of useless logs:
  "Entering function X"
  "Exiting function X"

Slows the app, costs storage, and buries the signal in noise.

2. Logging Too Little

"An error occurred"

Which error? Where? For whom? When?
Useless for debugging!

3. Not Rotating Logs

Disk fills up → App crashes → Data lost
Configure rotation (especially in production).

4. Logging Sensitive Data

Avoid logging:
  - Passwords
  - Credit card numbers
  - Personally identifiable information (PII)
  - API keys

Logs are often broadly accessible!
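
As a safety net, a logging filter can mask values that look sensitive before they are written. The sketch below is illustrative, not a complete solution: the patterns and the redact and RedactingFilter names are assumptions, and pattern matching will miss cases, so the primary defense is still not passing secrets to the logger at all.

```python
import logging
import re

# key: value or key=value pairs whose key suggests a secret.
SECRET_RE = re.compile(r"(?i)\b(password|api[_-]?key|token)\b(\s*[:=]\s*)(\S+)")
# 13 to 19 digits, optionally separated by spaces or dashes.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def _mask_card(match: re.Match) -> str:
    digits = re.sub(r"\D", "", match.group())
    return f"[card ****{digits[-4:]}]"  # keep only the last four digits


def redact(text: str) -> str:
    """Mask secret-looking values and card-like numbers in a message."""
    text = SECRET_RE.sub(r"\1\2[REDACTED]", text)
    return CARD_RE.sub(_mask_card, text)


class RedactingFilter(logging.Filter):
    """Rewrite each record's message through redact() before it is emitted."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = redact(record.getMessage())
        record.args = ()  # message is already fully formatted
        return True


if __name__ == "__main__":
    handler = logging.StreamHandler()
    handler.addFilter(RedactingFilter())
    log = logging.getLogger("checkout")
    log.addHandler(handler)
    log.setLevel(logging.INFO)
    log.info("Password: secret123")
    log.info("Credit card: 4242-4242-4242-4242")
```

Attaching the filter to the handler means every logger that writes through that handler gets the same masking.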

FAQ

Q: Logging vs Monitoring?

Logging: Individual events (text records)
Monitoring: Aggregated metrics (numbers, trends)

Use both together!
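
To illustrate how the two relate, the sketch below derives simple metrics from structured log events. It assumes one JSON object per line with "level" and "error" fields, as in the structured example earlier; summarize is a hypothetical helper, and real monitoring systems typically collect metrics directly rather than recomputing them from logs.

```python
import json
from collections import Counter


def summarize(json_lines):
    """Aggregate individual log events into counts and an error rate."""
    levels = Counter()
    errors = Counter()
    for line in json_lines:
        event = json.loads(line)
        levels[event["level"]] += 1
        if event["level"] in ("ERROR", "FATAL"):
            errors[event.get("error", "unknown")] += 1
    total = sum(levels.values())
    failed = levels["ERROR"] + levels["FATAL"]
    return {
        "total": total,
        "error_rate": failed / total if total else 0.0,
        "errors_by_type": dict(errors),
    }


if __name__ == "__main__":
    sample = [
        '{"level": "INFO", "message": "User logged in"}',
        '{"level": "ERROR", "error": "Card declined"}',
        '{"level": "INFO", "message": "Order placed"}',
        '{"level": "INFO", "message": "Order shipped"}',
    ]
    print(summarize(sample))
```

The individual events answer "what happened to this request", while the aggregated numbers answer "how is the system doing overall".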

Q: Where should logs go?

Dev: Console
Production: Central log system (Datadog, ELK, etc.)

Q: How verbose in production?

INFO level typically. Enable DEBUG temporarily when investigating.

Q: Costs of log aggregation services?

Can be significant at scale! Budget for it.


Summary

Logging records application events for debugging, monitoring, and auditing.

Key Takeaways:

  • Log levels: DEBUG, INFO, WARN, ERROR, FATAL
  • Include context: who, what, when, why
  • Use structured logging (JSON) for searchability
  • Centralize logs in production
  • Avoid logging sensitive data
  • Rotate logs to prevent disk fill
  • Request IDs enable tracing across services

Good logging turns "I don't know what happened" into "Here's exactly what went wrong."
