The Black Box Analogy
Airplanes have black box recorders:
- Record every engine reading
- Record every pilot action
- Record every system status
After a crash, investigators know EXACTLY what happened.
Application logging is your app's black box. When things go wrong, logs tell you what happened, when, and why.
Why Logging Matters
Without Logs
User: "The app crashed yesterday at 2pm"
You: "Uh... I have no idea what happened" 😰
Boss: "What caused the outage?"
You: "We're... investigating" 😬
With Logs
User: "The app crashed yesterday at 2pm"
You: "Let me check the logs..."
[Dec 24 14:00:01] WARN Database connection slow (noticeable delay)
[Dec 24 14:00:05] ERROR Database connection failed
[Dec 24 14:00:05] ERROR User 12345 request failed
[Dec 24 14:00:06] FATAL Application shutting down
You: "Database connection failed. Root cause identified!" ✅
Log Levels
The Severity Hierarchy
FATAL ▲ Most severe
ERROR │
WARN │
INFO │
DEBUG ▼ Most verbose
When to Use Each
| Level | When to Use | Example |
|---|---|---|
| DEBUG | Detailed dev info | "Variable X = 42" |
| INFO | Normal operations | "User logged in" |
| WARN | Potential issues | "Disk 80% full" |
| ERROR | Failures | "Payment failed" |
| FATAL | App can't continue | "Out of memory" |
Filtering by Level
Production: Show INFO and above (hide DEBUG)
Debugging: Show ALL levels
Set log level per environment.
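One way to set the level per environment is to read it from an environment variable at startup. The sketch below uses Python's standard logging module; the variable name LOG_LEVEL and the helper build_logger are assumptions made for illustration. Python calls the two highest levels WARNING and CRITICAL, so the article's WARN and FATAL are mapped onto those.

```python
import logging
import os

# Map the level names used in this article onto Python's constants.
LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARN": logging.WARNING,
    "ERROR": logging.ERROR,
    "FATAL": logging.CRITICAL,
}


def build_logger(name="app", stream=None):
    """Build a logger whose threshold comes from LOG_LEVEL (hypothetical variable)."""
    # Unknown or missing values fall back to INFO, the usual production default.
    level = LEVELS.get(os.environ.get("LOG_LEVEL", "INFO").upper(), logging.INFO)
    handler = logging.StreamHandler(stream)  # stream=None means stderr
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger = logging.getLogger(name)
    logger.handlers.clear()
    logger.addHandler(handler)
    logger.setLevel(level)
    logger.propagate = False
    return logger
```

With LOG_LEVEL=INFO, a call to log.debug(...) is dropped and log.info(...) is written. Setting LOG_LEVEL=DEBUG shows everything without a code change.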
What to Log
Good Log Messages
✓ Include context
"[user=123] Order placed [order_id=456] [total=$100]"
✓ Include timestamps
"[Dec 24 14:30:15] User logged in"
✓ Include identifiers
"Request [id=abc123] completed quickly"
✓ Be specific
"Payment declined: Card expired"
Bad Log Messages
✗ Too vague
"Something happened"
"Error occurred"
✗ Missing context
"User logged in" (which user?!)
✗ Security risks
"Password: secret123"
"Credit card: 4242-4242-4242-4242"
Structured Logging
Old Way: Plain Text
Dec 24 14:30:15 ERROR Payment failed for order 123: Card declined
Hard to search, parse, and analyze.
New Way: Structured (JSON)
{
"timestamp": "Dec 24 14:30:15Z",
"level": "ERROR",
"service": "payment",
"order_id": "123",
"user_id": "456",
"error": "Card declined",
"card_last_four": "4242"
}
Easy to search: "Show all payment errors for user 456"
Easy to aggregate: "Count errors by type"
Easy to alert: "Notify when error rate > 5%"
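A minimal sketch of producing JSON log lines like the one above with Python's standard library. The JsonFormatter class and the convention of passing extra fields under a "fields" key are assumptions for this example, not a standard API. Dedicated structured-logging libraries exist and may be a better fit in practice.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    def format(self, record):
        entry = {
            "timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
            "level": record.levelname,
            "message": record.getMessage(),
        }
        # Merge fields supplied as extra={"fields": {...}} (illustrative convention).
        entry.update(getattr(record, "fields", {}))
        return json.dumps(entry)


# Usage:
# handler = logging.StreamHandler()
# handler.setFormatter(JsonFormatter())
# logger = logging.getLogger("payment")
# logger.addHandler(handler)
# logger.error("Card declined",
#              extra={"fields": {"service": "payment", "order_id": "123"}})
```

Because every line is a self-contained JSON object, a log system can index the keys directly instead of parsing free text.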
Log Aggregation
The Problem at Scale
20 servers running your app.
Each writing to local files.
"Find the error from user 123" → Check 20 servers?!
The Solution
All logs → Central System → One Search Interface
┌─────────┐
Server 1 ─┤ │
Server 2 ─┤ Central │─→ Search
Server 3 ─┤ Logs │─→ Dashboards
Server N ─┤ │─→ Alerts
└─────────┘
Popular Tools
| Tool | Type |
|---|---|
| ELK Stack | Elasticsearch + Logstash + Kibana |
| Datadog | Cloud-based all-in-one |
| Splunk | Enterprise log analysis |
| Grafana Loki | Prometheus-like for logs |
| CloudWatch | AWS native |
Log Retention
How Long to Keep?
Debugging: Last few days (high volume)
Operations: Last few weeks to a few months
Compliance: Years (legal requirements)
Security: Often a year or more
Balance: storage cost vs investigation needs
Log Rotation
Logs grow fast!
Without rotation:
Disk fills up → App crashes
With rotation:
app.log → app.log.1 → app.log.2
Delete the oldest file once the configured size or file-count limit is reached
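In Python, size-based rotation of this kind is available through RotatingFileHandler in the standard library. The sketch below is one possible setup; the helper name build_rotating_logger and the default limits are assumptions chosen for illustration.

```python
import logging
from logging.handlers import RotatingFileHandler


def build_rotating_logger(path, max_bytes=1_000_000, backups=2, name="rotating-demo"):
    """Log to `path`, rolling over to path.1, path.2, ... when the file fills up."""
    # When the current file would exceed max_bytes it is renamed to path.1,
    # the old path.1 becomes path.2, and anything beyond `backups` is deleted.
    handler = RotatingFileHandler(path, maxBytes=max_bytes, backupCount=backups)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger = logging.getLogger(name)
    logger.handlers.clear()
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

With backups=2, disk usage stays bounded at roughly three times max_bytes. Many deployments leave rotation to an external tool such as logrotate instead; either approach addresses the same problem.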
Common Patterns
Request ID Tracing
Attach a unique ID to each request:
[req_id=abc123] Processing order
[req_id=abc123] Checking inventory
[req_id=abc123] Processing payment
[req_id=abc123] Order complete
Now you can follow one request through all components!
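Here is a sketch of how a request ID could be stamped onto every log line without passing it to each call by hand. It assumes Python's contextvars and a logging filter; request_id, RequestIdFilter, build_request_logger, and handle_order are hypothetical names for this example.

```python
import contextvars
import logging
import uuid

# Holds the ID of the request currently being handled ("-" outside any request).
request_id = contextvars.ContextVar("request_id", default="-")


class RequestIdFilter(logging.Filter):
    """Copy the current request ID onto every record so the formatter can use it."""

    def filter(self, record):
        record.req_id = request_id.get()
        return True


def build_request_logger(stream=None, name="orders"):
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("[req_id=%(req_id)s] %(message)s"))
    handler.addFilter(RequestIdFilter())
    logger = logging.getLogger(name)
    logger.handlers.clear()
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


def handle_order(logger):
    """Simulate one request: set the ID once, then log as usual."""
    token = request_id.set(uuid.uuid4().hex[:8])
    try:
        logger.info("Processing order")
        logger.info("Checking inventory")
        logger.info("Order complete")
    finally:
        request_id.reset(token)  # do not leak the ID into the next request
```

A context variable is used rather than a global so that concurrent requests (threads or asyncio tasks) each see their own ID.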
Correlation Across Services
Request flows: API → Order Service → Payment Service
Include the same correlation ID in every service's logs:
[corr_id=xyz789] [service=api] Request received
[corr_id=xyz789] [service=order] Creating order
[corr_id=xyz789] [service=payment] Charging card
Trace the full journey!
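For the ID to stay the same across services, each service has to reuse the ID it receives and forward it on outgoing calls. A minimal sketch, assuming the ID travels in an HTTP header; the header name X-Correlation-ID is a common convention rather than a standard, and both helper functions are hypothetical.

```python
import uuid

HEADER = "X-Correlation-ID"  # assumed header name (convention, not a standard)


def ensure_correlation_id(headers):
    """Return (corr_id, outgoing_headers), reusing the caller's ID if present."""
    # The first service in the chain finds no ID and creates one;
    # every downstream service finds it and passes it along unchanged.
    corr_id = headers.get(HEADER) or uuid.uuid4().hex
    outgoing = dict(headers)
    outgoing[HEADER] = corr_id
    return corr_id, outgoing


def log_line(corr_id, service, message):
    """Format a line in the style shown above."""
    return f"[corr_id={corr_id}] [service={service}] {message}"
```

The API would call ensure_correlation_id on the incoming request headers and send the returned headers to the Order Service, which does the same before calling the Payment Service.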
Performance Considerations
Logging Overhead
Every log call:
- Format the message
- Write to disk/network
Too much logging = performance impact.
Practical Tips
✓ Use appropriate log levels
Don't leave DEBUG enabled by default in production
✓ Async logging
Don't block requests waiting for log writes
✓ Sample verbose logs
Log 1% of DEBUG messages in production
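The last two tips can be combined in Python with the standard QueueHandler and QueueListener classes plus a small sampling filter. This is a sketch under those assumptions; SampleDebugFilter and build_async_logger are hypothetical names, and the 1% default simply mirrors the figure above.

```python
import logging
import logging.handlers
import queue
import random


class SampleDebugFilter(logging.Filter):
    """Pass everything above DEBUG, but only a random fraction of DEBUG records."""

    def __init__(self, rate, rng=None):
        super().__init__()
        self.rate = rate
        self.rng = rng or random.Random()

    def filter(self, record):
        if record.levelno > logging.DEBUG:
            return True
        return self.rng.random() < self.rate


def build_async_logger(target_handler, debug_rate=0.01, name="async-demo"):
    """Return (logger, listener). Call listener.stop() at shutdown to flush."""
    log_queue = queue.Queue(-1)  # unbounded, so producers never block on put
    queue_handler = logging.handlers.QueueHandler(log_queue)
    queue_handler.addFilter(SampleDebugFilter(debug_rate))
    # The listener's background thread does the slow write to target_handler.
    listener = logging.handlers.QueueListener(log_queue, target_handler)
    logger = logging.getLogger(name)
    logger.handlers.clear()
    logger.addHandler(queue_handler)
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    listener.start()
    return logger, listener
```

The request thread only pays for putting a record on an in-memory queue. The trade-off is that records still in the queue can be lost if the process dies before the listener drains it.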
Common Mistakes
1. Logging Too Much
Terabytes of useless logs:
"Entering function X"
"Exiting function X"
Slows app, costs storage, hard to find signal.
2. Logging Too Little
"An error occurred"
Which error? Where? For whom? When?
Useless for debugging!
3. Not Rotating Logs
Disk fills up → App crashes → Data lost
Configure rotation (especially in production).
4. Logging Sensitive Data
Avoid logging:
- Passwords
- Credit card numbers
- Personal identification
- API keys
Logs are often broadly accessible!
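As a safety net, a logging filter can mask obvious secrets before a line is written. The sketch below is illustrative only: the two regular expressions and the RedactingFilter name are assumptions, and pattern matching will miss many cases. Not logging the sensitive value in the first place is still the reliable approach.

```python
import logging
import re

# Illustrative patterns only; real detection of cards and secrets needs more care.
CARD_RE = re.compile(r"\b(?:\d{4}[- ]?){3}(\d{4})\b")
SECRET_RE = re.compile(r"(?i)\b(password|api[_-]?key)\s*[:=]\s*\S+")


class RedactingFilter(logging.Filter):
    """Mask card numbers (keeping the last four digits) and password/API key values."""

    def filter(self, record):
        message = record.getMessage()
        message = CARD_RE.sub(r"****-****-****-\1", message)
        message = SECRET_RE.sub(r"\1=[REDACTED]", message)
        # Replace the message in place; args are already merged into it.
        record.msg, record.args = message, None
        return True


# Usage: attach to a handler so every record passing through it is scrubbed.
# handler = logging.StreamHandler()
# handler.addFilter(RedactingFilter())
```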
FAQ
Q: Logging vs Monitoring?
Logging: Individual events (text records)
Monitoring: Aggregated metrics (numbers, trends)
Use both together!
Q: Where should logs go?
Dev: Console
Production: Central log system (Datadog, ELK, etc.)
Q: How verbose in production?
INFO level typically. Enable DEBUG temporarily when investigating.
Q: Costs of log aggregation services?
Can be significant at scale! Budget for it.
Summary
Logging records application events for debugging, monitoring, and auditing.
Key Takeaways:
- Log levels: DEBUG, INFO, WARN, ERROR, FATAL
- Include context: who, what, when, why
- Use structured logging (JSON) for searchability
- Centralize logs in production
- Avoid logging sensitive data
- Rotate logs to prevent disk fill
- Request IDs enable tracing across services
Good logging turns "I don't know what happened" into "Here's exactly what went wrong."