The Three Pillars
🟠P1 — the foundational framework for observability
The Pillars #
Logs #
- What: Discrete events with context (timestamp, service, message, metadata)
- When to use: Debugging specific requests, audit trails, error investigation
- Pattern: Structured logging (JSON) with correlation IDs; centralised aggregation (ELK, Datadog)
- Cost trap: Logging everything is expensive. Use sampling for high-volume paths.
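A minimal sketch of the logging pattern above, using only the Python standard library. The names `JsonFormatter`, `SamplingFilter`, `build_logger`, the `correlation_id` context variable, and the `checkout` service are all hypothetical, chosen for illustration. Keeping every WARNING-or-above record while sampling lower levels is one possible sampling policy, not the only one.

```python
import json
import logging
import random
from contextvars import ContextVar
from datetime import datetime, timezone

# Hypothetical per-request correlation ID; a real service would set it in middleware.
correlation_id: ContextVar[str] = ContextVar("correlation_id", default="-")


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def __init__(self, service: str) -> None:
        super().__init__()
        self.service = service

    def format(self, record: logging.LogRecord) -> str:
        event = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "service": self.service,
            "correlation_id": correlation_id.get(),
            "message": record.getMessage(),
            # Populated via logger.info(..., extra={"metadata": {...}}).
            "metadata": getattr(record, "metadata", {}),
        }
        return json.dumps(event)


class SamplingFilter(logging.Filter):
    """Keep every WARNING-or-above record; keep lower levels with probability `rate`."""

    def __init__(self, rate: float) -> None:
        super().__init__()
        self.rate = rate

    def filter(self, record: logging.LogRecord) -> bool:
        if record.levelno >= logging.WARNING:
            return True
        return random.random() < self.rate


def build_logger(name: str, service: str, sample_rate: float) -> logging.Logger:
    handler = logging.StreamHandler()  # writes to stderr by default
    handler.setFormatter(JsonFormatter(service))
    handler.addFilter(SamplingFilter(sample_rate))
    logger = logging.getLogger(name)
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    log = build_logger("checkout", service="checkout", sample_rate=0.1)
    token = correlation_id.set("req-7f3a")
    log.info("cart loaded", extra={"metadata": {"items": 3}})         # kept ~10% of the time
    log.error("payment declined", extra={"metadata": {"code": 402}})  # always kept
    correlation_id.reset(token)
```

Because every event carries the same `correlation_id`, an aggregator query on that field returns all log lines for one request, across services that propagate the ID.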
Metrics #
- What: Numeric measurements aggregated over time (counters, gauges, histograms)
- When to use: Dashboards, alerting, capacity planning, SLO tracking
- Pattern: Pull-based (Prometheus scrapes targets) or push-based (StatsD, CloudWatch). Values are pre-aggregated at the source, so storage cost scales with the number of series, not with request volume.
- Key metrics: request rate, error rate, latency percentiles (p50, p95, p99)
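To illustrate why pre-aggregation is cheap, here is a small sketch of counters plus a fixed-bucket latency histogram. `Histogram`, `ServiceMetrics`, and `per_second_rate` are hypothetical names, and the bucket bounds are arbitrary. The quantile estimate simply returns the upper bound of the bucket that contains the requested rank, which is coarser than what a real metrics backend typically computes.

```python
import bisect
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Histogram:
    """Fixed-bucket histogram: one count per bucket, not one value per request."""

    bounds: list[float]                     # ascending upper bounds, in seconds
    counts: list[int] = field(init=False)   # last slot is the overflow (+Inf) bucket

    def __post_init__(self) -> None:
        self.counts = [0] * (len(self.bounds) + 1)

    def observe(self, value: float) -> None:
        # bisect_left finds the first bucket whose upper bound is >= value.
        self.counts[bisect.bisect_left(self.bounds, value)] += 1

    def quantile(self, q: float) -> Optional[float]:
        """Coarse estimate: upper bound of the bucket holding the q-th rank."""
        total = sum(self.counts)
        if total == 0:
            return None
        rank = q * total
        seen = 0
        for i, count in enumerate(self.counts):
            seen += count
            if seen >= rank:
                return self.bounds[i] if i < len(self.bounds) else float("inf")
        return float("inf")


@dataclass
class ServiceMetrics:
    latency: Histogram
    requests_total: int = 0   # counter: only ever increases
    errors_total: int = 0     # counter: only ever increases

    def record(self, seconds: float, ok: bool) -> None:
        self.requests_total += 1
        if not ok:
            self.errors_total += 1
        self.latency.observe(seconds)

    def error_rate(self) -> float:
        return self.errors_total / self.requests_total if self.requests_total else 0.0


def per_second_rate(previous_total: int, current_total: int, interval_seconds: float) -> float:
    """Request rate derived from two readings of a counter taken interval_seconds apart."""
    return (current_total - previous_total) / interval_seconds


if __name__ == "__main__":
    metrics = ServiceMetrics(Histogram([0.1, 0.5, 1.0]))
    for seconds, ok in [(0.05, True)] * 90 + [(0.3, True)] * 8 + [(2.0, False)] * 2:
        metrics.record(seconds, ok)
    print("error rate:", metrics.error_rate())
    for q in (0.50, 0.95, 0.99):
        print(f"p{int(q * 100)} <= {metrics.latency.quantile(q)}s")
```

However many requests arrive, this state stays at two integers plus one count per bucket, which is the storage property the bullet above refers to. The trade-off is resolution: a percentile can only be located to within a bucket.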
Traces #
- What: The path of a single request across multiple services, with timing for each span
- When to use: Diagnosing latency in multi-service chains, understanding call graphs
- Pattern: Trace context propagation (W3C Trace Context), sampling, span collection
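Context propagation can be sketched with the W3C Trace Context `traceparent` header, whose value has the form `version-traceid-parentid-flags` (2, 32, 16, and 2 lowercase hex characters). The `SpanContext` type and the helper functions below are hypothetical and cover only version `00`; span timing and collection are left out.

```python
import re
import secrets
from dataclasses import dataclass
from typing import Optional

# version "00", 32-hex trace-id, 16-hex parent-id, 2-hex flags
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


@dataclass(frozen=True)
class SpanContext:
    trace_id: str   # shared by every span in one request
    span_id: str    # unique per span
    sampled: bool   # lowest bit of the flags byte

    def to_header(self) -> str:
        flags = "01" if self.sampled else "00"
        return f"00-{self.trace_id}-{self.span_id}-{flags}"


def new_root(sampled: bool = True) -> SpanContext:
    """Start a new trace at the edge of the system."""
    return SpanContext(secrets.token_hex(16), secrets.token_hex(8), sampled)


def parse_traceparent(header: str) -> Optional[SpanContext]:
    """Return the caller's context, or None if the header is malformed."""
    match = TRACEPARENT_RE.match(header.strip())
    if match is None:
        return None
    trace_id, span_id, flags = match.groups()
    # All-zero IDs are invalid in the spec.
    if trace_id == "0" * 32 or span_id == "0" * 16:
        return None
    return SpanContext(trace_id, span_id, bool(int(flags, 16) & 1))


def child_of(parent: SpanContext) -> SpanContext:
    """New span in the same trace; inherits the parent's sampling decision."""
    return SpanContext(parent.trace_id, secrets.token_hex(8), parent.sampled)


def continue_trace(headers: dict[str, str]) -> SpanContext:
    """Join the incoming trace if there is one, otherwise start a new root."""
    parent = parse_traceparent(headers.get("traceparent", ""))
    return child_of(parent) if parent else new_root()


if __name__ == "__main__":
    incoming = {"traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"}
    span = continue_trace(incoming)
    # Forward this header on every outbound call so downstream spans join the trace.
    print(span.to_header())
```

Each service repeats the same step: parse the incoming header, create a child span, and send that span's header downstream. The shared `trace_id` is what lets a collector reassemble the spans into one call graph.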
Instinct #
Metrics for detection (what’s broken), logs for investigation (why it’s broken), traces for diagnosis (where in the chain it’s broken).
All three are necessary; none is sufficient alone.
In design interviews, mention observability as a cross-cutting concern alongside auth and rate limiting.
References #
- Logs vs Structured Events — Charity Majors
- OpenTelemetry Documentation
- Metrics, Tracing, and Logging — Peter Bourgon; original “three pillars” framing