
The Three Pillars


🟠 P1 — the foundational framework for observability

The Pillars #

Logs #

  • What: Discrete events with context (timestamp, service, message, metadata)
  • When to use: Debugging specific requests, audit trails, error investigation
  • Pattern: Structured logging (JSON) with correlation IDs; centralised aggregation (ELK, Datadog)
  • Cost trap: Log volume scales with traffic, so logging everything gets expensive quickly. Sample high-volume paths rather than recording every event.
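
The logging pattern above can be sketched with Python's standard `logging` module. This is a minimal illustration, not a reference implementation: the `JsonFormatter`, `SampleFilter`, and `build_logger` names, the `service` and `correlation_id` field names, and the 1-in-N sampling rule are all assumptions made for the example.

```python
import io
import json
import logging
import uuid


class JsonFormatter(logging.Formatter):
    """Render each record as a single JSON object (one event per line)."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            # These two fields arrive via the `extra=` argument (hypothetical names).
            "service": getattr(record, "service", "unknown"),
            "correlation_id": getattr(record, "correlation_id", None),
            "message": record.getMessage(),
        }
        return json.dumps(payload)


class SampleFilter(logging.Filter):
    """Keep every WARNING-or-higher record, but only 1 in `every` lower-level ones."""

    def __init__(self, every: int) -> None:
        super().__init__()
        self.every = every
        self.seen = 0

    def filter(self, record: logging.LogRecord) -> bool:
        if record.levelno >= logging.WARNING:
            return True
        self.seen += 1
        return (self.seen - 1) % self.every == 0


def build_logger(stream, sample_every: int = 10) -> logging.Logger:
    """Create an isolated logger that writes sampled JSON lines to `stream`."""
    logger = logging.getLogger(f"demo-{uuid.uuid4().hex}")  # unique name, no shared handlers
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    handler.addFilter(SampleFilter(sample_every))
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    out = io.StringIO()
    log = build_logger(out, sample_every=10)
    ctx = {"service": "checkout", "correlation_id": uuid.uuid4().hex}
    for _ in range(20):
        log.info("cart viewed", extra=ctx)   # high-volume path, sampled
    log.error("payment failed", extra=ctx)   # errors are never dropped
    print(out.getvalue())
```

Because every line carries the same `correlation_id`, an aggregation tool can pull together all events for one request, even when the INFO-level lines have been sampled.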

Metrics #

  • What: Numeric measurements aggregated over time (counters, gauges, histograms)
  • When to use: Dashboards, alerting, capacity planning, SLO tracking
  • Pattern: Pull-based (Prometheus scrapes an endpoint) or push-based (StatsD, CloudWatch). Metrics are pre-aggregated, so storage cost tracks the number of time series rather than the number of requests.
  • Key metrics: request rate, error rate, latency percentiles (p50, p95, p99)
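
To make the key metrics concrete, here is a small sketch that derives request rate, error rate, and latency percentiles from recorded observations. The `RequestMetrics` class is hypothetical and keeps raw samples for clarity. A real metrics client would more likely use bucketed histograms so that the data stays pre-aggregated. Percentiles use the nearest-rank method, which is one of several common definitions.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class RequestMetrics:
    """In-memory stand-in for two counters and a latency histogram."""

    requests: int = 0
    errors: int = 0
    latencies_ms: List[float] = field(default_factory=list)

    def observe(self, latency_ms: float, ok: bool) -> None:
        """Record one finished request."""
        self.requests += 1
        if not ok:
            self.errors += 1
        self.latencies_ms.append(latency_ms)

    def request_rate(self, window_seconds: float) -> float:
        """Requests per second over the observation window."""
        return self.requests / window_seconds

    def error_rate(self) -> float:
        """Fraction of requests that failed (0.0 when nothing was observed)."""
        return self.errors / self.requests if self.requests else 0.0

    def percentile(self, p: float) -> float:
        """Nearest-rank percentile of the recorded latencies, for 0 < p <= 100."""
        if not self.latencies_ms:
            raise ValueError("no observations recorded")
        ordered = sorted(self.latencies_ms)
        rank = math.ceil(p * len(ordered) / 100)
        return ordered[max(rank, 1) - 1]


if __name__ == "__main__":
    metrics = RequestMetrics()
    for i in range(1, 101):
        # Latencies of 1..100 ms; every 20th request fails.
        metrics.observe(float(i), ok=(i % 20 != 0))
    print("rate/s:", metrics.request_rate(window_seconds=50))
    print("error rate:", metrics.error_rate())
    for p in (50, 95, 99):
        print(f"p{p}:", metrics.percentile(p))
```

Percentiles matter more than averages here because a handful of slow requests can hide behind a healthy mean while still dominating p99.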

Traces #

  • What: The path of a single request across multiple services, with timing for each span
  • When to use: Diagnosing latency in multi-service chains, understanding call graphs
  • Pattern: Trace context propagation (W3C Trace Context), sampling, span collection
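
Trace context propagation can be illustrated with the W3C `traceparent` header, whose version `00` format is `00-<trace-id>-<parent-id>-<trace-flags>`: a 32-hex-digit trace ID, a 16-hex-digit span ID, and a 2-hex-digit flags field. The helper names below (`new_traceparent`, `parse_traceparent`, `child_traceparent`) are hypothetical, and the sketch handles only the `traceparent` header, not `tracestate`. In practice a tracing SDK would normally do this for you.

```python
import re
import secrets
from typing import Optional, Tuple

# Version 00 layout: 00-<32 hex trace-id>-<16 hex parent-id>-<2 hex flags>
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_traceparent(sampled: bool = True) -> str:
    """Start a new trace at the edge of the system."""
    trace_id = secrets.token_hex(16)  # 16 random bytes -> 32 hex chars
    span_id = secrets.token_hex(8)    # 8 random bytes  -> 16 hex chars
    flags = "01" if sampled else "00"  # lowest bit is the "sampled" flag
    return f"00-{trace_id}-{span_id}-{flags}"


def parse_traceparent(header: str) -> Optional[Tuple[str, str, str]]:
    """Return (trace_id, parent_id, flags), or None if the header is invalid."""
    match = TRACEPARENT_RE.match(header.strip())
    if match is None:
        return None
    trace_id, parent_id, flags = match.groups()
    # All-zero IDs are defined as invalid by the specification.
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return trace_id, parent_id, flags


def child_traceparent(incoming: Optional[str]) -> str:
    """Build the header for an outgoing call made while handling a request.

    The trace ID and flags are carried over unchanged; only the span ID is new.
    A missing or malformed incoming header starts a fresh trace.
    """
    parsed = parse_traceparent(incoming) if incoming else None
    if parsed is None:
        return new_traceparent()
    trace_id, _parent_id, flags = parsed
    return f"00-{trace_id}-{secrets.token_hex(8)}-{flags}"


if __name__ == "__main__":
    incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
    print("incoming:", incoming)
    print("outgoing:", child_traceparent(incoming))
```

Each service repeats the same step: read the incoming header, keep the trace ID, and mint a new span ID for its own outgoing calls. That shared trace ID is what lets a collector stitch the spans back into one request path.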

Instinct #

Metrics for detection (what’s broken), logs for investigation (why it’s broken), traces for diagnosis (where in the chain it’s broken).

All three are necessary; none is sufficient alone.

In design interviews, mention observability as a cross-cutting concern alongside auth and rate limiting.
