The Three Pillars

Updated: 16 June 2026

Reference Observability P1

🟠 P1 — the foundational framework for observability

The Pillars

Logs

What: Discrete events with context (timestamp, service, message, metadata)
When to use: Debugging specific requests, audit trails, error investigation
Pattern: Structured logging (JSON) with correlation IDs; centralised aggregation (ELK, Datadog)
Cost trap: Log volume drives ingestion and storage cost, so logging everything is expensive. Sample high-volume paths, but keep errors unsampled.
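
The logging pattern above can be sketched with Python's standard library alone. This is a minimal illustration, not a production setup: the `correlation_id` context variable, the `JsonFormatter` and `SampleFilter` classes, the `build_logger` helper, and the `service` field passed via `extra` are all hypothetical names chosen for this example. The sampling policy (keep every WARNING and above, keep a fraction of the rest) is one reasonable assumption among several.

```python
import io
import json
import logging
import random
from contextvars import ContextVar

# Hypothetical per-request context; a web framework middleware would set this.
correlation_id = ContextVar("correlation_id", default="-")


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record):
        return json.dumps({
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "service": getattr(record, "service", "unknown"),
            "message": record.getMessage(),
            "correlation_id": correlation_id.get(),
        })


class SampleFilter(logging.Filter):
    """Keep every WARNING-or-above record; keep lower levels with probability `rate`."""

    def __init__(self, rate, rng=None):
        super().__init__()
        self.rate = rate
        self.rng = rng or random.Random()

    def filter(self, record):
        if record.levelno >= logging.WARNING:
            return True
        return self.rng.random() < self.rate


def build_logger(name, stream, sample_rate=1.0):
    """Return a logger that writes sampled JSON lines to `stream`."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep the demo output out of the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    handler.addFilter(SampleFilter(sample_rate))
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    out = io.StringIO()
    log = build_logger("checkout", out, sample_rate=0.1)
    correlation_id.set("req-123")
    log.info("cart loaded", extra={"service": "checkout"})      # kept roughly 10% of the time
    log.error("payment failed", extra={"service": "checkout"})  # always kept
    print(out.getvalue(), end="")
```

Because every line carries the same `correlation_id`, a central aggregator can pull together all log lines for one request, which is what makes the "debugging specific requests" use case workable.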

Metrics

What: Numeric measurements aggregated over time (counters, gauges, histograms)
When to use: Dashboards, alerting, capacity planning, SLO tracking
Pattern: Pull-based (Prometheus scrapes) or push-based (StatsD, CloudWatch). Pre-aggregated = cheap to store, at the cost of per-request detail.
Key metrics: request rate, error rate, latency percentiles (p50, p95, p99)
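
To make "pre-aggregated" concrete, here is a small sketch of a fixed-bucket latency histogram. It stores one counter per bucket instead of every sample, then estimates a percentile from the counts. The `Histogram` class and the bucket bounds are hypothetical, and the estimate is deliberately crude: it returns the upper bound of the bucket that contains the requested rank, whereas real systems such as Prometheus interpolate within the bucket.

```python
import bisect
import math


class Histogram:
    """Fixed-bucket histogram: memory is O(buckets), not O(observations)."""

    def __init__(self, bounds):
        self.bounds = sorted(bounds)
        # One counter per bucket, plus a final overflow bucket (+Inf).
        self.counts = [0] * (len(self.bounds) + 1)
        self.total = 0

    def observe(self, value):
        # A value equal to a bound lands in that bound's bucket ("less than or equal").
        idx = bisect.bisect_left(self.bounds, value)
        self.counts[idx] += 1
        self.total += 1

    def quantile(self, q):
        """Return the upper bound of the bucket holding the q-th quantile."""
        if self.total == 0:
            raise ValueError("no observations recorded")
        rank = max(1, math.ceil(q * self.total))
        seen = 0
        for idx, count in enumerate(self.counts):
            seen += count
            if seen >= rank:
                return self.bounds[idx] if idx < len(self.bounds) else math.inf
        return math.inf  # not reached when total > 0; kept for safety


if __name__ == "__main__":
    latency = Histogram([0.05, 0.1, 0.25, 0.5, 1.0])  # seconds, illustrative bounds
    for seconds in [0.04] * 90 + [0.2] * 8 + [0.9] * 2:
        latency.observe(seconds)
    for q in (0.50, 0.95, 0.99):
        print(f"p{int(q * 100)} <= {latency.quantile(q)}s")
```

The trade-off is visible in the output: the histogram can only say that p95 falls at or below a bucket bound, so bucket boundaries should be chosen around the latencies that matter for the SLO.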

Traces

What: The path of a single request across multiple services, with timing for each span
When to use: Diagnosing latency in multi-service chains, understanding call graphs
Pattern: Trace context propagation (W3C Trace Context), sampling, span collection
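
Context propagation can be illustrated with the `traceparent` header from W3C Trace Context, which has the form `version-traceid-parentid-flags` in lowercase hex. The sketch below parses an incoming header, creates a child span context that shares the trace ID, and serialises it for the next hop. The `SpanContext` type and the three function names are hypothetical, the parser handles only version `00`, and starting a new sampled trace when no valid header arrives is an assumption of this example. In practice an OpenTelemetry SDK would handle this for you.

```python
import re
import secrets
from typing import NamedTuple

# version(2 hex) - trace-id(32 hex) - parent-id(16 hex) - flags(2 hex)
_TRACEPARENT = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


class SpanContext(NamedTuple):
    trace_id: str
    span_id: str
    sampled: bool


def parse_traceparent(header):
    """Return a SpanContext, or None if the header is malformed or unsupported."""
    match = _TRACEPARENT.match(header.strip())
    if match is None:
        return None
    version, trace_id, span_id, flags = match.groups()
    if version != "00":
        return None  # this sketch only understands version 00
    if set(trace_id) == {"0"} or set(span_id) == {"0"}:
        return None  # all-zero IDs are invalid per the spec
    return SpanContext(trace_id, span_id, bool(int(flags, 16) & 0x01))


def child_context(parent):
    """New span in the parent's trace; start a fresh trace if there is no parent."""
    if parent is None:
        # Assumption: new traces are always sampled in this sketch.
        return SpanContext(secrets.token_hex(16), secrets.token_hex(8), True)
    return SpanContext(parent.trace_id, secrets.token_hex(8), parent.sampled)


def to_traceparent(ctx):
    """Serialise a SpanContext as a version-00 traceparent header value."""
    return f"00-{ctx.trace_id}-{ctx.span_id}-{'01' if ctx.sampled else '00'}"


if __name__ == "__main__":
    incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
    span = child_context(parse_traceparent(incoming))
    # Send this value as the traceparent header on outgoing calls.
    print(to_traceparent(span))
```

The trace ID stays constant across hops while each service contributes its own span ID, which is what lets a collector reassemble the call graph. Propagating the sampled flag keeps the sampling decision consistent along the whole chain.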

Instinct

Metrics for detection (what’s broken), logs for investigation (why it’s broken), traces for diagnosis (where in the chain it’s broken).

All three are necessary; none is sufficient alone.

In design interviews, mention observability as a cross-cutting concern alongside auth and rate limiting.

References

Logs vs Structured Events — Charity Majors
OpenTelemetry Documentation
Metrics, Tracing, and Logging — Peter Bourgon; original “three pillars” framing