Concepts
Basics
Observability means instrumenting the target system so that we can understand what is happening inside it from the data it emits, without needing to know its internal structure. This makes it easier to follow system behavior and to debug problems.
#TODO: Reliability, metrics, SLI, SLO - haven't touched on these yet.
Distributed tracing allows us to observe what happens across different services for a single request in a distributed architecture.
To understand distributed tracing, we first need to understand the building blocks it relies on:
- Logs: Timestamped messages emitted by services or other components.
- Spans: A span represents a single unit of work, such as one operation within a service. Spans carry key-value metadata called span attributes.
- Traces: A trace is composed of one or more spans. The first span is the root span, and spans are related to one another through parent-child relationships.
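The relationship between spans and traces can be sketched with a small data model. This is only an illustration, not the API of any real tracing library: the Span class, its field names, the root_span and depth helpers, and the example span names and attribute keys are all assumptions made up for this sketch.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field


@dataclass
class Span:
    """One unit of work. Hypothetical model, not a real tracing API."""
    name: str
    trace_id: str                     # shared by every span in the same trace
    start_ms: int
    end_ms: int
    parent_id: str | None = None      # None marks the root span
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex[:16])
    attributes: dict[str, object] = field(default_factory=dict)

    def child(self, name: str, start_ms: int, end_ms: int,
              attributes: dict[str, object] | None = None) -> Span:
        """Create a span in the same trace whose parent is this span."""
        return Span(name, self.trace_id, start_ms, end_ms,
                    parent_id=self.span_id, attributes=attributes or {})


def root_span(spans: list[Span]) -> Span:
    """Return the single span that has no parent."""
    roots = [s for s in spans if s.parent_id is None]
    if len(roots) != 1:
        raise ValueError(f"expected exactly one root span, found {len(roots)}")
    return roots[0]


def depth(span: Span, by_id: dict[str, Span]) -> int:
    """Count how many parent links separate a span from the root."""
    level = 0
    while span.parent_id is not None:
        span = by_id[span.parent_id]
        level += 1
    return level


# Example trace for one request; names and attribute keys are illustrative.
root = Span("GET /checkout", uuid.uuid4().hex, 0, 120,
            attributes={"http.method": "GET"})
auth = root.child("auth.verify", 5, 25)
order = root.child("order.create", 30, 110)
db = order.child("db.insert", 40, 90, {"db.system": "postgresql"})

trace = [root, auth, order, db]
by_id = {s.span_id: s for s in trace}

# Print each span indented by its depth in the parent-child tree.
for s in trace:
    indent = "  " * depth(s, by_id)
    print(f"{indent}{s.name} [{s.start_ms}ms -> {s.end_ms}ms] {s.attributes}")
```

Every span carries the same trace_id, and only parent_id links one span to another, so the whole tree can be rebuilt from a flat list of spans even when they were emitted by different services.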
Many observability backends use waterfall diagrams to visualize traces: