2026-03-06
Linking Traces and Logs in Grafana
I wanted to achieve an effect similar to agent observability platforms like LangSmith and Langfuse using Grafana, with the core requirement being the ability to see LLM message content.
Ensuring Jaeger and Loki Work Properly
Jaeger: 
Loki: 
Configuring Trace to Logs for Jaeger Data Source
Log into Grafana, and in the Jaeger data source settings page, configure the Trace to logs feature:
(http://HOST_ADDR:3000/connections/datasources/edit/jaeger)
Normally, you configure Tags:
- Select a field from the Traces data source (here, Jaeger), e.g.,
service.name - Then select the corresponding field from the Logs data source (here, Loki), e.g.,
service_name - If these two values match, they can be correlated via a query.
https://grafana.com/docs/grafana/next/datasources/jaeger/#use-a-simple-configuration The tags you configure must be present in the spans attributes or resources for a trace to logs span link to appear.
But I encountered an issue: in the telemetry data emitted by the application, neither resource attributes nor span attributes contain the service.name field.
Strange. Then why did the name doubao-agent appear in the previous Jaeger panel?
After checking the raw Jaeger data, it became clear: the process object contains the service.name field.
"processes": {
"p1": {
"serviceName": "doubao-agent",
"tags": [
{
"key": "telemetry.auto.version",
"type": "string",
"value": "0.60b1"
},
{
"key": "telemetry.sdk.language",
"type": "string",
"value": "python"
},
{
"key": "telemetry.sdk.name",
"type": "string",
"value": "opentelemetry"
},
{
"key": "telemetry.sdk.version",
"type": "string",
"value": "1.39.1"
}
]
}
}The final solution was to hardcode a custom query:
{service_name="doubao-agent"} | trace_id="${__trace.traceId}"The sign of success is the appearance of a "Logs for this span" button: 
Clicking this button jumps to the corresponding Loki interface.
Another pitfall: the time range in the Loki query interface might be wrong. Click the 🕰 icon and set the time to "last 1 hour" (adjust depending on the actual time difference between log generation and trace generation).
The result after the jump looks like this: 
But this was not what I wanted — it's a jump, not displaying logs as a span within the trace.
Replacing Jaeger with Tempo
Kimi suggested replacing Jaeger with Tempo, claiming it can display log content as a span within the trace. The replacement did not go smoothly.
OpenClaw's Aha Moment
As the investigation progressed, a new question emerged: where should the LLM messages be written? In the traces' span attributes, or in logs?
Discussing with AI, I felt that writing to logs seemed more reasonable, because LLM messages can be very large, and such text content is suitable for logs.
I discovered that Grafana has an AI Assistant integrated with its knowledge base. For my questions, this assistant provided well-reasoned answers.
There isn’t a single “official” best practice in the sources, but they do describe the trade‑offs and how Grafana and OpenTelemetry users are approaching this.
1. What traces vs. logs are for (in LLM/agent apps)
For LLM / agent workloads, traces are emphasized as the primary signal to understand the sequence of events (which tools/agents ran, whether a vector DB was used instead of an LLM call, etc.). [LLM guide; Office Hours 29]
Traces: “most prominent” signal for LLM apps, especially RAG and orchestration frameworks like LangChain, because they show the flow and which calls happened. [Office Hours 29]
Metrics: for aggregates (latency, cost, tokens, request volume). [LLM guide]
Logs: for detailed, often human‑oriented records and evaluation results, correlated to traces via trace/span IDs. [K8s logging call; Game of Traces]
2. What the OTel / Grafana folks recommend for LLM messages
The OpenTelemetry generative‑AI work (described in the community call) explicitly addresses where to put prompts/responses and similar content:
For early‑stage / baby applications, they suggest recording LLM input/output as span attributes (using the gen_ai.* semantic conventions). [GenAI community call]
At the same time, they acknowledge:
Text can be stored as span attributes, but binary or large content (images, audio, big messages) is problematic for traces. [GenAI community call]
There are cost and privacy concerns with storing full conversations in the telemetry backend; they propose a pattern where you:
keep a small, structured representation in span attributes, and
optionally upload full content to cold storage (e.g., S3) and store only a reference in telemetry. [GenAI community call]
Grafana Assistant itself follows a similar split: traces/logs/metrics in the observability stack, but conversations and trajectories are stored separately and linked, not fully in Tempo. [GenAI community call]
Also, in practice:
Many users capture prompts/responses directly in traces (as attributes) so they appear in the trace view and show the sequence of LLM/tool calls. [Office Hours 29]
Logs are recommended for individual evaluation results with reasoning, correlated to the trace via IDs. [GenAI community call]
3. How this maps to your options
Given your goal (“see agent thinking / LLM interaction steps directly in the trace view”), the sources point to a hybrid that is closer to Option B, with guardrails:
Use traces as the primary place to model the agent steps, and put enough message data on spans to understand the flow.
Concretely:
Model each LLM/tool/agent step as its own span (using the OTel gen‑AI semantic conventions), so the trace tree itself shows the orchestration. [LLM guide; GenAI community call]
On those spans, store:
Request/response metadata (model, temperature, top_p, token counts, cost, etc.). [LLM guide]
A truncated or redacted version of prompts/responses/reasoning that’s enough to debug and understand the step. [GenAI community call]
For full, verbose content:
Either store it in separate storage (or an LLM‑specific store) and keep a reference (URL, ID) on the span, as suggested in the OTel GenAI work. [GenAI community call]
Or, if you prefer Loki, emit logs with the full messages, correlated via trace/span IDs, and link traces↔logs in Grafana (Tempo+Loki correlation). [Game of Traces; Tempo announcement]
This gives you:
In Tempo’s trace view: the agent flow plus key prompt/response snippets and metadata (your main goal).
In Loki (or external storage): full conversations, reasoning, and eval logs when you need deep forensics or offline analysis.
4. Summary vs. your options
Option B (all messages in traces)
Matches the recommended starting point for small/early systems, but becomes expensive and risky (privacy, volume) if you store full conversations unbounded. [GenAI community call]
Option A (only logs, traces just link)
You lose the “LangSmith‑like” experience where the trace itself shows the agent’s thinking; the sources emphasize traces as the primary signal for LLM workflows. [Office Hours 29]
Best‑fit based on the docs:
Use traces as the canonical view of agent steps with summarized messages (Option B‑ish), and logs or external storage for full content and evals (Option A‑ish), all correlated via OpenTelemetry IDs. This is exactly the pattern the OTel GenAI work and Grafana teams describe. [GenAI community call]For early-stage agent projects, LLM messages can be written into span attributes. But later, you may face problems like increasing message size causing storage space issues, network consumption, and privacy/security concerns.
How Are Others Doing This?
To further confirm the implementation approach, I felt it necessary to investigate how others are handling this. The method is simple: clone the instrumentation libraries locally and see where LLM messages are written.
I assigned this task to my Xiao Zai (agent):
I have a task for you: investigate how current Agent observability platforms implement the feature of viewing the agent's thought content during runtime. I have two hypotheses: Hypothesis 1 is that LLM messages are written directly into trace attributes. Hypothesis 2 is that they are written into logs and later correlated with traces.
1. First, search for current agent observability platforms.
2. Download their instrumentation SDKs locally.
3. Use `uv tool run kimi-cli` (if bash is not available, switch to fish) to invoke the Kimi Code agent, and assign tasks to it to check the specific implementation of these instrumentation libraries.
4. Generate a report and write it to Feishu.
5. Create a 1-minute voice summary.After about ten minutes, Xiao Zai returned an error code. I expected that this task might be complex, and it's normal for Xiao Zai to encounter issues during execution. 
To my surprise, the analysis was completed, and a 1-minute voice summary was also generated.
This was my third Aha moment. The first was using ChatGPT to modify frontend code, and the second was achieving a milestone with Cursor (Claude) by rewriting an OTel Collector in Rust.
The Todo App Is AI's Killer App
Since my first job in 2023, I've been managing tasks. I built a task management system using WPS spreadsheets, which tracks task status, deadlines, and more, and even supports fancy kanban boards.
Gradually, I realized that simply operating the task management software already consumed a noticeable amount of energy and time.
After witnessing the capabilities of AI, I thought AI could solve this problem.
The ultimate form of a Todo app I envision is one where AI can automatically operate the task management software based on context and environmental awareness, without excessive user interaction.
In 2026, I asked Xiao Zai to create a task management spreadsheet, populating it with the tasks completed today and those to be done tomorrow. He finished quickly, and the integration with Feishu was exceptionally smooth.