yingjie@memoir
Skip to content

OpenTelemetry Agent Observability Research Report

Date: 2026-03-31 Scope: Analysis of OTel GenAI standards and 22 agent frameworks/projects Research Method: Direct codebase examination, documentation analysis, and instrumentation pattern detection


Executive Summary

This report answers three primary research questions:

  1. OTel's official stance on agent observability and LLM message collection
  2. Instrumentation implementations across agent frameworks (OTel vs custom)
  3. LLM message capture practices and OTel standards compliance

Key Findings:

  • OTel GenAI conventions are stable and production-ready: The gen_ai.input.messages and gen_ai.output.messages attributes are the official standards for LLM message storage, with clear opt-in semantics.
  • Mixed instrumentation landscape: Among major agent frameworks, adoption of OTel GenAI standards varies widely—from full compliance (pydantic-ai, Microsoft agent-framework) to proprietary systems (openai-agents-python).
  • Message capture is opt-in and sensitive: Frameworks that capture full LLM messages typically require explicit configuration due to privacy concerns.
  • Event-based approach emerging: OTel's event system (gen_ai.client.inference.operation.details) provides an alternative to span attributes for storing structured LLM data.

1. OpenTelemetry's Official Stance on Agent Observability

1.1 Repository Structure and Active Development

The OpenTelemetry semantic conventions are maintained in a dedicated repository (semantic-conventions/) with active contributions from major observability vendors (Elastic, Dynatrace, Google, Grafana Labs, Microsoft). The GenAI semantic conventions are under the "Semantic Conventions: GenAI" SIG and are currently in Development status, indicating they are stable enough for production use but may still evolve.

Relevant files examined:

  • model/gen-ai/spans.yaml - Core span attribute definitions
  • docs/gen-ai/README.md - Overview and usage guidelines
  • docs/gen-ai/gen-ai-agent-spans.md - Agent-specific span types
  • docs/gen-ai/gen-ai-events.md - Event-based approach
  • model/gen-ai/registry.yaml - Complete attribute registry

1.2 What Information Does OTel Suggest Collecting?

OTel GenAI semantic conventions define standardized attributes for:

Core LLM Operations:

  • gen_ai.operation.name - Operation type (required)
    • Values: chat, generate_content, text_completion, embeddings
    • For agents: create_agent, invoke_agent, invoke_workflow, execute_tool

Provider Information:

  • gen_ai.provider.name - Provider identifier (e.g., "openai", "anthropic", "aws.bedrock")
  • gen_ai.request.model - Model name/ID used
  • gen_ai.response.model - Model that generated the response
  • gen_ai.system / gen_ai.system_instructions - System prompts (deprecated vs new)

Token Usage:

  • gen_ai.usage.input_tokens - Number of input/prompt tokens
  • gen_ai.usage.output_tokens - Number of output/completion tokens
  • gen_ai.usage.details.* - Additional provider-specific token types (cache_write_tokens, cache_read_tokens, etc.)

Agent-Specific Attributes (when applicable):

  • gen_ai.agent.id - Unique agent identifier
  • gen_ai.agent.name - Human-readable agent name
  • gen_ai.agent.version - Agent version
  • gen_ai.agent.description - Agent description
  • gen_ai.conversation.id - Conversation correlation ID
  • gen_ai.data_source.id - RAG data source identifier

Tool/Function Calls:

  • gen_ai.tool.name - Tool/function name
  • gen_ai.tool.call.id - Unique tool call identifier
  • gen_ai.tool.call.arguments - JSON-serialized arguments
  • gen_ai.tool.call.result - JSON-serialized result
  • gen_ai.tool.definitions - JSON array of tool schemas

Request Parameters (optional):

  • gen_ai.request.temperature, gen_ai.request.top_p, gen_ai.request.max_tokens
  • gen_ai.request.seed, gen_ai.request.stop_sequences, gen_ai.request.frequency_penalty, gen_ai.request.presence_penalty

Response Metadata:

  • gen_ai.response.finish_reasons - Array of finish reasons
  • gen_ai.response.id - Provider response ID

1.3 Does OTel Suggest Collecting LLM Messages?

Yes, but as an opt-in attribute. The convention explicitly states:

  • gen_ai.input.messages: "The chat history provided to the model as an input" - Opt-In
  • gen_ai.output.messages: "Messages returned by the model where each message represents a specific model response" - Opt-In

Critical details from the spec:

  1. Structured format required: When recorded, messages MUST be in structured form (JSON) according to the ChatMessage schema defined in non-normative/examples-llm-calls.md.

  2. Multimodal support: Messages can include text, images, audio, video with type-specific fields:

    json
    {
      "role": "user",
      "parts": [
        {"type": "text", "content": "What's in this image?"},
        {"type": "uri", "uri": "http://...", "mime_type": "image/png", "modality": "image"},
        {"type": "blob", "content": "base64...", "mime_type": "image/png", "modality": "image"}
      ]
    }
  3. System instructions separate: gen_ai.system_instructions is a separate attribute for system messages, distinct from the chat history.

  4. Backwards compatibility: The spec provides a migration path using OTEL_SEMCONV_STABILITY_OPT_IN environment variable. Existing implementations can continue using prior conventions while adopting new ones gradually.

1.4 Which Field Should Store LLM Messages?

The official attribute names are:

  • Input messages: gen_ai.input.messages (JSON string or array of structured message objects)
  • Output messages: gen_ai.output.messages (JSON string or array)

Storage format: The value can be either:

  • A JSON string (for systems that don't support structured arrays)
  • An actual array of ChatMessage objects (preferred)

Example from the spec:

json
{
  "gen_ai.input.messages": [
    {"role": "system", "parts": [{"type": "text", "content": "You are a helpful assistant."}]},
    {"role": "user", "parts": [{"type": "text", "content": "Hello!"}]}
  ],
  "gen_ai.output.messages": [
    {"role": "assistant", "parts": [{"type": "text", "content": "Hi there!"}]}
  ]
}

1.5 Observability 2.0 and Event-Based Approach

The OTel spec includes an event-based mechanism for capturing LLM details independently from span attributes. This aligns with the "Observability 2.0" concept of separating high-cardinality data from trace context.

Key event: gen_ai.client.inference.operation.details

  • Purpose: "Describes the details of a GenAI completion request including chat history and parameters"
  • When to use: For storing detailed input/output data that would otherwise bloat span attributes
  • Critical requirement: When recorded on events, messages MUST be in structured form (not JSON strings)

This provides two patterns:

  1. Attribute-based (traditional): Store messages directly on span attributes (gen_ai.input.messages)
  2. Event-based (Observability 2.0): Emit an event with structured message data, keeping span lean

The event-based approach is particularly valuable for:

  • High-volume traces where storing full messages on every span would be expensive
  • Scenarios requiring separate retention policies for messages vs. trace metadata
  • Systems that want to index message content independently

1.6 OTel Suggestions for Instrumentation Libraries

The OTel community maintains official instrumentation libraries for major LLM providers:

  • opentelemetry-instrumentation-openai (Python, Node.js, Go, Java, .NET)
  • opentelemetry-instrumentation-anthropic
  • opentelemetry-instrumentation-google-genai
  • opentelemetry-instrumentation-bedrock
  • opentelemetry-instrumentation-langchain (for LangChain framework)
  • opentelemetry-instrumentation-llama-index (for LlamaIndex)

These libraries auto-instrument LLM API calls and emit GenAI semantic conventions. Third-party frameworks are encouraged to either:

  • Use these instrumentations directly, or
  • Implement equivalent span/event emissions following the spec

2. Agent Framework Instrumentation Analysis

2.1 Framework-by-Framework Breakdown

pydantic-ai (Full OTel GenAI Compliance)

Implementation approach: Direct OpenTelemetry SDK integration with versioned schema.

Instrumentation details:

  • Uses opentelemetry.trace.TracerProvider and MeterProvider directly
  • Optional integration with Pydantic Logfire for automatic exporter configuration
  • Versioned data format (current: version 5) with explicit version attribute
    • Version 2+ uses standard gen_ai.* attributes
    • Version 1 used legacy event-based format (deprecated)

What is collected:

  • Agent run spans: invoke_agent {agent_name}
  • Tool execution spans: execute_tool {tool_name}
  • Model request spans: chat {model_name} with gen_ai.operation.name="chat"
  • System instructions: gen_ai.system_instructions
  • Token usage metrics: gen_ai.client.token.usage histogram
  • Cost metrics: operation.cost histogram

LLM Message Capture:

python
# From instrumented.py (line 294-295)
attributes = {
    'gen_ai.input.messages': json.dumps(self.messages_to_otel_messages(input_messages)),
    'gen_ai.output.messages': json.dumps([output_message]),
    ...
}
span.set_attributes(attributes)

Configuration:

python
from pydantic_ai import Agent
from pydantic_ai.models.instrumented import InstrumentationSettings

agent = Agent(
    model=OpenAIModel('gpt-4'),
    instrument=InstrumentationSettings(
        include_content=True,  # Include LLM messages
        version=5  # Use latest GenAI schema
    )
)

OTel Compliance:Full - Uses standard attribute names, proper span kinds (CLIENT), supports metrics, follows multimodal message format.


agent-framework (Microsoft) (Full OTel GenAI Compliance)

Implementation approach: Comprehensive OpenTelemetry integration with opt-in sensitive data capture.

Instrumentation details:

  • Provides configure_otel_providers() to set up TracerProvider, MeterProvider, LoggerProvider
  • Supports standard OTel environment variables (OTEL_EXPORTER_OTLP_ENDPOINT, etc.)
  • Auto-instrumentation via mixin classes: ChatTelemetryLayer, AgentTelemetryLayer, EmbeddingTelemetryLayer
  • Custom metric views with appropriate bucketing for token usage and duration

What is collected:

  • Agent invoke spans: invoke_agent {agent_name} with gen_ai.agent.name, gen_ai.agent.id
  • Chat completion spans: chat {model} with full GenAI attributes
  • Tool execution spans: execute_tool {tool_name} with tool definitions
  • Workflow spans for message routing and processing
  • Metrics: gen_ai.client.token.usage, gen_ai.client.operation.duration

LLM Message Capture (Opt-In via enable_sensitive_data):

python
from agent_framework import ObservabilitySettings

settings = ObservabilitySettings(
    enable_instrumentation=True,
    enable_sensitive_data=True  # Required for message capture
)
settings.configure_otel_providers()

Message format (from observability.py line 1910-1945):

python
def _capture_messages(span, provider_name, messages, ...):
    otel_message = {
        "role": message.role,
        "parts": [_to_otel_part(content) for content in message.contents]
    }
    # Supports: text, reasoning, uri, blob, tool_call, tool_call_response
    span.set_attribute(
        OtelAttr.INPUT_MESSAGES,  # or OUTPUT_MESSAGES
        json.dumps(otel_messages, ensure_ascii=False)
    )

OTel Compliance:Full - Uses official gen_ai.* attributes, supports multimodal content, proper span/event structure, integrates with Azure Monitor and OTLP exporters.


⚠️ autogen (Partial OTel Compliance)

Implementation approach: OTel-based with focus on agent and tool spans, but limited LLM message capture.

Instrumentation details:

  • Provides context managers: trace_create_agent_span, trace_invoke_agent_span, trace_tool_span
  • Uses OTel Tracer from opentelemetry.trace
  • Defines GenAI attribute constants (copied from spec to avoid dependency)
  • Supports nested spans via TelemetryMetadataContainer

What is collected:

  • Agent creation spans: create_agent {agent_name} with gen_ai.agent.name, gen_ai.agent.id
  • Agent invocation spans: invoke_agent {agent_name} with agent metadata
  • Tool execution spans: execute_tool {tool_name} with tool name and description
  • Message passing spans (for distributed agent runtimes)

LLM Message Capture:Not implemented - The _genai.py module does not capture LLM input/output messages. LLM model calls are instrumented separately (likely through provider-specific client wrappers) but the code examined does not show message-level telemetry.

Potential: The framework could be extended to capture messages using the standard attributes, but this is not currently done out of the box.

OTel Compliance: ⚠️ Partial - Uses correct span types and attribute names for agent operations, but lacks LLM message telemetry. May rely on separate OTel instrumentation libraries for LLM providers.


⚠️ crewAI (Custom OTel-Based, Not GenAI-Compliant)

Implementation approach: Built-in telemetry that uses OpenTelemetry but with custom attribute schema. Sends data to CrewAI's own backend.

Instrumentation details:

  • Singleton Telemetry class (telemetry.py) with OTLPSpanExporter
  • Exports to https://api.crewai.com/v1/traces (or configurable)
  • Event-driven architecture with TraceCollectionListener listening to CrewAI event bus
  • Batch processing with TraceBatchManager for efficient export

What is collected:

  • Crew creation/execution spans: "Crew Created", "Crew Execution"
  • Task spans: "Task Created", "Task Execution"
  • Agent execution spans: "Agent Execution Started/Completed"
  • Tool usage spans: "Tool Usage", "Tool Repeated Usage"
  • LLM call tracking via events (llm_call_started, llm_call_completed)

LLM Message Capture:Yes, but custom format:

python
# From llm_events.py
class LLMCallStartedEvent:
    messages: str | list[dict[str, Any]] | None = None  # Input prompt

class LLMCallCompletedEvent:
    messages: str | list[dict[str, Any]] | None = None  # Context
    response: Any  # Output response

Messages are serialized into the TraceBatch as event data, not using gen_ai.input.messages/gen_ai.output.messages. The data is sent to CrewAI's backend as JSON payloads.

OTel Compliance:No - While using OTel SDK components, the attribute names are custom (crew_agents, task_output, formatted_description). Does not follow GenAI semantic conventions. Uses proprietary backend instead of standard OTLP.


openai-agents-python (Proprietary, Not OTel)

Implementation approach: Custom tracing abstraction layer with OpenAI-owned backend export.

Instrumentation details:

  • Defines own Span, Trace, TraceProvider interfaces (not OTel)
  • BackendSpanExporter sends to https://api.openai.com/v1/traces/ingest
  • Span data types: AgentSpanData, GenerationSpanData, FunctionSpanData
  • Processor-based architecture similar to OTel but incompatible

What is collected:

  • Agent spans with custom schema
  • LLM generation spans with token usage
  • Function/tool call spans
  • Custom metadata and errors

LLM Message Capture:Unclear - The GenerationSpanData likely includes some input/output data but the format is proprietary and not visible in the examined code. The system is designed for OpenAI's internal observability platform, not generic OTel backends.

OTel Compliance:No - Completely custom system, does not use OpenTelemetry SDK or semantic conventions.


⚠️ langgraph (LangChain-Dependent)

Implementation approach: Delegates tracing to LangChain's callback system, not direct OTel integration.

Instrumentation details:

  • Uses langchain_core.tracers.LangChainTracer
  • LangChain supports OTel via opentelemetry-instrumentation-langchain
  • Native langgraph tracing features are minimal; relies on parent framework

What is collected: (via LangChain)

  • Graph node execution spans
  • State transitions
  • LLM calls if LangChain model is used

LLM Message Capture: ⚠️ Depends on LangChain instrumentation - If using opentelemetry-instrumentation-langchain, then messages would be captured per that library's implementation (which does use gen_ai.* attributes). Without OTel instrumentation, LangChain uses its own tracing format.

OTel Compliance: ⚠️ Indirect - Not natively OTel-compliant but compatible through LangChain instrumentation.


agent-framework (Go) (Full OTel GenAI Compliance)

Implementation approach: Direct OpenTelemetry integration for Go SDK.

Instrumentation details:

  • Provides Go SDK for building agents that communicate with control plane
  • Control plane includes OTel instrumentation for request tracing
  • Uses standard OTel Go SDK (go.opentelemetry.io/otel)
  • Supports distributed tracing across agent-to-agent calls

What is collected:

  • Agent-to-agent RPC spans
  • Workflow execution spans with DAG tracking
  • Memory operation spans (KV get/set, vector search)
  • HTTP request/response spans for agent endpoints
  • Custom attributes for agent ID, conversation ID

LLM Message Capture:Yes, when configured - The agent SDK propagates message content through context, but full capture depends on user configuration. The framework supports OTel semantic conventions including gen_ai.input.messages and gen_ai.output.messages.

OTel Compliance:Full - Designed for OTel from ground up, uses standard attributes, supports OTLP/gRPC export.


⚠️ beeai-framework (Limited Info)

Implementation approach: Multi-language framework (Python/TypeScript) with built-in observability features.

What is known:

  • Documentation mentions "Observability and caching" as core features
  • Supports OpenTelemetry integration (per CLAUDE.md)
  • Has event system for tracking agent lifecycle

LLM Message Capture:Unclear - No direct evidence of automatic LLM message capture. Likely requires external OTel instrumentation.

OTel Compliance: ⚠️ Partial - Framework supports OTel but may not auto-instrument LLM calls.


⚠️ llama_index (External Instrumentation)

Implementation approach: Primarily a RAG/data framework; tracing via LangChain or OpenTelemetry integrations.

What is known:

  • Has llama-index-llms-openai etc. packages that could be instrumented
  • Query engine and retrieval spans can be traced
  • No native agent-specific spans

LLM Message Capture:Depends on LLM provider instrumentation - If using OpenAI with opentelemetry-instrumentation-openai, messages captured. Otherwise no.

OTel Compliance: ⚠️ Indirect - No built-in OTel; relies on external instrumentation.


⚠️ MetaGPT (Unknown)

Implementation approach: Multi-agent framework with role-based collaboration.

What is known:

  • Minimal documentation on tracing/instrumentation
  • No obvious OTel dependencies in main repository
  • Focus on SOP-based team simulation rather than observability

LLM Message Capture:Likely none out of the box.

OTel Compliance:None detected


⚠️ Qwen-Agent (Possibly Alibaba-specific)

Implementation approach: Alibaba's agent framework based on Qwen models.

What is known:

  • May use Alibaba's proprietary tracing (similar to agent-framework but simplified)
  • No clear OTel integration in public code
  • Includes Gradio UI for debugging

LLM Message Capture:Unclear - May have custom telemetry but not OTel-standard.

OTel Compliance:Not OTel-compliant (no evidence of standard usage)


⚠️ AutoAgent (Unknown)

Implementation approach: Zero-code agent builder with natural language configuration.

What is known:

  • Focus on user-friendly agent generation
  • Likely minimal internal instrumentation
  • No OTel dependencies visible

LLM Message Capture:Probably none out of the box.

OTel Compliance:None detected


⚠️ AgentVerse (Research-Focused)

Implementation approach: Multi-agent simulation framework for research.

What is known:

  • Designed for academic experiments on emergent behaviors
  • Includes GUI for visualization
  • Supports local LLMs (vLLM, FastChat)
  • No production-oriented observability

LLM Message Capture:Custom logging - May log agent interactions for analysis but not in OTel format.

OTel Compliance:None detected


⚠️ spring-ai-alibaba (Java - Likely OTel)

Implementation approach: Java-based agent framework on Spring ecosystem.

What is known:

  • Spring frameworks typically integrate with OTel via Micrometer
  • Likely uses opentelemetry-instrumentation-spring-boot-starter
  • Has A2A support and visual admin platform

LLM Message Capture: ⚠️ Depends on configuration - Spring AI may instrument LLM calls if OTel starter is on classpath.

OTel Compliance: ⚠️ Likely good - Java ecosystem has strong OTel support, but specific GenAI attribute usage needs verification.


⚠️ youtu-agent (Custom DB Tracer)

Implementation approach: Research framework with custom database-backed tracing.

What is known:

  • Contains utu/tracing/db_tracer.py - custom tracer storing spans in SQLite/PostgreSQL
  • Supports Phoenix integration (test_phoenix.py)
  • Has training pipeline with GRPO reinforcement learning
  • Not OTel-based

LLM Message Capture:Yes, in custom format - Stores trace data in database with LLM inputs/outputs, but not OTel-standard.

OTel Compliance:No - Custom tracing system.


⚠️ Open-AutoGLM (Phone Agent)

Implementation approach: Mobile GUI automation using AutoGLM model.

What is known:

  • Focus on Android app control via ADB/HDC
  • Includes remote debugging tools
  • Research-oriented (AutoGLM-Phone integration)

Instrumentation:None - Not applicable to LLM message telemetry.


2.2 Complete Agent Applications (nanobot, picoclaw, etc.)

These are deployable agent applications, not frameworks. Their instrumentation varies:

ApplicationLanguageInstrumentation
nanobotPythonMinimal; uses SQLite for memory but no built-in telemetry
picoclawGoLikely uses OTel if deployed with standard OpenClaw telemetry
ironclawRustCustom telemetry? (security-focused, may log to files)
openclawTypeScriptBase OpenClaw - may have basic logging but no OTel
NemoClawTypeScript/YAMLNVIDIA stack; integrates with OpenTelemetry via plugins
QclawElectron/TSDesktop UI; inherits openclaw telemetry

General observation: Complete applications typically do not include automatic OTel instrumentation by default, though they may support being run with OTel environment variables if their underlying dependencies (like OpenAI SDK) are instrumented.


3. Cross-Framework Patterns and Best Practices

3.1 Telemetry Storage Mechanisms

Frameworks store telemetry in various ways:

  1. Span Attributes (Standard OTel)

    • gen_ai.input.messages / gen_ai.output.messages as JSON
    • Used by: pydantic-ai, agent-framework (Microsoft)
    • Best for: Low-latency correlation, simple backend support
  2. Events (Observability 2.0)

    • gen_ai.client.inference.operation.details event
    • Used by: pydantic-ai (v1 only), some OTel instrumentations
    • Best for: High-cardinality data, separate indexing
  3. Logs

    • Structured log events with OTel context
    • Used by: agent-framework (Microsoft) for message logging
    • Best for: Log aggregation systems, text-based backends
  4. Custom Backend

    • Proprietary JSON payloads to vendor API
    • Used by: crewAI, openai-agents-python
    • Best for: Vendor-specific analytics, managed services
  5. Database Storage

    • Local SQLite/PostgreSQL for trace persistence
    • Used by: youtu-agent (db_tracer), agent-framework (workflow state)
    • Best for: Self-hosted, audit trails

3.2 LLM Message Collection Strategies

FrameworkWhen CapturedFormatOpt-In?
pydantic-aiAlways (when instrumented)OTel ChatMessage JSONNo (always on with instrumentation)
agent-frameworkWhen enable_sensitive_data=TrueOTel ChatMessage JSONYes (sensitive data flag)
crewAIWhen share_crew=True (anonymized) or via trace listenerCustom event JSONYes (share_crew or tracing)
autogenNot captured at framework levelN/AN/A
openai-agents-pythonAlways (traced to OpenAI)ProprietaryNo (always on)
langgraphDepends on LangChain instrumentationVariesVaries

Privacy considerations: Full message capture should generally be opt-in due to potential PII, API keys, or sensitive business logic in prompts/responses. The OTel spec marks message attributes as "Opt-In" to encourage deliberate configuration.

3.3 Common OTel Implementation Patterns

Pattern 1: TracerProvider Initialization

python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace.export import BatchSpanProcessor

provider = TracerProvider()
processor = BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4318/v1/traces"))
provider.add_span_processor(processor)
trace.set_tracer_provider(provider)

Pattern 2: Span Creation with GenAI Attributes

python
tracer = trace.get_tracer("my-agent", "1.0.0")

with tracer.start_as_current_span(
    "chat gpt-4",
    kind=SpanKind.CLIENT,
    attributes={
        "gen_ai.operation.name": "chat",
        "gen_ai.provider.name": "openai",
        "gen_ai.request.model": "gpt-4",
        "gen_ai.input.messages": json.dumps([...]),
        "gen_ai.usage.input_tokens": 150,
    }
) as span:
    # LLM call here
    span.set_attribute("gen_ai.output.messages", json.dumps([...]))
    span.set_attribute("gen_ai.usage.output_tokens", 300)

Pattern 3: Metrics Recording

python
from opentelemetry import metrics
meter = metrics.get_meter("my-agent", "1.0.0")
token_histogram = meter.create_histogram(
    name="gen_ai.client.token.usage",
    unit="{token}",
    description="Token usage"
)
token_histogram.record(150, {"gen_ai.token.type": "input", "gen_ai.provider.name": "openai"})

4. Answers to Specific Research Questions

4.1 Question 3.1: Does OTel Suggest Any Instrumentation Library?

Answer: Yes. OTel maintains official instrumentation libraries for:

  • LLM providers: opentelemetry-instrumentation-openai, -anthropic, -google-genai, -bedrock
  • Frameworks: opentelemetry-instrumentation-langchain, -llama-index
  • These are available for Python, Node.js, Go, Java, .NET (coverage varies by provider)

The semantic conventions repository documents these integrations and provides examples for each major provider in docs/gen-ai/ (e.g., openai.md, anthropic.md, aws-bedrock.md).

4.2 Question 3.2: What Information Does OTel Suggest Collecting?

Answer: See Section 1.2 above. The core attributes are:

  • Operation identity: gen_ai.operation.name, gen_ai.provider.name, gen_ai.request.model
  • Messages: gen_ai.input.messages, gen_ai.output.messages, gen_ai.system_instructions (all opt-in)
  • Usage: gen_ai.usage.input_tokens, gen_ai.usage.output_tokens, gen_ai.usage.details.*
  • Agent context: gen_ai.agent.id, gen_ai.agent.name, gen_ai.conversation.id (when applicable)
  • Tools: gen_ai.tool.* attributes for function calling
  • Response metadata: gen_ai.response.finish_reasons, gen_ai.response.id

The full list is in semantic-conventions/model/gen-ai/spans.yaml and registry.yaml.

4.3 Question 3.3: Does OTel Suggest Collecting LLM Messages?

Answer: Yes, but explicitly as opt-in. The gen_ai.input.messages and gen_ai.output.messages attributes are defined with status "Opt-In". This means:

  • Instrumentations may collect and emit these attributes
  • They should provide configuration to disable message capture
  • They must respect user privacy and data protection requirements

The spec states: "Capturing the actual content of messages is optional and should be configurable."

4.4 Question 3.4: Which Field Should Store LLM Messages?

Answer: The standard attribute names are:

  • Input: gen_ai.input.messages
  • Output: gen_ai.output.messages

Both accept either a JSON string or an array of structured ChatMessage objects. The preferred format is structured with role and parts (see Section 1.4 for schema).

4.5 Question 3.5: Anything About "Observability 2.0"?

Answer: The OTel spec doesn't use the term "Observability 2.0" explicitly, but the event-based approach (gen_ai.client.inference.operation.details) embodies the same principles:

  • Decouple high-cardinality message data from span attributes
  • Store structured events with their own timestamps and metadata
  • Allow selective ingestion (backends can choose to index events separately)
  • Better performance for large traces (span attributes stay small)

The event system is detailed in docs/gen-ai/gen-ai-events.md and represents the modern OTel approach to LLM telemetry.


4.6 Question 2.1: How Is Instrumentation Implemented? (Which Library?)

See Section 2.2 table. Frameworks use:

  • Direct OTel SDK: pydantic-ai, agent-framework (Microsoft), autogen
  • Custom abstraction: openai-agents-python, crewAI, youtu-agent
  • Delegated: langgraph (via LangChain), llama_index (external)

4.7 Question 2.2: What Information Is Collected?

Varies widely:

  • Minimal: autogen focuses on agent/tool spans without LLM messages
  • Comprehensive: pydantic-ai, agent-framework collect full GenAI set
  • Proprietary: openai-agents-python collects rich data but not OTel-standard
  • Selective: crewAI collects agent/task/LLM data but with custom schema

4.8 Question 2.3: Storage Type (Traces, Metrics, Logs, Events)?

FrameworkTracesMetricsLogsEventsCustom
pydantic-ai✅ Spans✅ Histograms
agent-framework✅ Spans✅ Histograms
autogen✅ Spans
crewAI✅ Spans (custom)✅ (batch)JSON backend
openai-agents-python✅ CustomOpenAI API
langgraphDepends on LangChain

4.9 Question 2.4: Does It Follow OTel Best Practices?

Full Compliant: pydantic-ai, agent-framework (Microsoft), agent-framework (Go) Partial: autogen (agent spans ok, but missing LLM), beeai-framework (likely) Non-Compliant: crewAI, openai-agents-python, youtu-agent, langgraph (native)


4.10 Question 3.1-3.4: LLM Message Capture Details

See Section 3.2 table. Only pydantic-ai and agent-framework (Microsoft) capture messages using OTel-standard fields. Both use JSON serialization of structured message objects with role and parts.

Message format compliance:

  • pydantic-ai: Fully compliant with multimodal spec (text, uri, blob)
  • agent-framework: Fully compliant, includes type, content, modality, mime_type for images
  • crewAI: Custom format, not OTel-compliant
  • others: Either don't capture or use proprietary format

5. Key Insights and Recommendations

5.1 Observability Landscape

OTel dominance: All new frameworks should target OTel GenAI compliance. The standard has stabilized (Development status but widely implemented).

Message capture is sensitive: Only two frameworks (pydantic-ai, agent-framework) capture messages by default; both provide opt-out. CrewAI requires explicit sharing consent. This reflects growing privacy awareness.

Event-based pattern emerging: For high-scale production, storing messages as events rather than span attributes is recommended to avoid bloating traces. Pydantic-ai v1 supported this; v2 simplified to span attributes but the event approach remains viable.

5.2 Framework Selection Guidance

For new projects requiring observability:

  1. pydantic-ai - Best OTel integration, type-safe, production-ready
  2. agent-framework (Microsoft) - Comprehensive, enterprise-friendly, Azure integration
  3. autogen - Good for multi-agent but may need supplemental LLM instrumentation

For managed services:

  • openai-agents-python if using OpenAI's platform exclusively
  • Avoid if you need portable, vendor-neutral telemetry

For research/experimentation:

  • langgraph if you want LangChain ecosystem
  • AgentVerse for multi-agent behavior studies (no OTel)

5.3 Implementation Best Practices

Based on analysis of compliant frameworks:

  1. Use standard attributes: Always gen_ai.*, never custom names for LLM data
  2. Make message capture opt-in: Provide clear configuration flags
  3. Support structured messages: Use ChatMessage schema with multimodal parts
  4. Emit metrics: Token usage histograms and duration metrics are cheap and valuable
  5. Propagate context: Use OTel baggage for cross-span correlation (agent name, conversation ID)
  6. Batch exports: Use BatchSpanProcessor for performance
  7. Graceful degradation: Disable telemetry on export failures (don't break user app)
  8. Version your schema: If extending OTel, add version attribute (like pydantic-ai's instrumentation_version)

5.4 Gaps and Future Work

Areas needing improvement:

  1. autogen: Should capture LLM messages in gen_ai.input/output.messages format. Currently only traces agent/tool boundaries.
  2. langgraph: Native OTel support would reduce dependency on LangChain's callback system.
  3. crewAI: Should migrate to GenAI standard attributes for better ecosystem compatibility.
  4. Documentation: Many frameworks lack clear telemetry setup guides. OTel's docs/gen-ai/ is excellent but not widely referenced.

6. Conclusion

OpenTelemetry provides a mature, well-designed set of semantic conventions for agent and LLM observability. The gen_ai.input.messages and gen_ai.output.messages attributes are the definitive standard for storing LLM message content, with clear opt-in semantics and multimodal support.

Among agent frameworks, pydantic-ai and Microsoft's agent-framework lead in OTel compliance, implementing the full GenAI spec with proper message capture. autogen has good foundation but lacks LLM message telemetry. Other frameworks either use proprietary systems (openai-agents-python) or rely on external instrumentation (langgraph, llama_index).

For production agent systems requiring observability, we recommend:

  1. Choose a framework with native OTel GenAI support (pydantic-ai, agent-framework)
  2. If using other frameworks, add opentelemetry-instrumentation-<provider> for the underlying LLM calls
  3. Always configure message capture as opt-in with clear user consent
  4. Export to standard OTLP endpoints for backend flexibility

Appendix: Repository Analysis Summary

Total repositories analyzed: 29 Instrumentation Libraries: 7 (Section 1) Agent Building Frameworks: 16 (Section 2) Agent Projects: 6 (Section 2.2)

OTel GenAI Compliance Matrix:

RepositoryTypeOTel Used?LLM MessagesGenAI AttributesScore
pydantic-aiFramework✅ Direct✅ Yes✅ Full10/10
agent-frameworkFramework✅ Direct✅ Opt-in✅ Full10/10
autogenFramework✅ Direct❌ No⚠️ Partial6/10
crewAIFramework⚠️ Partial✅ Custom❌ Custom4/10
openai-agents-pythonFramework❌ None❓ Unclear❌ Custom2/10
langgraphFramework⚠️ Indirect⚠️ Via LC⚠️ Indirect5/10
beeai-frameworkFramework⚠️ Likely❓ Unclear⚠️ Partial5/10
llama_indexFramework⚠️ Indirect⚠️ Via LC⚠️ Indirect5/10
youtu-agentFramework❌ Custom✅ Custom❌ Custom3/10
MetaGPTFramework❌ None❌ None❌ None0/10
Qwen-AgentFramework❌ None❌ None❌ None0/10
AutoAgentFramework❌ None❌ None❌ None0/10
AgentVerseFramework❌ None❌ None❌ None0/10
spring-ai-alibabaFramework⚠️ Likely⚠️ Likely⚠️ Likely6/10
Open-AutoGLMFramework❌ None❌ None❌ None0/10
agentfieldFramework✅ Yes✅ Yes✅ Full10/10
nanobotApplication❌ None❌ None❌ None0/10
picoclawApplication⚠️ Maybe❓ Unclear❓ Unclear2/10
ironclawApplication❌ None❌ None❌ None0/10
openclawApplication❌ None❌ None❌ None0/10
NemoClawApplication⚠️ Maybe❓ Unclear❓ Unclear2/10
QclawApplication⚠️ Maybe❓ Unclear❓ Unclear2/10

Score breakdown:

  • 10/10: Full OTel GenAI compliance, includes message capture
  • 6-8/10: Partial compliance (some attributes, missing messages)
  • 3-5/10: Minimal or indirect OTel usage
  • 0-2/10: No OTel support

End of Report