Standards Alignment
opentraces sits at the intersection of four public standards. It adopts what works from each, and bridges the gap between trajectory (process) and attribution (output).
ATIF / Harbor (v1.6)
github.com/laude-institute/harbor
A training trajectory serialization format for agent research. Defines the step-based TAO (Thought-Action-Observation) loop, with fields for token IDs, logprobs, and reward signals designed for RL and SFT pipelines.
Relationship: opentraces is a superset of ATIF. We adopt the step-based model, role conventions (system | user | agent), and field patterns. We add attribution blocks, per-step token breakdowns, environment metadata, dependency tracking, and security metadata. The downstream field mappings live in packages/opentraces-schema/FIELD-MAPPINGS.md; the public export workflow is still experimental.
ADP (Agent Data Protocol)
An interlingua for normalizing diverse agent trace formats into a common structure for training. Proposes a universal adapter layer so each dataset and each agent only needs one converter, O(D+A), instead of pairwise mappings, O(D*A).
Relationship: opentraces' adapter-based normalization follows the same pattern. Per-agent parsers are ADP-style adapters outputting the enriched schema.
Agent Trace (Cursor/community, v0.1.0 RFC)
A code attribution spec (CC BY 4.0) that records which lines of code came from which agent conversation, at file/line granularity. Backed by 10+ sponsors (Cloudflare, Vercel, Google Jules, Cognition).
Relationship: opentraces embeds Agent Trace attribution blocks directly in the trace record. Agent Trace focuses on output (code attribution), opentraces bridges that with process (trajectory).
Agent Trace RFCs adopted (schema 0.3.0)
| RFC | Topic | Where it lands |
|---|---|---|
| #5 | original pre-processing snapshot on divergent ranges | AttributionRange.original |
| #9 | Provider-native conversation IDs | AttributionConversation.ids |
| #11 | change_type on ranges | AttributionRange.change_type |
| #16 | Baseline related resource vocabulary | AttributionConversation.related |
| #22 | Canonical repository_url | Task.repository_url |
| #25 | Lifecycle / revision-pinning | TraceRecord.lifecycle, Attribution.revision |
| #26 | unaccounted_files for non-tool edits | Attribution.unaccounted_files |
| #27 | Evidence-graded commit linking | TraceRecord.git_links[], GitLink.tier |
Adoption is additive — pre-0.3.0 traces validate unchanged.
opentraces export --format agent-trace emits Agent Trace v0.1.0 JSONL based on these fields.
OTel GenAI Semantic Conventions
opentelemetry.io/docs/specs/semconv/gen-ai
OpenTelemetry's GenAI semantic conventions define standardized span attributes for LLM calls in observability pipelines, covering model names, token counts, and request metadata.
Relationship: opentraces' per-step token usage and model fields align with OTel GenAI conventions, enabling cross-referencing between observability spans and training trajectories.
The Core Insight
Agent Trace preserves which lines came from AI. ATIF/ADP preserve how the agent reasoned. Neither alone tells the complete story. opentraces connects the full conversation trajectory to the specific code output at line granularity.
Message Taxonomy
opentraces adopts a training-oriented message taxonomy:
| Role | Description |
|---|---|
system | System prompt (deduplicated by hash) |
user | User message / prompt |
agent | Agent response, tool calls, or thinking |
Agent steps are further classified by call_type (main, subagent, warmup) and agent_role (main, explore, plan).