When LLMs drive the logic, traces become the real source: the record of decisions, tool calls, and reasoning behind the outcome.
opentraces lets you parse, sanitise, review, and push those sessions to HuggingFace Hub so you or others can build on real workflows, not synthetic benchmarks.
init, status, review, push. The workflow you know, applied to agent sessions.
Regex + entropy + optional TruffleHog and LLM review. Stable placeholders like [EMAIL_1] keep traces coherent.
Per-project policy: auto-approve safe traces, or gate every session through the TUI or browser inbox.
Trace every shipped line back to the prompt behind it. ot blame and ot graph map commits to their agent sessions.
Steps, tool calls, reasoning, sub-agents, tokens, attribution, outcome in one record.
Sharded JSONL on HF Hub. Load via datasets.load_dataset() or mount it as a virtual filesystem. No lock-in.
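As a rough sketch of what consuming a shard could look like: the commented lines show the usual Hugging Face `datasets` call, with a placeholder repo id, and the runnable part reads the same one-record-per-line layout using only the standard library. The helper names `iter_traces` and `successful` are illustrative, not part of opentraces.

```python
import json
from typing import Iterable, Iterator

# With the Hugging Face `datasets` package installed, a hosted dataset
# would typically load like this (the repo id is a placeholder):
#
#   from datasets import load_dataset
#   ds = load_dataset("your-org/your-traces", split="train")


def iter_traces(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one trace dict per non-empty JSONL line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def successful(traces: Iterable[dict]) -> list[dict]:
    """Keep traces whose outcome reports success."""
    return [t for t in traces if t.get("outcome", {}).get("success") is True]
```

Any iterable of lines works here, so an open file handle over a downloaded shard can be passed straight to `iter_traces`.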
Five persona rubrics score every trace. Upload gates enforce minimums. Re-score remotely with ot assess.
Reset state, switch machines, re-push safely. murmur3 hashing blocks duplicates on the remote.
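To illustrate how hash-based duplicate blocking can work, here is a minimal sketch. It assumes the dedup key is a 32-bit MurmurHash3 (x86 variant) over a canonical JSON form of the trace; the page does not say which variant opentraces uses or exactly what gets hashed, so `trace_key` and `filter_new` are hypothetical helpers. The hash is written out in pure Python to keep the block dependency-free.

```python
import json

MASK32 = 0xFFFFFFFF


def _rotl32(x: int, r: int) -> int:
    """Rotate a 32-bit value left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK32


def murmur3_32(data: bytes, seed: int = 0) -> int:
    """MurmurHash3 x86 32-bit, written out in pure Python."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & MASK32
    body_len = len(data) & ~3  # length rounded down to whole 4-byte blocks
    for i in range(0, body_len, 4):
        k = int.from_bytes(data[i:i + 4], "little")
        k = _rotl32((k * c1) & MASK32, 15)
        k = (k * c2) & MASK32
        h = _rotl32(h ^ k, 13)
        h = (h * 5 + 0xE6546B64) & MASK32
    tail = data[body_len:]
    if tail:  # 1 to 3 leftover bytes
        k = int.from_bytes(tail, "little")
        k = _rotl32((k * c1) & MASK32, 15)
        k = (k * c2) & MASK32
        h ^= k
    h ^= len(data) & MASK32
    # Final avalanche mix.
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16
    return h


def trace_key(trace: dict) -> int:
    """Hash a canonical JSON form so key order does not change the key."""
    canonical = json.dumps(trace, sort_keys=True, separators=(",", ":"))
    return murmur3_32(canonical.encode("utf-8"))


def filter_new(traces: list[dict], remote_keys: set[int]) -> list[dict]:
    """Keep only traces whose key is not already known on the remote."""
    fresh = []
    for trace in traces:
        key = trace_key(trace)
        if key not in remote_keys:
            remote_keys.add(key)
            fresh.append(trace)
    return fresh
```

Because the key depends only on trace content, a re-push from a fresh machine with no local state would still be recognised as a duplicate under this scheme.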
Every command emits structured JSON with next_steps. Built for agents to drive agents.
Layered scanning: 30+ regex patterns, Shannon entropy, optional TruffleHog (800+ detectors, opt-in), and optional local LLM review. Stable placeholders like [EMAIL_1] preserve referential meaning across a trace.
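A minimal sketch of the two cheapest layers, to show how stable placeholders keep references intact. The single email pattern and the function names are illustrative assumptions, not the actual opentraces rule set.

```python
import math
import re
from collections import Counter

# Illustrative pattern only; a real scanner would carry many more.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def shannon_entropy(text: str) -> float:
    """Bits per character; high values hint at random-looking secrets."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def redact_emails(text: str) -> str:
    """Replace each distinct email with a stable numbered placeholder."""
    seen: dict[str, str] = {}

    def placeholder(match: re.Match) -> str:
        value = match.group(0)
        if value not in seen:
            seen[value] = f"[EMAIL_{len(seen) + 1}]"
        return seen[value]

    return EMAIL_RE.sub(placeholder, text)
```

Running `redact_emails("a@x.com wrote to b@y.org, cc a@x.com")` yields `[EMAIL_1] wrote to [EMAIL_2], cc [EMAIL_1]`: the repeated address maps to the same placeholder, so a reader can still tell who is who.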
When agents write most of the code, git blame points at a user, not a session id. ot blame and ot graph resolve every commit back to the traces that produced it — so you can see what worked and what didn't across your agent sessions.
Attribution search can surface results as a semantic diff (added functions, modified classes, renamed files), so agents can pull up the session behind any change for a fraction of the tokens a line-level view would cost.
Alternating role sequences, tool call/observation pairing, reasoning coverage. Validated against 10 quality checks before upload.
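Two of these checks could be sketched as follows. The field names `role` and `reasoning_content` come from the sample record; the exact rules behind the 10 quality checks are not given here, so treat both functions as assumptions about what such checks might look like.

```python
def roles_alternate(steps: list[dict]) -> bool:
    """True when no two consecutive steps share the same role."""
    roles = [step.get("role") for step in steps]
    return all(a != b for a, b in zip(roles, roles[1:]))


def reasoning_coverage(steps: list[dict]) -> float:
    """Fraction of agent steps that carry non-empty reasoning_content."""
    agent_steps = [s for s in steps if s.get("role") == "agent"]
    if not agent_steps:
        return 0.0
    with_reasoning = sum(1 for s in agent_steps if s.get("reasoning_content"))
    return with_reasoning / len(agent_steps)
```

A pre-upload gate could then reject a trace when `roles_alternate` is false or when coverage falls below a project-chosen threshold.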
Committed patches as reward proxies, per-step token costs for cost-penalized reward, sub-agent hierarchy for credit assignment.
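One way a cost-penalized reward could be derived from a single record, as a sketch: it treats a successful, committed patch as reward 1.0 and subtracts a penalty proportional to the trace-level `estimated_cost_usd` from the sample schema. The function name, the penalty weight, and the use of the trace-level cost instead of per-step costs are all assumptions for illustration.

```python
def cost_penalized_reward(trace: dict, penalty_per_usd: float = 0.1) -> float:
    """Reward proxy: 1.0 for a committed success, minus a cost penalty."""
    outcome = trace.get("outcome", {})
    metrics = trace.get("metrics", {})
    # A committed patch on a successful run stands in for task reward.
    base = 1.0 if outcome.get("success") and outcome.get("committed") else 0.0
    cost = float(metrics.get("estimated_cost_usd", 0.0))
    return base - penalty_per_usd * cost
```

With the default weight, the sample record (success, committed, 2.40 USD) would score about 0.76, while a failed run at the same cost would score about -0.24.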
Cache hit rates, per-step token breakdowns, duration timelines, model distribution. Real production inputs with outcome signals become reproducible eval datasets for quality gating — no annotation queue required.
Language tags, extracted dependencies, VCS context, code snippets with language annotations. Build domain-specific datasets from HF queries.
One session, one JSONL line. Full schema docs →
{
  "schema_version": "0.3.0",
  "trace_id": "uuid",
  "execution_context": "devtime",
  "task": { "description": "Fix the failing test...", "repository": "owner/repo" },
  "agent": { "name": "claude-code", "model": "anthropic/claude-sonnet-4" },
  "steps": [ // TAO loop
    { "role": "user", "content": "..." },
    { "role": "agent", "tool_calls": [...], "reasoning_content": "..." }
  ],
  "outcome": { "success": true, "committed": true, "patch": "...", "terminal_state": null, "reward": null, "reward_source": null },
  "attribution": { "files": [{ "path": "src/parser.ts", "ranges": [...] }] },
  "metrics": { "total_steps": 42, "estimated_cost_usd": 2.40 },
  "security": { "scanned": true },
  "dependencies": ["react", "typescript"]
}

Open data is the new open source. Your agent traces are the most valuable dataset nobody is collecting. Start contributing to the commons.