Dataset facet scoping (issue #212, external-review fix for PR #218). Adds an optional metadata scope refinement to a dataset's persisted candidate query, and adds its resolved match set to each run record. Purely additive: the TraceRecord wire shape is untouched, and this bump only affects the local dataset-lifecycle models.
Read the full schema documentation for design rationale and usage guides, or see contributing to the schema to propose changes.
Root record. One per session, one JSONL line.
| field | type | required | description |
|---|---|---|---|
| schema_version | string | req | e.g. "0.9.0" |
| trace_id | string | req | UUID for this trace |
| session_id | string | req | Agent's native session ID |
| content_hash | string | | SHA-256 hex of the serialized record, used for cross-contributor dedup at upload time. Unchanged by 0.3.0. |
| timestamp_start | string | | ISO 8601 start |
| timestamp_end | string | | ISO 8601 end |
| task | Task | | Task metadata |
| agent | Agent | req | Agent identity |
| environment | Environment | | OS, shell, VCS, languages |
| system_prompts | dict | | Deduplicated prompts keyed by hash |
| tool_definitions | dict[] | | Available tool schemas |
| steps | Step[] | | TAO-loop steps |
| outcome | Outcome | | Session outcome |
| dependencies | string[] | | Project dependencies |
| metrics | Metrics | | Aggregated metrics |
| security | SecurityMetadata | | Security tier and redactions |
| attribution | Attribution | | Code attribution (experimental) |
| metadata | dict | | Extensible key-value pairs |
| execution_context | string \| null | | "devtime" (code-editing agent) or "runtime" (action-trajectory / RL agent). Null for pre-0.2 traces. |
| lifecycle | string | | "provisional" (pre-commit-correlation) or "final" (revision-anchored). Default "provisional". |
| git_links | GitLink[] | | Evidence-graded links to commits/revisions this trace contributed to. |
| generation_index | int | | Monotonic per-session_id generation counter. Consumers resolving 'latest' should group by session_id and take max(generation_index). |
| context_tree_summary | dict | | Summary of Context Tree capture: node_count, layer_count, active_path_leaf_id, capture_limitations. |
| patches | Patch[] | | Authoritative dev-time output set. One Patch per tool-produced change/hunk. |
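The 'latest' rule described for generation_index can be sketched as below. This is a minimal illustration rather than part of the schema: the helper name `latest_by_session` is hypothetical, and treating a missing generation_index as 0 is an assumption.

```python
from typing import Iterable


def latest_by_session(records: Iterable[dict]) -> dict[str, dict]:
    """Keep, for each session_id, the record with the highest generation_index."""
    latest: dict[str, dict] = {}
    for rec in records:
        sid = rec["session_id"]
        # Assumption: a record without generation_index counts as generation 0.
        gen = rec.get("generation_index", 0)
        current = latest.get(sid)
        if current is None or gen > current.get("generation_index", 0):
            latest[sid] = rec
    return latest


records = [
    {"session_id": "sess_a", "trace_id": "t1", "generation_index": 0},
    {"session_id": "sess_a", "trace_id": "t2", "generation_index": 2},
    {"session_id": "sess_b", "trace_id": "t3", "generation_index": 1},
]
latest = latest_by_session(records)
```

Here `latest` holds one record per session, with `t2` chosen for `sess_a` because it carries the larger generation_index.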
Task metadata for filtering and grouping.
| field | type | required | description |
|---|---|---|---|
| description | string | | What the task is |
| source | string | | user_prompt, cli_arg, skill, etc. |
| repository | string | | owner/repo format |
| base_commit | string | | Starting commit SHA |
| repository_url | string | | Canonical remote URL, e.g. https://github.com/org/repo |
Agent identity.
| field | type | required | description |
|---|---|---|---|
| name | string | req | claude-code, cursor, codex, etc. |
| version | string | | Agent version |
| model | string | | provider/model-name |
Runtime context.
| field | type | required | description |
|---|---|---|---|
| os | string | | darwin, linux, etc. |
| shell | string | | zsh, bash, etc. |
| vcs | VCS | | type, base_commit, branch, diff |
| language_ecosystem | string[] | | python, typescript, etc. |
| resolved_dependencies | PinRecord[] \| null | | Exact resolved dependency pins for the trace's closure. None until a resolver fills them; presence does not raise env_tier. |
| interpreter | Interpreter \| null | | Runtime interpreter identity (name/version). |
| arch | string \| null | | CPU architecture, e.g. arm64, x86_64 |
| platform | string \| null | | Platform tag, e.g. macosx_14_0_arm64, linux |
| abi_tag | string \| null | | Python ABI tag for the L3 wheel-platform boundary, e.g. cp311 |
One LLM API call in the TAO loop.
| field | type | required | description |
|---|---|---|---|
| step_index | int | req | Sequential index |
| role | string | req | system \| user \| agent |
| content | string | | Message content |
| reasoning_content | string | | Chain-of-thought |
| model | string | | Model for this step |
| system_prompt_hash | string | | Key into system_prompts |
| agent_role | string | | main, explore, plan, etc. |
| parent_step | int | | Parent step index |
| call_type | string | | main \| subagent \| warmup |
| subagent_trajectory_ref | string | | Sub-agent session ID |
| tools_available | string[] | | Available tool names |
| tool_calls | ToolCall[] | | Tool invocations |
| observations | Observation[] | | Tool results |
| snippets | Snippet[] | | Extracted code blocks |
| token_usage | TokenUsage | | Token breakdown |
| timestamp | string | | ISO 8601 |
| context_node_id | string \| null | | Context Tree node id for the model view at this step. |
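Because system_prompt_hash is a key into the root record's system_prompts dict, a consumer can recover the prompt text for a step with a plain lookup. The sketch below assumes traces are already parsed into Python dicts; `resolve_system_prompt` is a hypothetical helper name, and returning None for a dangling hash is an assumption rather than specified behaviour.

```python
from typing import Optional


def resolve_system_prompt(trace: dict, step: dict) -> Optional[str]:
    """Return the system prompt text for a step, or None if it cannot be resolved."""
    key = step.get("system_prompt_hash")
    if key is None:
        return None
    # system_prompts maps hash -> prompt text; an unknown hash yields None.
    return (trace.get("system_prompts") or {}).get(key)


trace = {
    "system_prompts": {"abc123": "You are Claude Code..."},
    "steps": [
        {"step_index": 0, "role": "user", "content": "Add Zod validation"},
        {"step_index": 1, "role": "agent", "system_prompt_hash": "abc123"},
    ],
}
prompts = [resolve_system_prompt(trace, step) for step in trace["steps"]]
```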
A tool invocation within a step.
| field | type | required | description |
|---|---|---|---|
| tool_call_id | string | req | ID for linking to observations |
| tool_name | string | req | Tool name |
| input | dict | | Input parameters |
| duration_ms | int | | Wall-clock time |
Tool result linked to its ToolCall.
| field | type | required | description |
|---|---|---|---|
| source_call_id | string | req | Links to ToolCall |
| content | string | | Full output |
| output_summary | string | | Lightweight preview |
| error | string | | Error info if failed |
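The link between a ToolCall and its Observation (tool_call_id on one side, source_call_id on the other) can be followed with a small join. This is a sketch under the assumption that each call has at most one observation within the same step; `pair_calls_with_observations` is a hypothetical helper name.

```python
from typing import Optional


def pair_calls_with_observations(step: dict) -> list[tuple[dict, Optional[dict]]]:
    """Pair each tool call in a step with the observation that references it."""
    # Index observations by the call id they point back to.
    by_call_id = {obs["source_call_id"]: obs for obs in step.get("observations", [])}
    return [
        (call, by_call_id.get(call["tool_call_id"]))
        for call in step.get("tool_calls", [])
    ]


step = {
    "step_index": 1,
    "role": "agent",
    "tool_calls": [
        {"tool_call_id": "tc_001", "tool_name": "Edit"},
        {"tool_call_id": "tc_002", "tool_name": "Bash"},
    ],
    "observations": [
        {"source_call_id": "tc_001", "content": "File edited successfully"},
    ],
}
pairs = pair_calls_with_observations(step)
```

A call with no matching observation is paired with None, which makes missing tool results easy to spot.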
Per-step token breakdown.
| field | type | required | description |
|---|---|---|---|
| input_tokens | int | | Input tokens |
| output_tokens | int | | Output tokens |
| cache_read_tokens | int | | From cache |
| cache_write_tokens | int | | Written to cache |
| prefix_reuse_tokens | int | | Via prefix caching |
Session outcome for reward modeling.
| field | type | required | description |
|---|---|---|---|
| success | boolean | | Goal achieved |
| signal_source | string | | Default: "deterministic" |
| signal_confidence | string | | derived \| inferred \| annotated |
| description | string | | Outcome description |
| committed | boolean | | Changes committed to git |
| commit_sha | string | | Commit SHA |
| terminal_state | string \| null | | "goal_reached", "interrupted", "error", or "abandoned". Meaningful for runtime agents. |
| reward | float \| null | | Numeric reward signal from an RL environment or evaluator. |
| reward_source | string \| null | | Canonical values: "rl_environment", "judge", "human_annotation", "orchestrator". |
Code attribution (experimental).
| field | type | required | description |
|---|---|---|---|
| experimental | boolean | | Always true in v0.1.x |
| files | AttributionFile[] | | Per-file line ranges |
| revision | dict | | Pins this block to a revision. Keys: vcs_type ('git' \| 'jj'), revision. |
| unaccounted_files | string[] | | Files changed at commit time with no tracked Edit/Write source (e.g. Bash sed edits). Low confidence. |
Session-level aggregates.
| field | type | required | description |
|---|---|---|---|
| total_steps | int | | Step count |
| total_input_tokens | int | | Sum of input tokens |
| total_output_tokens | int | | Sum of output tokens |
| total_duration_s | float | | Wall-clock seconds |
| cache_hit_rate | float | | 0.0 to 1.0 |
| estimated_cost_usd | float | | Estimated cost |
| total_cache_read_tokens | int | | Session-level prompt-cache read aggregate. |
| total_cache_creation_tokens | int | | Session-level prompt-cache write aggregate. |
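One way these session-level aggregates might be derived from per-step token_usage is sketched below. The schema does not define the formulas, so two things here are assumptions: cache_hit_rate is computed as cache-read tokens divided by input tokens, and total_cache_creation_tokens is taken as the sum of per-step cache_write_tokens. The helper name `aggregate_metrics` is hypothetical.

```python
def aggregate_metrics(steps: list[dict]) -> dict:
    """Sum per-step token usage into session-level metrics (illustrative only)."""
    usages = [step.get("token_usage") or {} for step in steps]
    total_input = sum(u.get("input_tokens", 0) for u in usages)
    total_output = sum(u.get("output_tokens", 0) for u in usages)
    total_cache_read = sum(u.get("cache_read_tokens", 0) for u in usages)
    total_cache_write = sum(u.get("cache_write_tokens", 0) for u in usages)
    # Assumption: hit rate = cache-read tokens / input tokens, 0.0 when no input.
    hit_rate = total_cache_read / total_input if total_input else 0.0
    return {
        "total_steps": len(steps),
        "total_input_tokens": total_input,
        "total_output_tokens": total_output,
        "total_cache_read_tokens": total_cache_read,
        "total_cache_creation_tokens": total_cache_write,
        "cache_hit_rate": round(hit_rate, 3),
    }


steps = [
    {"step_index": 0, "role": "user"},
    {
        "step_index": 1,
        "role": "agent",
        "token_usage": {
            "input_tokens": 4200,
            "output_tokens": 1800,
            "cache_read_tokens": 3800,
        },
    },
]
metrics = aggregate_metrics(steps)
```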
Security scan summary. Detailed tool output lives under metadata.security.
| field | type | required | description |
|---|---|---|---|
| scanned | boolean | | Whether security processing was applied to this record. |
| flags_reviewed | int | | Number of security flags reviewed. |
| redactions_applied | int | | Number of redactions applied. |
| classifier_version | string \| null | | Classifier tool version when classifier ran. |
Evidence-graded link between a trace and a commit/revision. A trace can link to many commits (rebase, squash, long session); a commit can link to many traces (cherry-pick, composition).
| field | type | required | description |
|---|---|---|---|
| vcs_type | string | req | "git" or "jj". |
| revision | string | req | Commit SHA or jj change id. |
| repo_url | string | | Canonical remote URL. |
| branch | string | | Branch at correlation time. |
| tier | string | req | "tool_emitted" (Edit hashes match committed hunks), "tool_emitted_with_divergence" (file overlap but bytes diverge), "overlapping" (file-set overlap, no hash match), or "orphan". |
| commit_reachable | boolean | | Computed lazily on read; false if commit was force-pushed away. |
| content_alive | boolean | | Computed lazily on read; false if agent's hashes no longer appear at HEAD. |
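Since a trace can carry several GitLinks of different tiers, a consumer may want the best-evidenced one. The sketch below assumes the tiers are ordered from strongest to weakest in the order the tier field lists them; that ranking is an assumption, not something the schema states, and `strongest_link` is a hypothetical helper name.

```python
from typing import Optional

# Assumed ordering: lower rank means stronger evidence.
TIER_RANK = {
    "tool_emitted": 0,
    "tool_emitted_with_divergence": 1,
    "overlapping": 2,
    "orphan": 3,
}


def strongest_link(git_links: list[dict]) -> Optional[dict]:
    """Return the link with the strongest evidence tier, or None if there are none."""
    if not git_links:
        return None
    # Unknown tiers sort after every known tier.
    return min(git_links, key=lambda link: TIER_RANK.get(link["tier"], len(TIER_RANK)))


links = [
    {"vcs_type": "git", "revision": "f5e6d7c8", "tier": "overlapping"},
    {"vcs_type": "git", "revision": "a1b2c3d4", "tier": "tool_emitted"},
    {"vcs_type": "git", "revision": "0badc0de", "tier": "orphan"},
]
best = strongest_link(links)
```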
A range of lines attributed to an agent conversation.
| field | type | required | description |
|---|---|---|---|
| start_line | int | req | First attributed line (1-indexed). |
| end_line | int | req | Last attributed line (inclusive). |
| content_hash | string | | murmur3:<32-hex> for cross-refactor tracking. |
| confidence | string | | high \| medium \| low. |
| change_type | string | | "addition", "modification", or "deletion". Default "addition". |
| original | dict | | Pre-divergence state when a formatter/human rewrote agent output. Keys: start_line, end_line, content_hash. |
| contributor | dict | | Per-range contributor override (used when the enclosing conversation is 'mixed'). |
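Because start_line is 1-indexed and end_line is inclusive, a range covers end_line - start_line + 1 lines. The sketch below shows that arithmetic and a way to count distinct attributed lines when ranges overlap; both helper names are hypothetical.

```python
def range_line_count(attr_range: dict) -> int:
    """Number of lines in one range (1-indexed start, inclusive end)."""
    return attr_range["end_line"] - attr_range["start_line"] + 1


def attributed_lines(ranges: list[dict]) -> set[int]:
    """Distinct line numbers covered by any range, so overlaps count once."""
    lines: set[int] = set()
    for attr_range in ranges:
        # range() excludes its upper bound, so add 1 to include end_line.
        lines.update(range(attr_range["start_line"], attr_range["end_line"] + 1))
    return lines


ranges = [
    {"start_line": 10, "end_line": 12},
    {"start_line": 12, "end_line": 15},
]
first_count = range_line_count(ranges[0])
covered = attributed_lines(ranges)
```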
Links attributed code ranges to the conversation that produced them.
| field | type | required | description |
|---|---|---|---|
| contributor | dict | | e.g. {type: 'ai', model_id: 'anthropic/claude-sonnet-4'} |
| url | string | | opentraces://trace_id/step_N |
| ids | dict | | Provider-native conversation ids. e.g. {anthropic: 'msg_01xyz', openai: ['resp_1', 'resp_2']} |
| related | dict[] | | Links to broader resources. Each entry: {type, url}. e.g. {type: 'plan', url: 'opentraces://t/plan_3'} |
| ranges | AttributionRange[] | | Attributed line ranges. |
Typed link from a Patch to its appearance in Git.
| field | type | required | description |
|---|---|---|---|
| last_searched_at | string | req | ISO 8601 timestamp set after the first maturation search. |
| found | boolean | req | Whether a matching commit was found. |
| commit_sha | string \| null | | Matched commit SHA when found. |
| path | string \| null | | Path in the commit; may differ after rename. |
| blob_sha | string \| null | | Matched Git blob SHA. |
| git_patch_id | string \| null | | Git patch-id, stable across rebase. |
| evidence_tier | string \| null | | Evidence match label such as exact_range_hash, patch_id, formatter_divergent, overlapping_hunk, or orphan. |
| evidence_firmness | string \| null | | Firmness label such as firm_observed, provisional, human_asserted, or unknown. |
A trace-produced change. Full patch history resolves through the bucket Trail companion.
| field | type | required | description |
|---|---|---|---|
| patch_id | string | req | Content-addressed trace patch id. |
| file_path | string | req | Path at creation time. |
| step_index | int \| null | | Producing step index. |
| tool_call_id | string \| null | | Producing tool call id. |
| capture_method | string[] | | Capture methods such as hook_pretooluse, hook_posttooluse, watcher_backstop. |
| snapshot_before_id | string \| null | | Before snapshot id. |
| snapshot_after_id | string \| null | | After snapshot id. |
| anchor | GitAnchor \| null | | Git match when the patch matures into a commit. |
| superseded_by | string[] | | Commit supersede chain after amend/rebase/squash. |
| limitations | string[] | | Capture quality flags. |
A single resolved dependency pin (name==version, optionally hashed). A structured home for a future resolver's output, not a resolver itself.
| field | type | required | description |
|---|---|---|---|
| name | string | req | Dependency name |
| version | string \| null | | Exact resolved version, e.g. 2.31.0 |
| hash | string \| null | | Artifact hash (e.g. sha256:...) for the future L3 wheel path |
| marker | string \| null | | PEP 508 environment marker, e.g. python_version >= '3.8' |
| source | string \| null | | Resolver/index the pin came from (routed through the redaction floor) |
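To make the name==version shape concrete, the sketch below renders a PinRecord as a single requirements-style line. The schema only describes the fields; the output layout (marker after ` ; `, hash as a `--hash=` option) follows the common pip requirements-file convention and is an assumption here, as is the helper name `render_pin`.

```python
def render_pin(pin: dict) -> str:
    """Render a PinRecord dict as one requirements-style line (illustrative only)."""
    name = pin["name"]
    version = pin.get("version")
    # An unresolved pin (version is null) falls back to the bare name.
    line = name if version is None else f"{name}=={version}"
    if pin.get("marker"):
        line += f" ; {pin['marker']}"
    if pin.get("hash"):
        line += f" --hash={pin['hash']}"
    return line


plain = render_pin({"name": "requests", "version": "2.31.0"})
full = render_pin(
    {
        "name": "requests",
        "version": "2.31.0",
        "marker": "python_version >= '3.8'",
        "hash": "sha256:abc",  # placeholder digest, not a real artifact hash
    }
)
```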
Runtime interpreter identity (name + version, e.g. cpython 3.11.6).
| field | type | required | description |
|---|---|---|---|
| name | string \| null | | Interpreter name, e.g. cpython, pypy |
| version | string \| null | | Interpreter version, e.g. 3.11.6 |
{
  "schema_version": "0.9.0",
  "trace_id": "a4f2b8c1-e2d3-4f5a-b6c7-d8e9f0a1b2c3",
  "session_id": "sess_0x8f2a1b3c",
  "content_hash": "e3b0c44298fc1c14...",
  "timestamp_start": "2026-03-27T14:30:00Z",
  "task": {
    "description": "Add input validation to the signup form",
    "repository": "acme/webapp",
    "base_commit": "a1b2c3d4"
  },
  "agent": {
    "name": "claude-code",
    "version": "1.0.32",
    "model": "anthropic/claude-sonnet-4-20250514"
  },
  "environment": {
    "os": "darwin",
    "shell": "zsh",
    "vcs": { "type": "git", "branch": "main" },
    "language_ecosystem": ["typescript"]
  },
  "system_prompts": {
    "abc123": "You are Claude Code..."
  },
  "steps": [
    {
      "step_index": 0,
      "role": "user",
      "content": "Add Zod validation to the signup form"
    },
    {
      "step_index": 1,
      "role": "agent",
      "content": "I'll add Zod validation...",
      "model": "anthropic/claude-sonnet-4-20250514",
      "system_prompt_hash": "abc123",
      "agent_role": "main",
      "call_type": "main",
      "tool_calls": [{
        "tool_call_id": "tc_001",
        "tool_name": "Edit",
        "input": { "file_path": "src/signup.tsx" },
        "duration_ms": 120
      }],
      "observations": [{
        "source_call_id": "tc_001",
        "output_summary": "Added Zod schema to signup form",
        "content": "File edited successfully"
      }],
      "token_usage": {
        "input_tokens": 4200,
        "output_tokens": 1800,
        "cache_read_tokens": 3800,
        "prefix_reuse_tokens": 3800
      }
    }
  ],
  "outcome": {
    "success": true,
    "signal_source": "deterministic",
    "signal_confidence": "derived",
    "committed": true,
    "commit_sha": "f5e6d7c8"
  },
  "metrics": {
    "total_steps": 2,
    "total_input_tokens": 8400,
    "total_output_tokens": 1800,
    "cache_hit_rate": 0.9,
    "estimated_cost_usd": 0.24
  },
  "security": { "tier": 2, "redactions_applied": 1 }
}
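A record like the one above can be checked against the fields marked req in the tables. The sketch below is a minimal presence check, not an official validator: it covers only the req fields of the root record, Agent, Step, ToolCall, and Observation, does not check types, and uses hypothetical names (`REQUIRED`, `missing_required`). The sample line is deliberately incomplete.

```python
import json

# Fields marked "req" in the tables above, grouped by record kind.
REQUIRED = {
    "trace": ("schema_version", "trace_id", "session_id", "agent"),
    "agent": ("name",),
    "step": ("step_index", "role"),
    "tool_call": ("tool_call_id", "tool_name"),
    "observation": ("source_call_id",),
}


def _missing(obj: dict, kind: str, prefix: str = "") -> list[str]:
    """List the required fields of `kind` that are absent from obj."""
    return [f"{prefix}{field}" for field in REQUIRED[kind] if field not in obj]


def missing_required(trace: dict) -> list[str]:
    """Return dotted paths of all missing required fields in a parsed trace."""
    problems = _missing(trace, "trace")
    if isinstance(trace.get("agent"), dict):
        problems += _missing(trace["agent"], "agent", "agent.")
    for i, step in enumerate(trace.get("steps", [])):
        problems += _missing(step, "step", f"steps[{i}].")
        for j, call in enumerate(step.get("tool_calls", [])):
            problems += _missing(call, "tool_call", f"steps[{i}].tool_calls[{j}].")
        for j, obs in enumerate(step.get("observations", [])):
            problems += _missing(obs, "observation", f"steps[{i}].observations[{j}].")
    return problems


# One JSONL line per record; this one lacks session_id and a tool_name.
line = (
    '{"schema_version": "0.9.0", "trace_id": "a4f2b8c1", '
    '"agent": {"name": "claude-code"}, '
    '"steps": [{"step_index": 0, "role": "agent", '
    '"tool_calls": [{"tool_call_id": "tc_001"}]}]}'
)
problems = missing_required(json.loads(line))
```

For this sample, `problems` lists the missing session_id and the tool call's missing tool_name; an empty list means every req field is present.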