
Assess

opentraces assess scores trace quality against the current downstream-facing rubrics.

opentraces assess
opentraces assess --judge --judge-model sonnet
opentraces assess --dataset owner/team-traces
opentraces assess --explain

Local mode assesses staged traces first. If nothing is staged yet, it falls back to the local trace store so you can still inspect quality before deciding what to upload.
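
The staged-first fallback described above can be sketched in a few lines of Python. Everything here is illustrative; pick_assessment_set and its arguments are not part of the CLI:

```python
def pick_assessment_set(staged, local_store):
    """Prefer staged traces; if nothing is staged yet, fall back to the
    local trace store so quality can still be inspected before upload."""
    if staged:
        return ("staged", staged)
    return ("local-store", local_store)
```

Calling it with an empty staged list returns the local store, matching the fallback behavior above.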

Push Integration

opentraces push runs assessment by default and embeds the resulting scorecard into the dataset card. Use --no-assess when you want to skip that pass for a particular push.
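
As a rough sketch of what embedding a scorecard into a dataset card can look like, the snippet below renders per-persona scores as a markdown section. The function name and layout are assumptions; the CLI's actual card format is not specified here:

```python
def render_scorecard_section(scores):
    """Render {persona: score} pairs as a markdown section for a dataset card.
    Illustrative formatting only; the real scorecard layout may differ."""
    lines = ["## Quality scorecard", ""]
    for persona, score in sorted(scores.items()):
        lines.append(f"- {persona}: {score:.2f}")
    return "\n".join(lines)
```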

Scoring Model

Assessment is deterministic by default. The core score is computed from Python checks over the TraceRecord structure, without external calls or randomness.

An optional LLM judge can add qualitative scoring:

opentraces assess --judge
opentraces assess --judge --judge-model haiku
opentraces assess --judge --judge-model sonnet
opentraces assess --judge --judge-model opus
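
The docs above say only that the judge adds qualitative scoring on top of the deterministic core. One plausible way to fold the two together is a weighted blend; the 0.5 weight and the blending itself are assumptions, not documented CLI behavior:

```python
def combined_score(core, judge=None, judge_weight=0.5):
    """Blend the deterministic core score with an optional LLM-judge score.
    When no judge score is supplied, the deterministic core stands alone."""
    if judge is None:
        return core
    return (1 - judge_weight) * core + judge_weight * judge
```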

Personas

Every trace is scored across five consumer-facing personas:

Persona      What it checks
Conformance  Schema correctness and structural completeness
Training     SFT-readiness: dialogue quality, tool-call structure, usable reasoning
RL           Outcome and reward-signal usefulness
Analytics    Metrics, timing, cost, and observability coverage
Domain       Metadata that makes the trace discoverable and reusable
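
The five persona scores can be rolled up into a single scorecard. A sketch, assuming equal weighting and a flat persona-to-score mapping (both assumptions, not documented CLI behavior):

```python
PERSONAS = ("conformance", "training", "rl", "analytics", "domain")

def scorecard(per_persona):
    """Aggregate per-persona scores (0.0-1.0) into one scorecard dict,
    rejecting input that is missing any persona."""
    missing = [p for p in PERSONAS if p not in per_persona]
    if missing:
        raise ValueError(f"missing persona scores: {missing}")
    overall = sum(per_persona[p] for p in PERSONAS) / len(PERSONAS)
    return {"personas": dict(per_persona), "overall": round(overall, 3)}
```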

Run opentraces assess --explain for the full glossary and threshold details exposed by the CLI.

Remote Datasets

To assess a dataset already on Hugging Face:

opentraces assess --dataset owner/team-traces

This runs entirely against the remote dataset and is independent of the current local inbox: nothing staged or stored locally is read or modified.

Typical Flows

opentraces add --all
opentraces assess
opentraces push

Or, when you want a stricter push gate:

opentraces llm-review --scope staged
opentraces push --llm-review