Runs and artifacts

axp local run <experiment>.yaml writes local artifacts under the current working directory:

./.axp/runs/<run-id>/

This layout is what a local run produces. Platform runs (axp run) store their results in your organization; the harness uploads the same per-variant tree, keyed under variants/<variant-id>/ rather than under a separate results/ directory. axp download <run-id> does not pull that file tree back down. Instead, it fetches the run's query Parquet tables to ./.axp/downloads/<run-id>/<table>.parquet so you can query a platform run locally (see Inspecting run artifacts).

A run is one real run iteration. A normal axp local run creates one run; axp local run --repeat <N> creates one run with <N> repeats per variant. The run id is the experiment id plus a ULID:

<experiment-id>-<26-character-ulid>

The ULID carries the run timestamp. axp list orders local runs by the recorded start time when available, falling back to directory modified time.
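As a minimal sketch of how the run id format above can be taken apart, the following Python splits a run id on its last hyphen and decodes the ULID's timestamp. It relies on the general ULID layout (the first 10 Crockford base32 characters encode a 48-bit millisecond Unix timestamp). The helper names split_run_id and ulid_timestamp are hypothetical and not part of axp.

```python
from datetime import datetime, timedelta, timezone

# Crockford base32 alphabet used by ULIDs (omits I, L, O, U).
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def split_run_id(run_id: str) -> tuple[str, str]:
    """Split '<experiment-id>-<26-character-ulid>' into (experiment_id, ulid)."""
    # rpartition keeps hyphens inside the experiment id intact.
    experiment_id, sep, ulid = run_id.rpartition("-")
    if not sep or not experiment_id or len(ulid) != 26:
        raise ValueError(f"not a run id: {run_id!r}")
    return experiment_id, ulid


def ulid_timestamp(ulid: str) -> datetime:
    """Decode the millisecond timestamp held in the first 10 ULID characters."""
    millis = 0
    for ch in ulid[:10].upper():
        millis = millis * 32 + CROCKFORD.index(ch)
    return EPOCH + timedelta(milliseconds=millis)
```

Sorting run directories by the decoded timestamp gives an ordering similar in spirit to axp list, though axp list itself uses the recorded start time when one is available.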

Run Layout

A completed run has this top-level shape:

.axp/runs/<run-id>/
  group.json
  index.jsonl
  experiment.yaml
  variants/
    <variant-id>/                 # <variant-id>/<repeat-idx>/ when --repeat > 1
      layout.json
      run.json
      resolved-variant.yaml
      staging.json                # only when the experiment stages files
      agent-events.jsonl
      harness-events.jsonl
      agent.log
      setup.log
      setup-checks/               # only when the variant declares setup_checks
        <check-name>.json
      tests/
        application/
          <test-name>.json
        introspection/
          <test-name>.json
      fs-diff/
        000000.json

Everything for a variant lives under variants/<variant-id>/. There is no separate results/ directory — local runs and the uploaded platform tree both use this same per-variant layout.

Only variants selected for the run are written. If you run axp local run --variant a --variant b, the run contains variants/a/ and variants/b/, not every variant from the experiment matrix.
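Because a variant's artifacts sit either directly under variants/<variant-id>/ or one level deeper under variants/<variant-id>/<repeat-idx>/, a script that walks a run has to handle both depths. The sketch below is one way to do that in Python; variant_dirs is a hypothetical helper, and it assumes that a directory counts as a variant (or repeat) directory once it contains run.json.

```python
from pathlib import Path


def variant_dirs(run_dir: Path) -> list[Path]:
    """Return directories under variants/ that contain a run.json.

    Covers both documented shapes:
      variants/<variant-id>/run.json
      variants/<variant-id>/<repeat-idx>/run.json
    """
    variants = run_dir / "variants"
    found = set()
    for pattern in ("*/run.json", "*/*/run.json"):
        for summary in variants.glob(pattern):
            found.add(summary.parent)
    return sorted(found)
```

Variants that never reached the end of their run have no run.json and are skipped by this sketch; look for layout.json or resolved-variant.yaml instead if you need those too.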

axp local run --resolve-variants is different: it renders the plan and returns without writing a run under your project.

Resolved Variant YAML

variants/<variant-id>/resolved-variant.yaml is the resolved experiment for that variant. It is written before the agent runs, so it survives errored runs.

Use this file when you need to see the exact prompt, model, test set, setup, optional experiment_description, and resolved axis values the harness used for a variant. Declared secret names appear; resolved secret values never do.

staging.json

variants/<variant-id>/staging.json is written when the experiment stages files into the sandbox via files or axp local run --file. It holds one record per staged entry, in delivery order: {name?, source, dest, file_count, skipped_symlinks, size_bytes, tar_sha256}. For a host-path source, the source field is the resolved path; for an http(s) URL source, it is the URL. On a staging_failed variant, the failing entry's record carries an error field and later entries are absent. Unlike resolved-variant.yaml, this file reflects what actually staged, including --file source binds and ad-hoc entries.

group.json and index.jsonl

The run's group-level files live at the run root.

group.json is the run manifest. It is written when the run is created (with ended_at null) and re-stamped with ended_at when the run finishes:

{
  "version": 1,
  "run_request_id": "hello-01HV...",
  "experiment_id": "hello",
  "experiment_name": "Hello",
  "started_at": "2026-01-01T00:00:00Z",
  "ended_at": "2026-01-01T00:00:42Z",
  "jobs": 2,
  "repeat": 1
}

index.jsonl is the per-variant index — one JSON object per line, appended as each variant finishes:

{"variant_id":"baseline","variant_tag":"Baseline","repeat_idx":null,"status":"pass","started_at":"2026-01-01T00:00:00Z","ended_at":"2026-01-01T00:00:42Z"}
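Since index.jsonl holds one JSON object per line, a status tally for a run takes only a few lines of Python. This is a minimal sketch; status_counts is a hypothetical helper, and it assumes every non-blank line is a complete JSON object with a status field, as in the example record above.

```python
import json
from collections import Counter
from pathlib import Path


def status_counts(run_dir: Path) -> Counter:
    """Count variants per status from <run-dir>/index.jsonl."""
    counts: Counter = Counter()
    with (run_dir / "index.jsonl").open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            counts[json.loads(line)["status"]] += 1
    return counts
```

For example, status_counts(Path(".axp/runs/<run-id>")) returns a Counter keyed by the lowercase status strings described under run.json.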

experiment.yaml is a verbatim copy of the source experiment YAML you invoked the run with, captured at run creation. The resolved-per-variant form in resolved-variant.yaml is not identical — formatting, comments, and the full matrix are lost — so this copy is what reproduction tooling and axp send-debug rely on.

run.json

Each variant writes variants/<variant-id>/run.json at the end of the variant run. It is written to a temp file and atomically renamed into place, so a reader (or a tree pull) never observes a half-written summary. It is the compact, structured result for that variant:

{
  "version": 1,
  "axp_cli_version": "0.5.17-rp",
  "run_request_id": "hello-01HV...",
  "experiment_id": "hello",
  "variant_id": "claude__claude-sonnet-4-6__p0__baseline",
  "variant_tag": "claude · claude-sonnet-4-6 · p0 · baseline",
  "started_at": "2026-01-01T00:00:00Z",
  "ended_at": "2026-01-01T00:00:42Z",
  "agent_started_at": "2026-01-01T00:00:01Z",
  "agent_ended_at": "2026-01-01T00:00:38Z",
  "status": "pass",
  "agent": {
    "exit_reason": "end_turn",
    "hit_timeout": false,
    "cost_usd_micros": 12345,
    "num_turns": 4,
    "input_tokens": 5821,
    "output_tokens": 412
  },
  "tests": [
    {
      "kind": "application",
      "name": "file-exists",
      "exit_code": 0,
      "duration_ms": 1200,
      "stdout_tail": "",
      "stderr_tail": ""
    }
  ]
}

Timestamps are RFC 3339 strings. Agent cost is fixed-point millionths of a dollar in cost_usd_micros (divide by 1,000,000 for USD, so the 12345 above is $0.012345). axp_cli_version records the effective axp --version value for the CLI that submitted or ran the variant. repeat_idx is present only under --repeat, and setup_checks (an array shaped like tests minus kind) is present only when the variant declares setup checks; both are omitted otherwise. Files written by an older CLI may use the legacy field name run_id in place of run_request_id; readers accept both.

status is a lowercase JSON string:

pass: no timeout, no cost cap, no harness error, and every executed test passed. A non-zero agent exit code does not by itself flip the rollup — only the conditions below do.
fail: at least one test failed.
timeout: the agent hit max_time_seconds; tests are not run for that variant.
cost_cap: the agent exceeded max_cost_usd.
error: a harness/driver-level fault before tests could complete — a driver error, a failed setup_check, a files staging failure, or a cancel.

The stdout_tail / stderr_tail fields keep the last 8 KB of each test's stream. The per-test JSON files under tests/ hold the same record (see Test results).
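The field rules above (micro-dollar cost, the legacy run_id name, tests passing on exit code 0) can be folded into a small reader. The following Python is a sketch only; summarize is a hypothetical helper, and it defensively treats agent and tests as optional, which the documentation does not state either way.

```python
import json
from decimal import Decimal
from pathlib import Path


def summarize(run_json: Path) -> dict:
    """Condense one variant's run.json into a few fields of interest."""
    data = json.loads(run_json.read_text(encoding="utf-8"))
    # Files from an older CLI may use run_id instead of run_request_id.
    request_id = data.get("run_request_id", data.get("run_id"))
    # Assumption: a missing agent block or cost is treated as zero cost.
    micros = data.get("agent", {}).get("cost_usd_micros", 0)
    failed = [t["name"] for t in data.get("tests", []) if t["exit_code"] != 0]
    return {
        "run_request_id": request_id,
        "variant_id": data.get("variant_id"),
        "status": data["status"],
        # Decimal keeps the fixed-point value exact.
        "cost_usd": Decimal(micros) / Decimal(1_000_000),
        "failed_tests": failed,
    }
```

Using Decimal rather than float avoids rounding drift when costs are summed across many variants.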

Event streams

A variant's two event-stream artifacts are the structured record of the run; they replace the per-variant OTLP trace files earlier versions wrote. The agent's free-form log sits alongside them:

variants/<variant-id>/agent-events.jsonl
variants/<variant-id>/harness-events.jsonl
variants/<variant-id>/agent.log

agent-events.jsonl is the canonical record of the agent invocation: every ACP JSON-RPC frame between the harness and the agent, teed verbatim to disk before parsing and tagged with a classified kind. Cost, turns, token usage, tool calls, and the conversation itself all derive from this stream. When a token count in run.json is 0, this file is where you confirm whether the adapter reported zero or never surfaced a count.

harness-events.jsonl is the span/event stream emitted by the runner itself — setup, setup checks, test execs, snapshots, and lifecycle transitions. It is the on-disk replacement for the in-process OpenTelemetry traces the harness used to write under traces/.

agent.log is the free-form stderr of the agent process — diagnostics, not a structured stream.

On platform (axp run) runs the harness also exports traces to the platform's observability backend; those are not files in the run tree.
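For a quick overview of what an agent-events.jsonl contains, one option is to tally records by their classified kind. The sketch below assumes that each line is a JSON object and that the classification is stored in a top-level field named kind; the exact on-disk field name is not specified here, so treat it as an assumption and adjust after inspecting a real file. kind_counts is a hypothetical helper.

```python
import json
from collections import Counter
from pathlib import Path


def kind_counts(events_path: Path) -> Counter:
    """Tally JSONL records by their (assumed) top-level 'kind' field."""
    counts: Counter = Counter()
    with events_path.open(encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue
            try:
                event = json.loads(line)
            except json.JSONDecodeError:
                # A run that was cut short may leave a truncated last line.
                counts["<unparseable>"] += 1
                continue
            kind = event.get("kind") if isinstance(event, dict) else None
            counts[kind if kind is not None else "<no kind>"] += 1
    return counts
```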

setup.log

variants/<variant-id>/setup.log is the combined stdout + stderr of the variant's setup script. It is written on both success and failure, so when a setup script exits non-zero — taking down the variant before setup checks, the agent, and tests can run — its output leaves the sandbox here instead of dying in the bridge's logs. See Troubleshooting for the failure-mode walkthrough.

Setup checks

When a variant declares setup_checks, each check's result is written to variants/<variant-id>/setup-checks/<check-name>.json:

{
  "name": "rust-installed",
  "exit_code": 0,
  "duration_ms": 120,
  "stdout_tail": "",
  "stderr_tail": ""
}

Checks run in declaration order after setup and before the agent; the first non-zero exit short-circuits the variant (status error), and later checks are not represented. The same records, in execution order, also appear in run.json under setup_checks.

Test results

Application and introspection test results are separated so tests with the same name in different kinds do not clobber each other:

variants/<variant-id>/tests/application/<test-name>.json
variants/<variant-id>/tests/introspection/<test-name>.json

Each file is one test record — {kind, name, exit_code, duration_ms, stdout_tail, stderr_tail} — the same shape run.json embeds in its tests array. stdout_tail / stderr_tail keep the last 8 KB of each stream; a test is considered passed when exit_code is 0.
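As an illustration of the per-test layout, this Python sketch collects every failing test for one variant across both kinds, using the rule above that a test passed when exit_code is 0. failed_tests is a hypothetical helper name.

```python
import json
from pathlib import Path


def failed_tests(variant_dir: Path) -> list[tuple[str, str]]:
    """Return (kind, name) for each test record with a non-zero exit code."""
    failures = []
    for kind in ("application", "introspection"):
        # A missing kind directory simply yields no files.
        for path in sorted((variant_dir / "tests" / kind).glob("*.json")):
            record = json.loads(path.read_text(encoding="utf-8"))
            if record["exit_code"] != 0:
                failures.append((kind, record["name"]))
    return failures
```

Keeping the kind in the result matters for the same reason the directories are split: an application test and an introspection test may share a name.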

File Diffs

The harness snapshots /workspace before the agent runs and again after it exits, then writes the diff between the two snapshots to a single metadata file:

variants/<variant-id>/fs-diff/000000.json

The record lists the files added, removed, and modified by the agent, with the per-file changes inline. File-diff capture is best effort and only runs when the pre-agent snapshot succeeded: if initialization or a snapshot fails, the harness logs a warning, skips the diff, and continues the run.

axp list

axp list prints runs newest first. By default, it combines local runs from ./.axp/runs/ with remote platform runs. Use axp list --local to scan only local artifacts, or axp list --remote to show only platform runs.

Rows include:

run id
experiment id
status
type (local or remote)
created time

Filter with --status <STATUS>. Add --json when you need machine-readable output. For a local run, axp list derives the row status from a variant's layout.json: a current-layout run shows complete, and a run with no readable layout.json shows partial.

Use --runs to list the individual variant/repeat runs inside each group/request. This exposes the per-run ID, variant, repeat index, status, and terminal detail. Use --run-ids when you want a comma-separated list for commands that accept --run:

axp runs list --local --run-ids \
  | axp analyze prompt

By default, --run-ids selects error-style runs. Add --status <STATUS> for exact status selection or --failed to include unsuccessful terminal runs more broadly.

Inspecting run artifacts

Local run artifacts live on disk under .axp/runs/<run-id>/. Use axp list --local to enumerate local runs, then read the files directly:

.axp/runs/<run-id>/group.json and .axp/runs/<run-id>/index.jsonl
.axp/runs/<run-id>/variants/<variant-id>/resolved-variant.yaml
.axp/runs/<run-id>/variants/<variant-id>/run.json
.axp/runs/<run-id>/variants/<variant-id>/agent-events.jsonl and harness-events.jsonl
.axp/runs/<run-id>/variants/<variant-id>/tests/application/*.json
.axp/runs/<run-id>/variants/<variant-id>/tests/introspection/*.json

For structured local querying, use axp local query <run-id> "<SQL>" — it derives query Parquet from the run tree under .axp/derived/<run-id>/query/ and runs DuckDB SQL against it. For a platform run, axp download <run-id> fetches the run's Parquet tables (runs, agent_events, tool_calls, messages, harness_spans, tests, artifacts) to .axp/downloads/<run-id>/, and axp query "<SQL>" runs ClickHouse SQL against your uploaded org data.

Partial Runs

A partial run is one that did not finish cleanly — interrupted before the run was finalized, or stopped by a harness failure before every variant summary was written. group.json records the completion stamp: a run whose group.json still has ended_at null (or that axp list flags partial because no variant's layout.json is readable at the current layout version) did not complete.

Partial runs may still contain useful artifacts:

group.json and experiment.yaml, written at run creation
resolved variant YAMLs, written before execution
layout.json for a started variant
agent-events.jsonl / harness-events.jsonl / agent.log for the agent invocation, as far as it got
setup.log and setup-checks/ results for the variant's preflight
test result files for tests that ran
fs-diff/000000.json if the snapshots completed

Treat a finalized group.json (with ended_at) as the completion marker for a run. Treat per-variant artifacts as incremental debugging evidence that may exist even when the run did not complete.
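The completion rule above can be checked mechanically. The following Python is a minimal sketch of that check; run_completed is a hypothetical helper, and it treats a missing or unreadable group.json as "not complete", which is an assumption rather than documented axp behavior.

```python
import json
from pathlib import Path


def run_completed(run_dir: Path) -> bool:
    """True when <run-dir>/group.json exists and has a non-null ended_at."""
    try:
        data = json.loads((run_dir / "group.json").read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        # Assumption: no manifest, or a corrupt one, means the run did not finish.
        return False
    return isinstance(data, dict) and data.get("ended_at") is not None
```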
