Runs and artifacts

How axp stores runs, variant artifacts, summaries, traces, logs, and partial runs under .axp/runs.

axp local run <experiment>.yaml writes local artifacts under the current working directory:

./.axp/runs/<run-id>/

This layout is what a local run produces. Platform runs (axp run) store their results in your organization; pull one down with axp download <run-id> to get the same files under ./.axp/downloads/<run-id>/ and read or query them locally.

A run is one actual execution of an experiment. A normal axp local run creates one run; axp local run --repeat <N> creates N runs, one per repeat. The run id is the experiment id plus a ULID:

<experiment-id>-<26-character-ulid>

The ULID carries the run timestamp. axp list orders local runs by the recorded start time when available, falling back to directory modified time.
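
As a rough illustration of the run id format above, the sketch below splits a run id into its experiment id and ULID and decodes the ULID timestamp. It relies on the ULID specification, in which the first 10 of the 26 Crockford base32 characters encode a 48-bit Unix time in milliseconds. The helper names split_run_id, ulid_timestamp_ms, and run_started_at are hypothetical and are not part of axp.

```python
from datetime import datetime, timezone

# Crockford base32 alphabet used by ULIDs (no I, L, O, U).
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def split_run_id(run_id: str) -> tuple[str, str]:
    """Split '<experiment-id>-<26-character-ulid>' into its two parts.

    The experiment id may itself contain hyphens, so the ULID is taken
    from the fixed-width tail rather than by splitting on the first '-'.
    """
    if len(run_id) < 28 or run_id[-27] != "-":
        raise ValueError(f"not a run id: {run_id!r}")
    return run_id[:-27], run_id[-26:]


def ulid_timestamp_ms(ulid: str) -> int:
    """Decode the millisecond timestamp from a ULID's first 10 characters."""
    value = 0
    for ch in ulid[:10].upper():
        value = value * 32 + CROCKFORD.index(ch)
    return value


def run_started_at(run_id: str) -> datetime:
    """Return the ULID timestamp of a run id as an aware UTC datetime."""
    _, ulid = split_run_id(run_id)
    return datetime.fromtimestamp(ulid_timestamp_ms(ulid) / 1000, tz=timezone.utc)
```

Sorting run directories by this decoded value should approximate creation order even when no recorded start time is available, although axp list itself uses the recorded start time and directory modified time as described above.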

Run Layout

A completed run has this top-level shape:

.axp/runs/<run-id>/
  index.json
  variants/
    <variant-id>/
      resolved-variant.yaml
      staging.json
  results/
    <variant-id>/
      layout.json
      run-summary.json
      agent-raw.json
      agent.log
      agent-session.jsonl
      workspace/
      application/
        <test-name>.stdout.log
        <test-name>.stderr.log
      introspection/
        <test-name>.stdout.log
        <test-name>.stderr.log
      traces/
        <trace-id>.otlp.json
        <trace-id>.agent.otlp.json
        <trace-id>.agent.logs.json
      api-bodies/
        ...
      commands.jsonl
      commands/
        <command-id>.stdout
        <command-id>.stderr
      fs-diff/
        <seq>.json
        <seq>/
          <flattened-path>.patch

Only variants selected for the run are written. If you run axp local run --variant a --variant b, the run contains variants/a/resolved-variant.yaml, variants/b/resolved-variant.yaml, results/a/, and results/b/, not every variant from the experiment matrix.

axp local run --dry-run is different: it renders the plan and returns without writing a run under your project.

Resolved Variant YAML

variants/<variant-id>/resolved-variant.yaml is the resolved experiment for that variant. It is written before containers start, after filtering by --variant.

Use this file when you need to see the exact prompt, model, test set, setup, optional experiment_description, and resolved matrix values the harness used for a variant.

staging.json

variants/<variant-id>/staging.json is written when the experiment stages files into the sandbox via files or axp local run --file. It holds one record per staged entry, in delivery order: {name?, source, dest, file_count, skipped_symlinks, size_bytes, tar_sha256}. For a host-path source, source is the resolved path; for an http(s) URL source, it is the URL. On a staging_failed variant, the failing entry's record carries an error field and later entries are absent. Unlike resolved-variant.yaml, this file reflects what was actually staged, including --file source binds and ad-hoc entries.
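
The record semantics above can be sketched as a small summary function. This is only an illustration: it assumes the staging records have already been loaded into a Python list of dicts in delivery order (the top-level JSON shape of staging.json is not specified here), and summarize_staging is a hypothetical helper name.

```python
from typing import Any, Optional


def summarize_staging(records: list[dict[str, Any]]) -> dict[str, Any]:
    """Summarize staging records given in delivery order.

    Returns the number of entries that staged cleanly, their total
    size_bytes, and the failing record (or None if nothing failed).
    """
    staged = 0
    total_bytes = 0
    failed: Optional[dict[str, Any]] = None
    for record in records:
        if "error" in record:
            # Entries after the failing one are absent, so stop here.
            failed = record
            break
        staged += 1
        total_bytes += record.get("size_bytes", 0)
    return {"staged": staged, "total_bytes": total_bytes, "failed": failed}
```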

index.json

index.json is the run-level index. It is written after all selected variants finish and their run-summary.json files have been collected.

Its current schema is:

{
  "schema_version": 1,
  "run_id": "hello-01HV...",
  "experiment_id": "hello",
  "variants": {
    "baseline": {
      "variant_tag": "Baseline",
      "status": "pass",
      "cost_usd": 0.0123,
      "duration_seconds": 42,
      "paths": {
        "results_dir": "results/baseline",
        "run_summary": "results/baseline/run-summary.json"
      }
    }
  }
}

The variants object is keyed by variant id. Each entry contains the human-facing variant tag, rolled-up status, agent cost, duration, and relative paths to the variant result directory and summary.

If index.json is missing or malformed, the run is treated as partial by read-side commands.
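
A reader that follows the same rule might look like the sketch below. It is a minimal illustration, not the logic axp uses: load_index and total_cost_usd are hypothetical helper names, and treating an index without a variants object as malformed is an assumption.

```python
import json
from pathlib import Path
from typing import Any, Optional


def load_index(run_dir: Path) -> Optional[dict[str, Any]]:
    """Return the parsed index.json, or None if the run looks partial."""
    try:
        data = json.loads((run_dir / "index.json").read_text(encoding="utf-8"))
    except (OSError, ValueError):
        # Missing, unreadable, or not valid JSON: treat the run as partial.
        return None
    # Assumption: an index without a "variants" object counts as malformed.
    if not isinstance(data, dict) or not isinstance(data.get("variants"), dict):
        return None
    return data


def total_cost_usd(index: dict[str, Any]) -> float:
    """Sum cost_usd over all variants listed in the index."""
    return sum(v.get("cost_usd") or 0.0 for v in index["variants"].values())
```

Returning None rather than raising keeps callers that scan many run directories simple: a partial run becomes one more case to report instead of an exception to handle.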

run-summary.json

Each variant writes results/<variant-id>/run-summary.json at the end of the variant run. The summary is the compact, structured result for that variant:

{
  "schema_version": 1,
  "run_id": "hello-01HV...",
  "variant_id": "baseline",
  "variant_tag": "Baseline",
  "experiment_id": "hello",
  "status": "pass",
  "agent": {
    "exit_code": 0,
    "turns": 4,
    "cost_usd": 0.012,
    "input_tokens": 5821,
    "output_tokens": 412
  },
  "started_at_epoch": 1714408800,
  "ended_at_epoch": 1714408842,
  "duration_seconds": 42,
  "tests": [
    {
      "name": "file-exists",
      "kind": "application",
      "status": "pass",
      "exit_code": 0,
      "duration_seconds": 1,
      "stdout_tail": "",
      "stderr_tail": ""
    }
  ]
}
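
For scripting against a summary, a sketch like the following picks out the non-passing tests and adds up token usage. It uses only the fields shown in the example above; failed_tests and total_tokens are hypothetical helper names.

```python
from typing import Any


def failed_tests(summary: dict[str, Any]) -> list[str]:
    """Return 'kind/name' for every test whose status is not 'pass'."""
    return [
        f"{t['kind']}/{t['name']}"
        for t in summary.get("tests", [])
        if t.get("status") != "pass"
    ]


def total_tokens(summary: dict[str, Any]) -> int:
    """Add the agent's input and output token counts."""
    agent = summary.get("agent", {})
    return agent.get("input_tokens", 0) + agent.get("output_tokens", 0)
```

Including the test kind in the returned label mirrors the on-disk log layout, where tests of different kinds may share a name.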

Statuses are lowercase JSON strings:

  • pass: there was no timeout and every executed test passed. If tests ran and all passed, a non-zero agent exit code does not by itself change the rollup to error.
  • fail: at least one test failed.
  • timeout: the agent hit max_time_seconds; tests are not run for that variant.
  • error: the agent exited non-zero and no tests ran.
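
The rollup rules in this list can be written as a small decision function. This is a sketch of the rules as stated here, not the harness implementation; rollup_status is a hypothetical name, and the case of a zero exit code with no tests is assumed to roll up to pass.

```python
def rollup_status(timed_out: bool, agent_exit_code: int, tests_passed: list[bool]) -> str:
    """Roll one variant's outcome up to pass, fail, timeout, or error.

    tests_passed holds one boolean per executed test; it is empty when
    no tests ran.
    """
    if timed_out:
        # The agent hit max_time_seconds, so no tests were run.
        return "timeout"
    if not all(tests_passed):
        return "fail"
    if not tests_passed and agent_exit_code != 0:
        return "error"
    # Every executed test passed; a non-zero exit code alone does not
    # change the result. Assumption: exit code 0 with no tests is a pass.
    return "pass"
```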

The stdout_tail and stderr_tail fields keep only the last 8 KB of each test log. Full logs stay on disk under application/ or introspection/.

Test Logs

Application and introspection test logs are separated so tests with the same name in different test kinds do not clobber each other:

results/<variant-id>/application/<test-name>.stdout.log
results/<variant-id>/application/<test-name>.stderr.log
results/<variant-id>/introspection/<test-name>.stdout.log
results/<variant-id>/introspection/<test-name>.stderr.log

The summary records each test's kind, status, exit code, duration, and log tails. The files above contain the full stdout and stderr captures.
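
When a tail in the summary is not enough, the matching full log can be located from the test's kind and name. The sketch below builds that path from the layout above; log_path_for_test is a hypothetical helper name.

```python
from pathlib import Path

TEST_KINDS = ("application", "introspection")
STREAMS = ("stdout", "stderr")


def log_path_for_test(results_dir: Path, kind: str, test_name: str, stream: str) -> Path:
    """Build the full-log path for one test under results/<variant-id>/."""
    if kind not in TEST_KINDS:
        raise ValueError(f"unknown test kind: {kind!r}")
    if stream not in STREAMS:
        raise ValueError(f"unknown stream: {stream!r}")
    return results_dir / kind / f"{test_name}.{stream}.log"
```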

Agent Output

The agent invocation writes:

results/<variant-id>/agent-raw.json
results/<variant-id>/agent.log
results/<variant-id>/agent-session.jsonl

agent-raw.json is the agent stdout from the claude --output-format json invocation. The harness parses cost, turns, and token usage from this JSON when available.

agent.log is the agent stderr stream.

agent-session.jsonl is copied from Claude Code's per-session JSONL when the session id is present and the file can be found in the container. This copy is best effort; a missing session file does not fail the run.
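
Because the session copy is best effort, code that reads it should tolerate its absence. A minimal sketch, assuming each non-empty line of the file is one JSON value (the usual JSONL convention); iter_session_records is a hypothetical helper name, and the record schema is not covered here.

```python
import json
from pathlib import Path
from typing import Any, Iterator


def iter_session_records(results_dir: Path) -> Iterator[Any]:
    """Yield parsed records from agent-session.jsonl, or nothing if absent."""
    path = results_dir / "agent-session.jsonl"
    if not path.is_file():
        # The copy is best effort, so a missing file is not an error.
        return
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)
```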

Traces

Each variant has a traces/ directory. Harness spans are written as:

results/<variant-id>/traces/<trace-id>.otlp.json

When the agent-side OTLP receiver starts successfully, Claude Code telemetry for the same invocation can also appear beside it:

results/<variant-id>/traces/<trace-id>.agent.otlp.json
results/<variant-id>/traces/<trace-id>.agent.logs.json

Agent raw API request and response bodies are written in file mode under:

results/<variant-id>/api-bodies/

The harness continues the run if the agent-side receiver cannot start; in that case, the agent trace and agent log files may be absent.
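
To see which trace files exist for each trace id, the file names in traces/ can be grouped by suffix. The sketch below is illustrative only; group_traces and the kind labels harness_spans, agent_spans, and agent_logs are hypothetical names.

```python
from collections import defaultdict
from pathlib import Path

# Longer suffixes first: '.agent.otlp.json' also ends with '.otlp.json'.
SUFFIX_KINDS = (
    (".agent.otlp.json", "agent_spans"),
    (".agent.logs.json", "agent_logs"),
    (".otlp.json", "harness_spans"),
)


def group_traces(traces_dir: Path) -> dict[str, dict[str, Path]]:
    """Map each trace id to the trace files found for it."""
    if not traces_dir.is_dir():
        return {}
    grouped: dict[str, dict[str, Path]] = defaultdict(dict)
    for path in sorted(traces_dir.iterdir()):
        for suffix, kind in SUFFIX_KINDS:
            if path.name.endswith(suffix):
                trace_id = path.name[: -len(suffix)]
                grouped[trace_id][kind] = path
                break
    return dict(grouped)
```

A trace id that has only a harness_spans entry is consistent with the agent-side receiver not having started for that invocation.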

Command Logs

Every host-driven docker exec goes through the command logger. The per-variant command index lives at:

results/<variant-id>/commands.jsonl

Full stdout and stderr for each logged command live under:

results/<variant-id>/commands/<command-id>.stdout
results/<variant-id>/commands/<command-id>.stderr

The command records cross-reference the command captures with the run's trace spans. Use these files when you need more detail than the compact run-summary.json tails.
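
One quick way to triage a variant is to list the commands that wrote anything to stderr. This sketch scans only the capture files named above and does not read commands.jsonl, whose record fields are not described here; commands_with_stderr is a hypothetical helper name.

```python
from pathlib import Path


def commands_with_stderr(results_dir: Path) -> list[str]:
    """Return command ids whose .stderr capture is non-empty, sorted."""
    commands_dir = results_dir / "commands"
    if not commands_dir.is_dir():
        return []
    return sorted(
        p.stem  # '<command-id>.stderr' -> '<command-id>'
        for p in commands_dir.glob("*.stderr")
        if p.stat().st_size > 0
    )
```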

File Diffs

The harness attempts file-system snapshots at three points:

  • before_agent: before the agent does work.
  • after_agent: after the agent exits or times out.
  • after_tests: after tests complete.

Diff metadata is written under:

results/<variant-id>/fs-diff/<seq>.json

Per-file unified diff patches are written under a sequence subdirectory:

results/<variant-id>/fs-diff/<seq>/<flattened-path>.patch

Path separators in workspace-relative file names are flattened to underscores for patch file names. For example, src/model.py becomes src_model.py.patch.
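
The flattening rule can be sketched as follows, assuming workspace-relative paths use forward slashes; patch_file_name is a hypothetical helper name.

```python
def patch_file_name(workspace_relative_path: str) -> str:
    """Flatten path separators to underscores and append '.patch'."""
    return workspace_relative_path.replace("/", "_") + ".patch"


print(patch_file_name("src/model.py"))  # src_model.py.patch
```

Under plain underscore flattening, two different paths such as a/b_c.py and a_b/c.py map to the same patch file name; whether the harness disambiguates such cases is not stated here.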

File-diff capture is best effort. If initialization or a snapshot fails, the harness logs a warning and continues the run.

axp list

axp list prints runs newest first. By default, it combines local runs from ./.axp/runs/ with remote platform runs. Use axp list --local to scan only local artifacts, or axp list --remote to show only platform runs.

Rows include:

  • run id
  • experiment id
  • status
  • type (local or remote)
  • created time

Filter with --status <STATUS>. Add --json when you need machine-readable output. For local partial runs, axp list still prints a row with status partial when it can identify the run directory.

Inspecting run artifacts

Local run artifacts live on disk under .axp/runs/<run-id>/. Use axp list --local to enumerate local runs, then read the files directly:

  • .axp/runs/<run-id>/index.json
  • .axp/runs/<run-id>/variants/<variant-id>/resolved-variant.yaml
  • .axp/runs/<run-id>/results/<variant-id>/run-summary.json
  • .axp/runs/<run-id>/results/<variant-id>/traces/*.otlp.json
  • .axp/runs/<run-id>/results/<variant-id>/application/*.log
  • .axp/runs/<run-id>/results/<variant-id>/introspection/*.log

For structured local querying, use axp local query <run-id> "<SQL>". After uploading a run, use axp download <run-id> to fetch the platform Parquet locally and query it the same way.

Partial Runs

A partial run is any run whose index.json cannot be read. Common causes include interruption before the final index write or a harness failure before all variant summaries are collected.

Partial runs may still contain useful artifacts:

  • resolved variant YAMLs written before execution
  • layout.json for a started variant
  • agent stdout/stderr if the agent invocation completed
  • test logs for tests that ran
  • trace files, command logs, API bodies, and file diffs written before interruption
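
To see what survived in a partial run, it can help to inventory each variant's results directory against the layout described earlier. The sketch below only checks for existence and makes no claim about completeness of any file; inventory and ARTIFACTS are hypothetical names.

```python
from pathlib import Path

# Per-variant artifacts named in the run layout; each may or may not exist.
ARTIFACTS = (
    "layout.json",
    "run-summary.json",
    "agent-raw.json",
    "agent.log",
    "agent-session.jsonl",
    "commands.jsonl",
    "application",
    "introspection",
    "traces",
    "api-bodies",
    "commands",
    "fs-diff",
)


def inventory(run_dir: Path) -> dict[str, list[str]]:
    """Map each variant id under results/ to the artifacts present on disk."""
    results = run_dir / "results"
    if not results.is_dir():
        return {}
    return {
        variant.name: [name for name in ARTIFACTS if (variant / name).exists()]
        for variant in sorted(results.iterdir())
        if variant.is_dir()
    }
```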

Treat index.json as the completion marker for a run. Treat per-variant artifacts as incremental debugging evidence that may exist even when the run did not complete.