Runs and artifacts
How axp stores runs, variant artifacts, summaries, traces, logs, and partial runs under .axp/runs.
axp local run <experiment>.yaml writes local artifacts under the current working directory:
./.axp/runs/<run-id>/
This layout is what a local run produces. Platform runs (axp run) store their results in your organization; pull one down with axp download <run-id> to get the same files under ./.axp/downloads/<run-id>/ and read or query them locally.
A run is one actual execution of an experiment. A normal axp local run creates one run; axp local run --repeat <N> creates one run per repeat. The run id is the experiment id plus a ULID:
<experiment-id>-<26-character-ulid>
The ULID carries the run timestamp. axp list orders local runs by the recorded start time when available, falling back to directory modified time.
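As an illustration of how the run id carries a timestamp, the sketch below splits a run id and decodes the ULID's time component. It relies on the ULID specification (the first 10 Crockford base32 characters encode a 48-bit millisecond Unix timestamp). The helper names split_run_id and ulid_timestamp are hypothetical and are not part of axp.

```python
from datetime import datetime, timedelta, timezone

# Crockford base32 alphabet used by ULIDs (no I, L, O, U).
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def split_run_id(run_id: str) -> tuple[str, str]:
    """Split '<experiment-id>-<ulid>' into (experiment_id, ulid).

    The experiment id may itself contain hyphens, so split on the last one.
    """
    experiment_id, sep, ulid = run_id.rpartition("-")
    if not sep or not experiment_id or len(ulid) != 26:
        raise ValueError(f"not a run id: {run_id!r}")
    return experiment_id, ulid


def ulid_timestamp(ulid: str) -> datetime:
    """Decode the millisecond timestamp held in the first 10 characters."""
    millis = 0
    for ch in ulid[:10].upper():
        millis = millis * 32 + CROCKFORD.index(ch)
    return EPOCH + timedelta(milliseconds=millis)
```

This is only useful for rough ordering or debugging; axp list itself prefers the recorded start time, as described above.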
Run Layout
A completed run has this top-level shape:
.axp/runs/<run-id>/
  index.json
  variants/
    <variant-id>/
      resolved-variant.yaml
      staging.json
  results/
    <variant-id>/
      layout.json
      run-summary.json
      agent-raw.json
      agent.log
      agent-session.jsonl
      workspace/
      application/
        <test-name>.stdout.log
        <test-name>.stderr.log
      introspection/
        <test-name>.stdout.log
        <test-name>.stderr.log
      traces/
        <trace-id>.otlp.json
        <trace-id>.agent.otlp.json
        <trace-id>.agent.logs.json
      api-bodies/
        ...
      commands.jsonl
      commands/
        <command-id>.stdout
        <command-id>.stderr
      fs-diff/
        <seq>.json
        <seq>/
          <flattened-path>.patch
Only variants selected for the run are written. If you run axp local run --variant a --variant b, the run contains variants/a/resolved-variant.yaml, variants/b/resolved-variant.yaml, results/a/, and results/b/, not every variant from the experiment matrix.
axp local run --dry-run is different: it renders the plan and returns without writing a run under your project.
Resolved Variant YAML
variants/<variant-id>/resolved-variant.yaml is the resolved experiment for that variant. It is written before containers start, after filtering by --variant.
Use this file when you need to see the exact prompt, model, test set, setup, optional experiment_description, and resolved matrix values the harness used for a variant.
staging.json
variants/<variant-id>/staging.json is written when the experiment stages files into the sandbox via files or axp local run --file. It holds one record per staged entry, in delivery order: {name?, source, dest, file_count, skipped_symlinks, size_bytes, tar_sha256}. For a host-path source, source is the resolved path; for an http(s) URL source, it is the URL. On a staging_failed variant, the failing entry's record carries an error field and later entries are absent. Unlike resolved-variant.yaml, this file reflects what was actually staged, including --file source binds and ad-hoc entries.
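The sketch below shows one way to summarize a staging.json file. It assumes the file is a JSON array of the records described above; the top-level shape is not stated here, so treat that as an assumption. The function name summarize_staging and the keys of its return value are hypothetical.

```python
import json
from pathlib import Path


def summarize_staging(path: Path) -> dict:
    """Total up staged entries and surface the failing entry, if any.

    Assumption: staging.json is a JSON array of per-entry records.
    """
    records = json.loads(path.read_text(encoding="utf-8"))
    # Per the docs, a failing entry carries an "error" field and is the last record.
    failed = next((r for r in records if "error" in r), None)
    staged = [r for r in records if "error" not in r]
    return {
        "staged_entries": len(staged),
        "total_files": sum(r.get("file_count", 0) for r in staged),
        "total_bytes": sum(r.get("size_bytes", 0) for r in staged),
        "failed_dest": failed.get("dest") if failed else None,
        "error": failed.get("error") if failed else None,
    }
```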
index.json
index.json is the run-level index. It is written after all selected variants finish and their run-summary.json files have been collected.
Its current schema is:
{
  "schema_version": 1,
  "run_id": "hello-01HV...",
  "experiment_id": "hello",
  "variants": {
    "baseline": {
      "variant_tag": "Baseline",
      "status": "pass",
      "cost_usd": 0.0123,
      "duration_seconds": 42,
      "paths": {
        "results_dir": "results/baseline",
        "run_summary": "results/baseline/run-summary.json"
      }
    }
  }
}
The variants object is keyed by variant id. Each entry contains the human-facing variant tag, rolled-up status, agent cost, duration, and relative paths to the variant result directory and summary.
If index.json is missing or malformed, the run is treated as partial by read-side commands.
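A reader script can apply the same rule. The sketch below returns None when index.json is missing or cannot be parsed, and otherwise resolves each variant's summary path relative to the run directory. The structural check on variants is an assumption about what "malformed" means, and load_index and summary_paths are hypothetical names, not axp APIs.

```python
import json
from pathlib import Path
from typing import Optional


def load_index(run_dir: Path) -> Optional[dict]:
    """Return the parsed index.json, or None if the run looks partial."""
    try:
        data = json.loads((run_dir / "index.json").read_text(encoding="utf-8"))
    except (OSError, ValueError):
        # Missing, unreadable, or not valid JSON.
        return None
    # Assumption: an index without a "variants" object counts as malformed.
    if not isinstance(data, dict) or not isinstance(data.get("variants"), dict):
        return None
    return data


def summary_paths(run_dir: Path, index: dict) -> dict[str, Path]:
    """Map variant id to its run-summary.json, resolved against run_dir."""
    return {
        variant_id: run_dir / entry["paths"]["run_summary"]
        for variant_id, entry in index["variants"].items()
    }
```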
run-summary.json
Each variant writes results/<variant-id>/run-summary.json at the end of the variant run. The summary is the compact, structured result for that variant:
{
  "schema_version": 1,
  "run_id": "hello-01HV...",
  "variant_id": "baseline",
  "variant_tag": "Baseline",
  "experiment_id": "hello",
  "status": "pass",
  "agent": {
    "exit_code": 0,
    "turns": 4,
    "cost_usd": 0.012,
    "input_tokens": 5821,
    "output_tokens": 412
  },
  "started_at_epoch": 1714408800,
  "ended_at_epoch": 1714408842,
  "duration_seconds": 42,
  "tests": [
    {
      "name": "file-exists",
      "kind": "application",
      "status": "pass",
      "exit_code": 0,
      "duration_seconds": 1,
      "stdout_tail": "",
      "stderr_tail": ""
    }
  ]
}
Statuses are lowercase JSON strings:
- pass: there was no timeout and every executed test passed. If tests ran and all passed, a non-zero agent exit code does not by itself change the rollup to error.
- fail: at least one test failed.
- timeout: the agent hit max_time_seconds; tests are not run for that variant.
- error: the agent exited non-zero and no tests ran.
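The status rules above can be read as a small decision function. The sketch below is one interpretation, not the harness's implementation; rollup_status is a hypothetical name, and combinations the rules do not cover (for example, a zero exit code with no tests) return None rather than guessing.

```python
from typing import Optional, Sequence


def rollup_status(
    timed_out: bool, agent_exit_code: int, test_statuses: Sequence[str]
) -> Optional[str]:
    """Apply the documented rollup rules to one variant."""
    if timed_out:
        return "timeout"  # tests are not run after a timeout
    if any(status == "fail" for status in test_statuses):
        return "fail"
    if test_statuses and all(status == "pass" for status in test_statuses):
        return "pass"  # holds even when the agent exited non-zero
    if not test_statuses and agent_exit_code != 0:
        return "error"
    return None  # not covered by the documented rules
```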
The stdout_tail and stderr_tail fields keep only the last 8 KB of each test log. Full logs stay on disk under application/ or introspection/.
Test Logs
Application and introspection test logs are separated so tests with the same name in different test kinds do not clobber each other:
results/<variant-id>/application/<test-name>.stdout.log
results/<variant-id>/application/<test-name>.stderr.log
results/<variant-id>/introspection/<test-name>.stdout.log
results/<variant-id>/introspection/<test-name>.stderr.log
The summary records each test's kind, status, exit code, duration, and log tails. The files above contain the full stdout and stderr captures.
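To go from a test entry in run-summary.json to its full logs, combine the entry's kind and name with the variant's results directory. The sketch below also reads a tail comparable to the summary's tail fields. It assumes "8 KB" means 8192 bytes; log_paths_for and read_tail are hypothetical helper names.

```python
from pathlib import Path

TAIL_BYTES = 8 * 1024  # assumption: "8 KB" means 8192 bytes


def log_paths_for(results_dir: Path, kind: str, name: str) -> tuple[Path, Path]:
    """Return (stdout_log, stderr_log) for a test of the given kind."""
    base = results_dir / kind  # "application" or "introspection"
    return base / f"{name}.stdout.log", base / f"{name}.stderr.log"


def read_tail(path: Path, limit: int = TAIL_BYTES) -> str:
    """Read at most the last `limit` bytes of a log file as text."""
    with path.open("rb") as fh:
        fh.seek(0, 2)  # jump to the end to learn the file size
        size = fh.tell()
        fh.seek(max(0, size - limit))
        # A multi-byte character may be cut at the boundary; replace it.
        return fh.read().decode("utf-8", errors="replace")
```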
Agent Output
The agent invocation writes:
results/<variant-id>/agent-raw.json
results/<variant-id>/agent.log
results/<variant-id>/agent-session.jsonl
agent-raw.json is the agent stdout from the claude --output-format json invocation. The harness parses cost, turns, and token usage from this JSON when available.
agent.log is the agent stderr stream.
agent-session.jsonl is copied from Claude Code's per-session JSONL when the session id is present and the file can be found in the container. This copy is best effort; a missing session file does not fail the run.
Traces
Each variant has a traces/ directory. Harness spans are written as:
results/<variant-id>/traces/<trace-id>.otlp.json
When the agent-side OTLP receiver starts successfully, Claude Code telemetry for the same invocation can also appear beside it:
results/<variant-id>/traces/<trace-id>.agent.otlp.json
results/<variant-id>/traces/<trace-id>.agent.logs.json
Agent raw API request and response bodies are written in file mode under:
results/<variant-id>/api-bodies/
The harness continues the run if the agent-side receiver cannot start; in that case, the agent trace and agent log files may be absent.
Command Logs
Every host-driven docker exec goes through the command logger. The per-variant command index lives at:
results/<variant-id>/commands.jsonl
Full stdout and stderr for each logged command live under:
results/<variant-id>/commands/<command-id>.stdout
results/<variant-id>/commands/<command-id>.stderr
The command records cross-reference the command captures with the run's trace spans. Use these files when you need more detail than the compact run-summary.json tails.
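A minimal sketch for reading the command index and locating a command's captures follows. The record fields in commands.jsonl are not listed here, so the code treats each line as an opaque JSON object and takes the command id as an argument. read_command_index and capture_paths are hypothetical names.

```python
import json
from pathlib import Path


def read_command_index(results_dir: Path) -> list[dict]:
    """Parse commands.jsonl, one JSON object per non-empty line."""
    records = []
    with (results_dir / "commands.jsonl").open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                records.append(json.loads(line))
            except ValueError:
                # Assumption: an interrupted run may leave a truncated line.
                continue
    return records


def capture_paths(results_dir: Path, command_id: str) -> tuple[Path, Path]:
    """Return (stdout, stderr) capture paths for one logged command."""
    base = results_dir / "commands"
    return base / f"{command_id}.stdout", base / f"{command_id}.stderr"
```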
File Diffs
The harness attempts file-system snapshots at three points:
- before_agent: before the agent does work.
- after_agent: after the agent exits or times out.
- after_tests: after tests complete.
Diff metadata is written under:
results/<variant-id>/fs-diff/<seq>.json
Per-file unified diff patches are written under a sequence subdirectory:
results/<variant-id>/fs-diff/<seq>/<flattened-path>.patch
Path separators in workspace-relative file names are flattened to underscores for patch file names. For example, src/model.py becomes src_model.py.patch.
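The flattening rule can be sketched as a single string replacement, which is handy when you need to find the patch for a known file. This assumes "/" is the only separator involved; patch_file_name is a hypothetical helper, not an axp function.

```python
def patch_file_name(workspace_relative_path: str) -> str:
    """Flatten path separators to underscores and append '.patch'."""
    return workspace_relative_path.replace("/", "_") + ".patch"


print(patch_file_name("src/model.py"))  # prints "src_model.py.patch"
```

Under this simple rule, src/a_b.py and src/a/b.py would map to the same name. How the harness handles such collisions is not described here.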
File-diff capture is best effort. If initialization or a snapshot fails, the harness logs a warning and continues the run.
axp list
axp list prints runs newest first. By default, it combines local runs from ./.axp/runs/ with remote platform runs. Use axp list --local to scan only local artifacts, or axp list --remote to show only platform runs.
Rows include:
- run id
- experiment id
- status
- type (local or remote)
- created time
Filter with --status <STATUS>. Add --json when you need machine-readable output. For local partial runs, axp list still prints a row with status partial when it can identify the run directory.
Inspecting run artifacts
Local run artifacts live on disk under .axp/runs/<run-id>/. Use axp list --local to enumerate local runs, then read the files directly:
- .axp/runs/<run-id>/index.json
- .axp/runs/<run-id>/variants/<variant-id>/resolved-variant.yaml
- .axp/runs/<run-id>/results/<variant-id>/run-summary.json
- .axp/runs/<run-id>/results/<variant-id>/traces/*.otlp.json
- .axp/runs/<run-id>/results/<variant-id>/application/*.log
- .axp/runs/<run-id>/results/<variant-id>/introspection/*.log
For structured local querying, use axp local query <run-id> "<SQL>". After uploading a run, use axp download <run-id> to fetch the platform Parquet locally and query it the same way.
Partial Runs
A partial run is any run whose index.json cannot be read. Common causes include interruption before the final index write or a harness failure before all variant summaries are collected.
Partial runs may still contain useful artifacts:
- resolved variant YAMLs written before execution
- layout.json for a started variant
- agent stdout/stderr if the agent invocation completed
- test logs for tests that ran
- trace files, command logs, API bodies, and file diffs written before interruption
Treat index.json as the completion marker for a run. Treat per-variant artifacts as incremental debugging evidence that may exist even when the run did not complete.
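When a run is partial, a quick inventory of what did reach disk is often the first debugging step. The sketch below walks variants/ and results/ and lists the files present per variant id. It assumes only the directory layout described on this page; surviving_artifacts is a hypothetical name.

```python
from pathlib import Path


def surviving_artifacts(run_dir: Path) -> dict[str, list[str]]:
    """Map each variant id found on disk to the files present for it.

    Paths are relative to run_dir and use forward slashes.
    """
    found: dict[str, list[str]] = {}
    for parent in ("variants", "results"):
        root = run_dir / parent
        if not root.is_dir():
            continue  # a very early interruption may leave this missing
        for variant_dir in sorted(p for p in root.iterdir() if p.is_dir()):
            files = sorted(
                f.relative_to(run_dir).as_posix()
                for f in variant_dir.rglob("*")
                if f.is_file()
            )
            found.setdefault(variant_dir.name, []).extend(files)
    return found
```

Note that this also descends into workspace/, which may be large; filter it out if you only want harness artifacts.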