Getting Started
Create, validate, build, run, list, and inspect AXP experiments with the current CLI.
This guide walks through creating an experiment and running it — first on the
AXP platform with axp run, then locally in Docker with axp local run. Run
commands from the directory where you want AXP to create experiment files (and,
for local runs, write ./.axp/runs/).
Install AXP first if axp --help is not available. See
Installation.
Prerequisites
axp run submits to the AXP platform, so sign in once:
axp auth login
The platform supplies model access, so you don't need your own model key for
axp run. To run locally with axp local run instead, make sure Docker is
available and a model key is set in your host environment — ANTHROPIC_API_KEY
for the default Claude agent, or OPENAI_API_KEY if your experiment uses
agent: codex. See Runs for the full
platform-vs-local comparison.
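The local-run prerequisites above can be checked before starting a run. The following is a minimal sketch, not part of AXP: the helper names required_model_key and check_local_prereqs are hypothetical, and the check only confirms that a docker binary is on PATH, not that the Docker daemon is running.

```shell
# Hypothetical helper: map the experiment's agent to the model key named in this guide.
# No argument means the default Claude agent; "codex" means agent: codex.
required_model_key() {
  case "${1:-}" in
    "")    printf '%s\n' ANTHROPIC_API_KEY ;;
    codex) printf '%s\n' OPENAI_API_KEY ;;
    *)     echo "agent not covered by this sketch: $1" >&2; return 1 ;;
  esac
}

# Hypothetical preflight: succeeds only if docker is on PATH and the key is non-empty.
check_local_prereqs() {
  key_name=$(required_model_key "${1:-}") || return 1
  if ! command -v docker >/dev/null 2>&1; then
    echo "docker not found on PATH" >&2
    return 1
  fi
  # key_name comes from the fixed set above, so eval is safe here.
  eval "key_value=\${$key_name:-}"
  if [ -z "$key_value" ]; then
    echo "$key_name is not set in the host environment" >&2
    return 1
  fi
  return 0
}
```

Calling check_local_prereqs with no argument checks for the default agent's key; check_local_prereqs codex checks for OPENAI_API_KEY instead.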
1. Create an intro experiment
If the product you are trying to test is a CLI, axp intro gets you going with
AXP by creating a curated four-variant experiment.
It tests how well an agent can install and smoke-test your CLI with increasing
levels of guidance.
The resulting run shows whether the agent can figure out the right installation
path on its own, and how much the added context changes success rate, cost, and
tool usage.
Every variant gets the same generic task description; only the install pointer
changes: no pointer, CLI name, install docs URL, then exact install command.
Ask your coding agent something like:
I'm a first-time AXP user. Run an experiment with my CLI using the `axp` command.
The agent should discover your CLI name, install docs, install command, and hidden smoke-test oracle, then run a deterministic command like:
axp intro my-first-experiment \
  --cli mycli \
  --install-docs https://example.com/install \
  --install-command 'npm install mycli' \
  --smoke-command 'mycli --version' \
  --smoke-output-contains 'mycli version'
axp validate my-first-experiment.yaml
axp local run --dry-run my-first-experiment.yaml
axp local run --watch my-first-experiment.yaml
AXP writes my-first-experiment.yaml in the current directory. The kebab-case
name becomes both the file name and experiment id. axp intro only creates
the YAML; it does not prompt, log in, check Docker, collect model keys, or run
the experiment automatically.
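Because the name doubles as file name and experiment id, it can help to check a candidate name before calling axp intro. This is a sketch under an assumption: the exact kebab-case rule AXP enforces is not stated here, so the pattern below (lowercase letters and digits in groups joined by single hyphens) is a guess, and both function names are hypothetical.

```shell
# Assumed kebab-case rule: groups of [a-z0-9] joined by single hyphens.
is_kebab_case() {
  printf '%s\n' "$1" | grep -Eq '^[a-z0-9]+(-[a-z0-9]+)*$'
}

# Derive the YAML file name that axp intro would write for a given name,
# based on the statement that the name becomes the file name.
experiment_file_for() {
  is_kebab_case "$1" || { echo "not kebab-case: $1" >&2; return 1; }
  printf '%s.yaml\n' "$1"
}
```

For example, experiment_file_for my-first-experiment prints my-first-experiment.yaml, while a name such as My_Experiment is rejected by this sketch.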
--smoke-command and --smoke-output-contains are hidden test oracles. They
are not shown to the agent in any variant. Use --smoke-output-contains when
the binary name is ambiguous, so the test proves the installed program is the
intended product and not just any command with the same name.
For general-purpose experiment authoring, create a scaffold instead:
axp create my-custom-experiment
Use that scaffold to define custom tasks, variants, models, setup, secrets, tests, and limits.
If you are working with a coding agent, ask it to use the bundled
axp-create-experiment skill to scope and draft the experiment with you.
If you start from bundled examples in the AXP repository instead, look under
examples/experiments/. Example catalogs include .yaml files in nested
directories under that path.
2. Validate the YAML
axp validate my-first-experiment.yaml
Expected output:
Validated my-first-experiment.yaml
Next: run it with `axp run my-first-experiment.yaml`
On schema failure, axp validate prints a single-line error and exits non-zero.
When you're signed in (axp auth login), axp validate also prints a short,
best-effort AI review of your experiment design to stderr — advisory only,
so it never changes the exit code. Signed out, it notes that signing in unlocks
AI hints. Skip it with --no-ai. See validate
for details.
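Since schema failures produce a non-zero exit code and the AI review never affects it, the exit code is enough to gate a script or CI step. The sketch below relies only on that behavior and on the --no-ai flag described above; the function name validate_all is hypothetical, and placing --no-ai before the file name is an assumption modeled on the other commands in this guide.

```shell
# Validate every experiment file passed as an argument.
# Succeeds only if axp validate exits zero for all of them.
validate_all() {
  failed=0
  for file in "$@"; do
    # --no-ai skips the advisory AI review, keeping output to schema errors.
    if ! axp validate --no-ai "$file"; then
      echo "invalid: $file" >&2
      failed=$((failed + 1))
    fi
  done
  [ "$failed" -eq 0 ]
}
```

A call such as validate_all my-first-experiment.yaml my-custom-experiment.yaml returns non-zero if any file fails validation.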
3. Run the experiment
axp run my-first-experiment.yaml
axp run validates the experiment, submits it to the AXP platform, and polls
until every job finishes. The platform runs one managed sandbox per
(variant × repeat) job and collects the results in your organization — open the
run details page to see status, cost, tests, and the agent trace for each
variant.
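The size of the job matrix follows directly from the (variant × repeat) rule. As a quick sketch, with a hypothetical helper name and an assumed default of one repeat per variant when none is given:

```shell
# Jobs submitted = selected variants x repeats (one managed sandbox per job).
# The default of 1 repeat is an assumption, not taken from the AXP docs.
job_count() {
  variants=$1
  repeats=${2:-1}
  echo $((variants * repeats))
}
```

For the four-variant intro experiment run with three repeats, job_count 4 3 prints 12.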
Useful options:
axp run --variant v1 my-first-experiment.yaml # subset the matrix
axp run --repeat 3 my-first-experiment.yaml # 3 jobs per variant
axp run --detach my-first-experiment.yaml # submit without polling
axp run prints a run id. Cancel queued or in-flight jobs with it:
axp cancel <run-request-id>
Experiments that declare secrets can't be submitted to the platform yet; run
those locally (below).
Run locally instead
To run in local Docker on your own machine, use axp local run:
axp local run my-first-experiment.yaml
On first run, AXP pulls 514labs/axp-base:<version> from Docker Hub
automatically (a few hundred MB; subsequent runs reuse the cached image). When
all variants finish, AXP prints a labeled summary block and writes a run
directory under ./.axp/runs/:
Run: 01KRC8W53YD9WXNZ0GGBKMXB69
Variants: 2 (pass=2 fail=0 timeout=0 error=0 cost_cap=0)
Cost: $0.0123 USD
Output: ./.axp/runs/01KRC8W53YD9WXNZ0GGBKMXB69/
Artifact: ./.axp/derived/01KRC8W53YD9WXNZ0GGBKMXB69/artifacts.parquet
Suggested next steps:
Upload to platform-app:
axp upload 01KRC8W53YD9WXNZ0GGBKMXB69
Explore this run with an agent:
axp prompt explore 01KRC8W53YD9WXNZ0GGBKMXB69 | claude
axp local run carries the flags that depend on running locally:
axp local run --dry-run my-first-experiment.yaml
axp local run --jobs 2 my-first-experiment.yaml
axp local run --watch my-first-experiment.yaml
axp local run --env-file .env --env GITHUB_TOKEN=... my-first-experiment.yaml
--dry-run validates and plans without containers or API calls. --jobs
controls variant concurrency. --watch live-tails agent events. Secrets must be
declared in the experiment before --env-file or --env can inject them. The
run ID is a ULID; the output directory under
./.axp/runs/ is named after it.
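Because ULIDs begin with a timestamp, sorting run directory names lexicographically also orders them by creation time. The sketch below uses that property to find the newest local run without calling axp; the function name latest_run_id is hypothetical, and it assumes every entry under the runs directory is a run directory named by its ULID.

```shell
# Print the newest run id under a runs directory (default: ./.axp/runs).
# ULIDs sort by creation time when compared byte-wise, hence LC_ALL=C.
latest_run_id() {
  runs_dir=${1:-./.axp/runs}
  [ -d "$runs_dir" ] || { echo "no such directory: $runs_dir" >&2; return 1; }
  ls -1 "$runs_dir" | LC_ALL=C sort | tail -n 1
}
```

The printed id can then be passed to commands from this guide, for example axp upload "$(latest_run_id)".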
The remaining steps — listing, inspecting, and querying runs — apply to local
runs and to platform runs you've pulled down with axp download <run-id>.
4. List runs
axp list
axp list shows recent runs, newest first. Signed-in users see both local runs
from ./.axp/runs/ and remote platform runs; signed-out or offline
users still see local runs. Use axp list --local when you only want runs with
artifact files on this machine. Interrupted local runs are marked partial.
5. Inspect result artifacts
Run artifacts are written under .axp/runs/<run-id>/. After axp list shows the run, read them directly:
.axp/runs/01KRC8W53YD9WXNZ0GGBKMXB69/index.json
.axp/runs/01KRC8W53YD9WXNZ0GGBKMXB69/variants/v1/resolved-variant.yaml
.axp/runs/01KRC8W53YD9WXNZ0GGBKMXB69/results/v1/run-summary.json
.axp/runs/01KRC8W53YD9WXNZ0GGBKMXB69/results/v1/traces/abc.otlp.json
.axp/runs/01KRC8W53YD9WXNZ0GGBKMXB69/results/v1/application/dev-server-healthy.stdout.log
.axp/runs/01KRC8W53YD9WXNZ0GGBKMXB69/results/v1/introspection/workspace-readable.stdout.log
For structured local querying use axp local query <run-id> "<SQL>". For platform runs, use axp download <run-id> to fetch the Parquet locally before querying.
Start with index.json for the run summary and
results/<variant-id>/run-summary.json for each variant's status, cost,
duration, agent metrics, and test results. Use
variants/<variant-id>/resolved-variant.yaml for the resolved prompt, model,
setup, secrets, tests, optional experiment description, and limits.
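To find each variant's summary file without typing paths by hand, the layout above can be walked with a shell glob. This is a sketch that depends only on the results/<variant-id>/run-summary.json path shown in this guide; the function name list_variant_summaries is hypothetical, and it makes no assumption about the JSON fields inside the files.

```shell
# Print "<variant-id><TAB><path>" for every per-variant summary in a run directory.
list_variant_summaries() {
  run_dir=$1
  for summary in "$run_dir"/results/*/run-summary.json; do
    # Skip the unexpanded glob when no summaries exist.
    [ -f "$summary" ] || continue
    variant=$(basename "$(dirname "$summary")")
    printf '%s\t%s\n' "$variant" "$summary"
  done
}
```

For example, list_variant_summaries .axp/runs/01KRC8W53YD9WXNZ0GGBKMXB69 prints one line per variant that has a run-summary.json.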
6. Query run data locally
Use a schema-first query to inspect the runs table shape before deeper analysis:
axp local query 01KRC8W53YD9WXNZ0GGBKMXB69 "DESCRIBE runs" --table
axp local query runs SQL over local logical tables derived from .axp/runs/<run-id>/, or over downloaded Parquet in .axp/downloads/<run-id>/:
runs, agent_events, tool_calls, messages, harness_spans, tests, artifacts, and experiment_runs.
Default output is NDJSON; pass --table for a compact terminal view. Use axp download <run-id> first when the run is only available on the platform.
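After DESCRIBE, a natural next step is to preview a few rows from one of the logical tables. The sketch below only builds the SQL string; the helper name preview_sql is hypothetical, and it assumes the query engine accepts a plain SELECT * ... LIMIT statement, which this guide does not confirm. No column names are assumed.

```shell
# The eight logical tables named in this guide.
AXP_TABLES="runs agent_events tool_calls messages harness_spans tests artifacts experiment_runs"

# Print a row-preview query for a known table; the row limit defaults to 5.
preview_sql() {
  for known in $AXP_TABLES; do
    if [ "${1:-}" = "$known" ]; then
      printf 'SELECT * FROM %s LIMIT %s\n' "$known" "${2:-5}"
      return 0
    fi
  done
  echo "unknown table: ${1:-}" >&2
  return 1
}
```

The result can be passed to the query command shown above, for example axp local query 01KRC8W53YD9WXNZ0GGBKMXB69 "$(preview_sql tests)" --table.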