Getting Started

Create, validate, build, run, list, and inspect AXP experiments with the current CLI.

This guide walks through creating an experiment and running it — first on the AXP platform with axp run, then locally in Docker with axp local run. Run commands from the directory where you want AXP to create experiment files (and, for local runs, write ./.axp/runs/).

Install AXP first if axp --help is not available. See Installation.

Prerequisites

axp run submits to the AXP platform, so sign in once:

axp auth login

The platform supplies model access, so you don't need your own model key for axp run. To run locally with axp local run instead, make sure Docker is available and a model key is set in your host environment — ANTHROPIC_API_KEY for the default Claude agent, or OPENAI_API_KEY if your experiment uses agent: codex. See Runs for the full platform-vs-local comparison.
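
The local prerequisites above can be checked before the first axp local run. The following is a minimal sketch, not part of AXP: the helper names required_model_key and local_preflight are hypothetical, and the check only confirms that a docker binary is on PATH, not that the Docker daemon is running.

```shell
# Sketch: pre-flight checks for a local run. Helper names are hypothetical.

# Print the environment variable that should hold the model key for an agent.
# The guide names two cases: "agent: codex" and the default Claude agent.
required_model_key() {
  case "${1:-claude}" in
    codex) echo "OPENAI_API_KEY" ;;
    *)     echo "ANTHROPIC_API_KEY" ;;
  esac
}

# Succeed when docker is on PATH and the required key is non-empty.
# Otherwise print the reason to stderr and fail.
local_preflight() (
  key_name="$(required_model_key "${1:-claude}")"
  if ! command -v docker >/dev/null 2>&1; then
    echo "docker not found on PATH" >&2
    exit 1
  fi
  # key_name is one of two fixed strings, so this eval cannot run arbitrary input.
  eval "key_value=\${$key_name:-}"
  if [ -z "$key_value" ]; then
    echo "$key_name is not set" >&2
    exit 1
  fi
)
```

Calling local_preflight codex before axp local run would fail early with a readable message instead of partway through a run.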

1. Create an intro experiment

If the product you want to test is a CLI, start with axp intro, which creates a curated four-variant experiment. The experiment tests how well an agent can install and smoke-test your CLI with increasing levels of guidance. The resulting run shows whether the agent can figure out the right installation path on its own, and how much the added context changes success rate, cost, and tool usage. Every variant gets the same generic task description; only the install pointer changes: no pointer, CLI name, install docs URL, then exact install command.

Ask your coding agent something like:

I'm a first-time AXP user. Run an experiment with my CLI using the `axp` command.

The agent should discover your CLI name, install docs, install command, and hidden smoke-test oracle, then run a deterministic command like:

axp intro my-first-experiment \
  --cli mycli \
  --install-docs https://example.com/install \
  --install-command 'npm install mycli' \
  --smoke-command 'mycli --version' \
  --smoke-output-contains 'mycli version'
axp validate my-first-experiment.yaml
axp local run --dry-run my-first-experiment.yaml
axp local run --watch my-first-experiment.yaml

AXP writes my-first-experiment.yaml in the current directory. The kebab-case name becomes both the file name and experiment id. axp intro only creates the YAML; it does not prompt, log in, check Docker, collect model keys, or run the experiment automatically.

--smoke-command and --smoke-output-contains are hidden test oracles. They are not shown to the agent in any variant. Use --smoke-output-contains when the binary name is ambiguous, so the test proves the installed program is the intended product and not just any command with the same name.

For general-purpose experiment authoring, create a scaffold instead:

axp create my-custom-experiment

Use that scaffold to define custom tasks, variants, models, setup, secrets, tests, and limits.

If you are working with a coding agent, ask it to use the bundled axp-create-experiment skill to scope and draft the experiment with you.

If you would rather start from the bundled examples in the AXP repository, look under examples/experiments/. The example .yaml files sit in nested directories under that path.

2. Validate the YAML

axp validate my-first-experiment.yaml

Expected output:

Validated my-first-experiment.yaml
Next: run it with `axp run my-first-experiment.yaml`

On schema failure, axp validate prints a single-line error and exits non-zero.

When you're signed in (axp auth login), axp validate also prints a short, best-effort AI review of your experiment design to stderr — advisory only, so it never changes the exit code. Signed out, it notes that signing in unlocks AI hints. Skip it with --no-ai. See validate for details.
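
Because axp validate exits non-zero on schema failure, its exit code can gate a script or CI step. The sketch below assumes that --no-ai may be placed before the file argument, as the flags in this guide's other commands are. The helper name validate_all is hypothetical.

```shell
# Sketch: validate several experiment files and fail if any of them is invalid.
# validate_all is a hypothetical helper, not an AXP command.
validate_all() (
  failed=0
  for file in "$@"; do
    # --no-ai skips the advisory AI review, which never changes the exit code.
    if ! axp validate --no-ai "$file"; then
      failed=$((failed + 1))
    fi
  done
  echo "$failed file(s) failed validation" >&2
  [ "$failed" -eq 0 ]
)
```

For example, validate_all my-first-experiment.yaml my-custom-experiment.yaml succeeds only when both files pass.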

3. Run the experiment

axp run my-first-experiment.yaml

axp run validates the experiment, submits it to the AXP platform, and polls until every job finishes. The platform runs one managed sandbox per (variant × repeat) job and collects the results in your organization — open the run details page to see status, cost, tests, and the agent trace for each variant.

Useful options:

axp run --variant v1 my-first-experiment.yaml   # subset the matrix
axp run --repeat 3 my-first-experiment.yaml      # 3 jobs per variant
axp run --detach my-first-experiment.yaml        # submit without polling
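
Since the platform runs one sandbox per (variant × repeat) job, these two options together determine how many jobs a run submits. A small worked example follows; job_count is a hypothetical helper used only for the arithmetic.

```shell
# Sketch: jobs in a run = number of selected variants x repeats per variant.
job_count() {
  echo $(( $1 * $2 ))
}

job_count 4 3   # four intro variants with --repeat 3: prints 12
job_count 1 3   # --variant v1 with --repeat 3: prints 3
```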

axp run prints a run id. Pass that id to axp cancel to stop queued or in-flight jobs:

axp cancel <run-request-id>

Experiments that declare secrets can't be submitted to the platform yet — run those locally (below).

Run locally instead

To run in local Docker on your own machine, use axp local run:

axp local run my-first-experiment.yaml

On first run, AXP pulls 514labs/axp-base:<version> from Docker Hub automatically (a few hundred MB; subsequent runs reuse the cached image). When all variants finish, AXP prints a labeled summary block and writes a run directory under ./.axp/runs/:

Run: 01KRC8W53YD9WXNZ0GGBKMXB69
Variants:  2 (pass=2 fail=0 timeout=0 error=0 cost_cap=0)
Cost:      $0.0123 USD
Output:    ./.axp/runs/01KRC8W53YD9WXNZ0GGBKMXB69/
Artifact:  ./.axp/derived/01KRC8W53YD9WXNZ0GGBKMXB69/artifacts.parquet

Suggested next steps:
  Upload to platform-app:
    axp upload 01KRC8W53YD9WXNZ0GGBKMXB69
  Explore this run with an agent:
    axp prompt explore 01KRC8W53YD9WXNZ0GGBKMXB69 | claude

axp local run carries the flags that depend on running locally:

axp local run --dry-run my-first-experiment.yaml
axp local run --jobs 2 my-first-experiment.yaml
axp local run --watch my-first-experiment.yaml
axp local run --env-file .env --env GITHUB_TOKEN=... my-first-experiment.yaml

--dry-run validates and plans without containers or API calls. --jobs controls variant concurrency. --watch live-tails agent events. Secrets must be declared in the experiment before --env-file or --env can inject them. The run ID is a ULID; the output directory under ./.axp/runs/ is named after it.
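
Scripts that take a run ID as an argument can check its shape before touching the filesystem. The sketch below relies on the general ULID format (26 Crockford base32 characters, the first one in the range 0-7) and assumes AXP prints IDs in uppercase, as in the sample output above. The helper names is_ulid and run_dir are hypothetical.

```shell
# Sketch: check that a string has the shape of an uppercase ULID.
# Crockford base32 omits the letters I, L, O, and U.
is_ulid() {
  printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[0-7][0-9A-HJKMNP-TV-Z]{25}$'
}

# Hypothetical helper: the local output directory for a run id.
run_dir() {
  printf './.axp/runs/%s/\n' "$1"
}
```

A script might call is_ulid "$1" || exit 1 before reading anything under the path that run_dir prints.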

The remaining steps — listing, inspecting, and querying runs — apply to local runs and to platform runs you've pulled down with axp download <run-id>.

4. List runs

axp list

axp list shows recent runs, newest first. Signed-in users see both local runs from ./.axp/runs/ and remote platform runs; signed-out or offline users still see local runs. Use axp list --local when you only want runs with artifact files on this machine. Interrupted local runs are marked partial.
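
When only the newest local run ID is needed in a script, the directory names can be used directly: ULIDs start with a millisecond timestamp, so sorting them as text orders them by creation time. This is a rough sketch that assumes ./.axp/runs/ contains nothing but run directories. It does not see platform runs or partial markers, so axp list remains the authoritative view. The helper name latest_local_run is hypothetical.

```shell
# Sketch: print the newest run id found under a runs directory.
latest_local_run() (
  runs_dir="${1:-./.axp/runs}"
  [ -d "$runs_dir" ] || exit 1
  # Byte-wise sort keeps ULID order independent of the user's locale.
  ls -1 "$runs_dir" | LC_ALL=C sort | tail -n 1
)
```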

5. Inspect result artifacts

Run artifacts are written under .axp/runs/<run-id>/. After axp list shows the run, read them directly:

.axp/runs/01KRC8W53YD9WXNZ0GGBKMXB69/index.json
.axp/runs/01KRC8W53YD9WXNZ0GGBKMXB69/variants/v1/resolved-variant.yaml
.axp/runs/01KRC8W53YD9WXNZ0GGBKMXB69/results/v1/run-summary.json
.axp/runs/01KRC8W53YD9WXNZ0GGBKMXB69/results/v1/traces/abc.otlp.json
.axp/runs/01KRC8W53YD9WXNZ0GGBKMXB69/results/v1/application/dev-server-healthy.stdout.log
.axp/runs/01KRC8W53YD9WXNZ0GGBKMXB69/results/v1/introspection/workspace-readable.stdout.log

For structured local querying, use axp local query <run-id> "<SQL>". For platform runs, run axp download <run-id> first to fetch the Parquet locally.

Start with index.json for the run summary and results/<variant-id>/run-summary.json for each variant's status, cost, duration, agent metrics, and test results. Use variants/<variant-id>/resolved-variant.yaml for the resolved prompt, model, setup, secrets, tests, optional experiment description, and limits.
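
The layout above makes it easy to walk every variant of a run. The sketch below lists each per-variant run-summary.json. It relies only on the directory structure shown in this section and does not assume any field names inside the JSON. The helper name variant_summaries is hypothetical.

```shell
# Sketch: print the path of every per-variant run-summary.json for one local run.
variant_summaries() (
  run_id="$1"
  root="${2:-./.axp/runs}"
  for summary in "$root/$run_id"/results/*/run-summary.json; do
    # An unmatched glob stays literal in POSIX shells, so skip missing paths.
    if [ -f "$summary" ]; then
      printf '%s\n' "$summary"
    fi
  done
)
```

Each printed path can then be passed to a JSON viewer of your choice, for example python3 -m json.tool.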

6. Query run data locally

Use a schema-first query to inspect the runs table shape before deeper analysis:

axp local query 01KRC8W53YD9WXNZ0GGBKMXB69 "DESCRIBE runs" --table

axp local query executes SQL against logical tables derived from .axp/runs/<run-id>/ or from downloaded Parquet in .axp/downloads/<run-id>/. The available tables are runs, agent_events, tool_calls, messages, harness_spans, tests, artifacts, and experiment_runs. Default output is NDJSON; pass --table for a compact terminal view. Use axp download <run-id> first when the run is only available on the platform.
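
Because the default output is NDJSON (one JSON object per line), it pipes cleanly into line-oriented tools. As a minimal sketch, the hypothetical helper ndjson_rows counts the rows a query returned. The SELECT statement in the comment is an assumed example against the tests table and needs a real local run to execute.

```shell
# Sketch: count NDJSON rows read from standard input, ignoring blank lines.
ndjson_rows() {
  awk 'NF { n++ } END { print n + 0 }'
}

# Example (assumed query, requires an existing local run):
#   axp local query 01KRC8W53YD9WXNZ0GGBKMXB69 "SELECT * FROM tests" | ndjson_rows
```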