Experiment YAML | AXP Documentation

Experiment YAML is the file format passed to axp run and axp local run.

The current parser accepts schema_version: 2 and rejects unknown fields at every level. For the canonical machine-readable schema, use experiment.v2.schema.yaml.

Schema URL

Use the version-pinned schema URL for editor validation:

# yaml-language-server: $schema=https://docs.514.ai/schema/experiment.v2.schema.yaml

Published schema URLs:

URL	Use
`https://docs.514.ai/schema/experiment.schema.yaml`	Latest supported schema
`https://docs.514.ai/schema/experiment.v2.schema.yaml`	Version-pinned v2 schema
`https://docs.514.ai/schema/experiment.v1.schema.yaml`	Legacy v1 schema. v1 experiments no longer run; this remains available so older files that pin it still validate in-editor while migrating to v2.

axp experiment schema prints the latest supported schema to stdout.

Field index

Object	Fields
Top-level experiment	`schema_version`, `id`, `name`, `description`, `agents`, `prompts`, `environments`, `products`, `extensions`, `environment_variables`, `secrets`, `files`, `tests`, `limits`
Agent	`name`, `model`
Model	`name`, `effort`, `context_window_size`, `thinking`, `fast`
Prompt	`id`, `prompt`, `description`, `tags`
Environment	`name`, `setup`, `description`, `tags`, `commit`
Product	`name`, `type`, `setup`, `version`, `commit`, `description`, `tags`
Setup	`name`, `script`, `description`, `tags`, `files`, `environment_variables`, `secrets`, `mcp_servers`, `setup_checks`
Extension	`id`, `description`, `tags`, `agents`, `prompts`, `environments`, `products`, `extensions`
Environment variable	`name`, `value`
File entry	`name`, `source`, `sha256`, `dest`
Test	`name`, `script`
Setup check	`name`, `script`
MCP server	`name`, `type`, `command`, `args`, `url`, `env`, `headers`
MCP stdio env entry	`name`, `from`
MCP HTTP and SSE header	`name`, `value`
Limits	`max_turns`, `max_time_seconds`, `max_cost_usd`

Minimal example

# yaml-language-server: $schema=https://docs.514.ai/schema/experiment.v2.schema.yaml
schema_version: 2
id: cli-install
name: "CLI install"

agents:
  - name: claude
    model: anthropic/claude-sonnet-4.6

prompts:
  - id: install
    prompt: |
      Install the CLI and write its version to /workspace/version.txt.

products:
  - name: cli
    type: CLI
    setup: npm pack

tests:
  application:
    - name: version-file-exists
      script: test -f /workspace/version.txt

limits:
  max_turns: 25
  max_time_seconds: 300
  max_cost_usd: 0.50

Top-level experiment

An experiment defines what you want to learn about agents using a product surface. It names the agents, prompts, products, environments, and tests AXP uses to compute and run variants.

Field	Required	Type / shape	Notes
`schema_version`	Yes	Integer	Must be `2`.
`id`	Yes	Kebab-case string	Stable experiment id. Should match the YAML file name without `.yaml`.
`name`	Yes	String	Human-readable experiment name.
`description`	No	String	Optional context used when analyzing outcomes.
`agents`	No	Agent axis	Optional at the top level only if an extension supplies agents. Every resolved variant must have an agent.
`prompts`	No	Prompt axis	Optional at the top level only if an extension supplies prompts. Every resolved variant must have a non-empty prompt.
`environments`	No	Environment axis	Optional.
`products`	No	Product axis	Optional.
`extensions`	No	Extension list	Optional. If present, only extension-derived variants are created.
`environment_variables`	No	Environment variable list	`{name, value}` entries injected into every variant. Values can be literals or `axp://secrets/<slug>` references resolved from your org secret store.
`secrets`	No	String list	Deprecated alias for host environment-variable names. Prefer `environment_variables`.
`files`	No	File entry list	Host files staged into every variant before setup runs.
`tests`	Yes	Tests object	Must contain at least one application or introspection test.
`limits`	Yes	Limits object	Run caps.

Top-level environment_variables, secrets, and files apply to every variant. MCP servers and setup checks are setup-owned fields.

Variant axes

Variant axes are the experiment inputs AXP combines into runnable variants. This lets you define the dimensions you care about once, then compare results by agent, prompt, environment, and product.

variants = agents × prompts × environments × products

environments and products are optional. If an axis is absent, it is omitted from the variant coordinate.
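
As a rough illustration of the cross product, the fragment below declares two agents, two prompts, and one product, so it should resolve to 2 × 2 × 1 = 4 variants. It is a sketch only: the prompt ids, the product name, and the `my-cli` package are hypothetical, and `tests` and `limits` are left out for brevity even though a complete experiment requires them.

```yaml
# Sketch only: ids, names, and the npm package are hypothetical.
agents:
  - claude
  - codex

prompts:
  - id: install
    prompt: "Install the CLI and write its version to /workspace/version.txt."
  - id: install-explained
    prompt: "Install the CLI, write its version to /workspace/version.txt, and explain each step."

products:
  - name: cli
    type: CLI
    setup: "npm install -g my-cli"

# No `environments` axis is declared, so each variant coordinate carries only
# agent, prompt, and product: 2 agents x 2 prompts x 1 product = 4 variants.
```

Adding a second product to this fragment would double the count to 8, which is worth keeping in mind when setting `limits`, since they apply to each variant execution.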

Agents

agents defines which coding agents run the experiment. Each agent can use its default model or pin an explicit model and model controls.

Accepted shapes:

# String form: one agent, provider-default model.
agents: claude

# List of strings: multiple agents, provider-default models.
agents:
  - claude
  - codex

# Object form: one agent with an explicit model.
agents:
  name: claude
  model: anthropic/claude-sonnet-4.6

# List of objects: multiple explicit agent/model pairs.
agents:
  - name: claude
    model: anthropic/claude-sonnet-4.6

model is optional but recommended. If omitted, AXP uses the provider-default model for that agent.

Agent object

Field	Required	Type / values	Notes
`name`	Yes	`claude`, `codex`, or `cursor`	Coding agent to run.
`model`	No	Model id string or model object	If omitted, AXP uses the provider-default model for that agent.

Model object

A model object configures the model used by an agent. Optional controls are passed through when the selected agent supports them.

Field	Required	Type / values	Notes
`name`	Yes	String	Provider/model id, such as `anthropic/claude-opus-4.8` or `openai/gpt-5`.
`effort`	No	`low`, `medium`, `high`, `x-high`, `max`	Reasoning effort control.
`context_window_size`	No	String	Context window hint, such as `1M` or `200k`.
`thinking`	No	Boolean	Enables thinking mode when supported.
`fast`	No	Boolean	Enables fast mode when supported.

agents:
  - name: claude
    model:
      name: anthropic/claude-opus-4.8
      effort: high
      context_window_size: 1M
      thinking: true
  - name: codex
    model: openai/gpt-5

Prompts

prompts defines the tasks agents try to complete. Each resolved variant receives one final prompt.

Accepted shapes:

# String form: one prompt.
prompts: "Build the report."

# List of strings: multiple prompts.
prompts:
  - "Build the report."
  - "Build the report and explain your steps."

# List of objects: named prompts with stable ids.
prompts:
  - id: detailed
    prompt: "Build the report and explain your steps."

Prompt object

A prompt object gives a task stable metadata, so Results can filter and group by the prompt that produced each run.

Field	Required	Type / values	Notes
`id`	Yes	Kebab-case string	Stable prompt id. Recorded for filtering in Results.
`prompt`	Yes	String	Task text given to the agent.
`description`	No	String	Human-readable notes.
`tags`	No	String list	Free-form labels.

Bare prompt strings get positional ids: p0, p1, and so on.
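
To make the positional-id rule concrete, here is a sketch of the same two prompts written first as bare strings and then as objects. The explicit ids (`build`, `build-explained`) and the tag are hypothetical, and the two YAML documents are alternatives separated by `---`, not one file.

```yaml
# Alternative 1: bare strings. Ids are assigned by position.
prompts:
  - "Build the report."                          # recorded as p0
  - "Build the report and explain your steps."   # recorded as p1
---
# Alternative 2: objects with explicit, stable ids (hypothetical names).
prompts:
  - id: build
    prompt: "Build the report."
  - id: build-explained
    prompt: "Build the report and explain your steps."
    description: "Same task, with a request for a narrated walkthrough."
    tags: [explain]
```

Because positional ids depend on list order, explicit ids are presumably the safer choice when you expect to reorder prompts and still compare Results across runs.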

Environments

environments defines the sandbox conditions around the agent, such as installed tools, fixtures, or external integrations. Use environments when you want to compare how agents perform under different surrounding conditions.

Accepted shapes:

# String form: one setup script with a generated environment name.
environments: "pip install -r requirements.txt"

# Object form: one named environment.
environments:
  name: workspace
  setup: "pip install -r requirements.txt"

# List form: multiple named environments.
environments:
  - name: workspace
    setup: "pip install -r requirements.txt"

Environment object

An environment object names one sandbox setup condition. Its setup prepares the sandbox before the agent receives the prompt.

Field	Required	Type / values	Notes
`name`	Yes	Kebab-case string	Environment coordinate. Recorded for filtering in Results.
`setup`	Yes	Setup	Prepares the sandbox before the agent runs.
`description`	No	String	Human-readable notes.
`tags`	No	String list	Free-form labels.
`commit`	No	String	Source commit of the environment under test.

Bare environment strings are shorthand for an environment whose setup is that string.
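
A possible use of the environment axis is comparing a prepared sandbox against an unprepared one. The sketch below assumes a `requirements.txt` is already present where the setup script runs (for example, staged through top-level `files`); the environment names, setup name, and check name are all hypothetical.

```yaml
# Sketch only: names are hypothetical.
environments:
  - name: deps-preinstalled
    description: "Python dependencies are installed before the agent starts."
    tags: [python]
    setup:
      name: install-deps
      script: "pip install -r requirements.txt"
      setup_checks:
        - name: deps-consistent
          script: pip check   # exits non-zero if installed packages conflict
  - name: bare
    description: "No preparation; the agent installs what it needs."
    setup: "true"             # no-op command, since `setup` is required
```

With this axis declared, each agent and prompt combination runs once per environment, and Results can be filtered by the environment name.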

Products

products defines the agent-facing surface under test. A product can be a CLI, API, MCP server, SDK, docs surface, or other tool the agent uses to complete the prompt.

Accepted shapes:

# String form: one setup script with a generated product name.
products: "npm install -g my-cli"

# Object form: one named product.
products:
  name: my-cli
  type: CLI
  version: "1.2.0"
  setup: "npm install -g my-cli"

# List form: multiple named products.
products:
  - name: my-cli
    type: CLI
    version: "1.2.0"
    setup: "npm install -g my-cli"

Product object

A product object names what you want to compare. Product metadata is recorded so Results can filter and group by product and product version.

Field	Required	Type / values	Notes
`name`	Yes	Kebab-case string	Product coordinate. Recorded for filtering in Results.
`type`	No	Product type	Defaults to `Other`.
`setup`	Yes	Setup	Prepares the product before the agent runs.
`version`	No	String	Product version. Quote numeric versions, such as `"25.3"`.
`commit`	No	String	Source commit of the product under test.
`description`	No	String	Human-readable notes.
`tags`	No	String list	Free-form labels.

Product type

Allowed values: CLI, MCP, API, Skill, SDK, Schema, Docs, Marketing, Agents.md, Other.

Setup

setup prepares the sandbox for an environment or product. Use setup scripts to install tools, create fixtures, start services, or expose resources the agent needs.

Each variant has its own isolated /workspace. A setup script cannot rely on files or side effects created by another variant.

Accepted shapes:

# String form: one setup script.
setup: "npm install"

# List of strings: multiple setup scripts, run in order.
setup:
  - "npm install"
  - "npm test -- --help"

# Object form: one named setup with optional scoped resources.
setup:
  name: install-cli
  script: "npm install"

# List of objects: multiple named setups, run in order.
setup:
  - name: install-cli
    script: "npm install"
  - name: smoke-cli
    script: "npm test -- --help"

# Mixed list: strings and setup objects can be combined.
setup:
  - "npm install"
  - name: smoke-cli
    script: "npm test -- --help"

Setup object

A setup object is the named form of setup. Use it when setup needs its own files, environment variables, MCP servers, or setup checks.

Field	Required	Type / values	Notes
`name`	Yes	Kebab-case string	Required for object form.
`script`	Yes	String	Bash run before setup checks and before the agent. Secret values are not injected here.
`description`	No	String	Human-readable notes.
`tags`	No	String list	Free-form labels.
`files`	No	File entry list	Accepted on setup objects, but current runs only stage top-level `files`. Use top-level `files` when the run needs host files delivered.
`environment_variables`	No	Environment variable list	Runtime env vars scoped to variants that use this setup.
`secrets`	No	String list	Deprecated.
`mcp_servers`	No	MCP server list	MCP servers exposed to the agent.
`setup_checks`	No	Setup check list	Checks run after setup and before the agent.

When both a product and an environment contribute setup, product setup runs before environment setup.
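
The setup object fields can be combined as in the following sketch, where a product's setup carries its own environment variable and setup check. The package `my-cli`, the variable `MY_CLI_LOG_LEVEL`, and the setup and check names are hypothetical.

```yaml
# Sketch only: package, variable, and names are hypothetical.
products:
  - name: my-cli
    type: CLI
    version: "1.2.0"
    setup:
      name: install-cli
      description: "Install the pinned CLI release globally."
      script: "npm install -g my-cli@1.2.0"
      environment_variables:
        - name: MY_CLI_LOG_LEVEL     # scoped to variants that use this setup
          value: debug
      setup_checks:
        - name: cli-on-path
          script: my-cli --version   # runs after the script, before the agent
```

Keeping the variable on the setup object rather than at the top level limits it to variants that actually include this product.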

Setup checks

setup_checks verify that setup produced a usable sandbox before the agent starts. A failed setup check stops that variant before any agent work happens.

setup_checks:
  - name: cli-on-path
    script: my-cli --version

Setup check object

Field	Required	Type / values	Notes
`name`	Yes	Kebab-case string	Appears in `setup-checks/<name>.json`.
`script`	Yes	String	Bash check. A non-zero exit aborts the variant before the agent runs.

MCP servers

mcp_servers exposes MCP servers to the agent for variants that use this setup. Use this field when the product surface or environment includes an MCP tool.

mcp_servers:
  - name: fixture-sentinel
    type: stdio
    command: /workspace/fixture-mcp.py
    args: []
  - name: axp
    type: http
    url: http://localhost:3001/mcp

MCP server object

An MCP server object defines one server and its transport. The required connection fields depend on type.

Field	Required	Type / values	Notes
`name`	Yes	String	Must be unique within the setup's `mcp_servers`.
`type`	Yes	`stdio`, `http`, or `sse`	Transport.
`command`	For `stdio`	String	Executable path or command inside the sandbox.
`args`	No	String list	Stdio command arguments.
`url`	For `http` / `sse`	String	MCP endpoint URL.
`env`	No	MCP stdio env entries	Only valid for `stdio`.
`headers`	No	MCP header entries	Only valid for `http` / `sse`.

Transport-specific rules:

stdio uses command, optional args, and optional env.
http and sse use url and optional headers.
Mixing stdio-only and endpoint-only fields is rejected.

MCP stdio env entry

env forwards declared secret values to a stdio MCP process. Each entry is either a bare secret name or an object:

env:
  - GITHUB_TOKEN
  - name: GH_AUTH
    from: GITHUB_TOKEN

Field	Required	Type / values	Notes
`name`	Yes	String	Env var name as seen by the MCP server process.
`from`	Yes	Secret name	Declared secret name whose value is forwarded.

MCP HTTP and SSE header object

headers attaches HTTP headers to an http or sse MCP server. Use placeholders when a header needs a declared secret value.

headers:
  - name: Authorization
    value: "Bearer ${SUPABASE_SERVICE_ROLE_KEY}"

Field	Required	Type / values	Notes
`name`	Yes	String	HTTP header name. Header names must be unique per server, case-insensitively.
`value`	Yes	String	May contain `${SECRET_NAME}` placeholders. Bare `$NAME` is literal.

MCP env entries and header placeholders reference environment variable names visible to the variant. Values can come from literal environment_variables, axp://secrets/<slug> references, or the deprecated secrets list.

Rules enforced at axp experiment validate time:

Every stdio env[*].from and every ${NAME} placeholder in a header value must reference a name visible to the variant.
Bare $NAME is treated as a literal; only ${NAME} is a placeholder.
HTTP header names are unique per server, case-insensitively.
command / args / env are only valid for type: stdio; url / headers are only valid for type: http / sse.
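
The rules above can be sketched in a single fragment that declares two secret-backed variables and forwards them to one stdio server and one http server. Everything named here is hypothetical: the secret slugs, the server names, the script path, the `--read-only` argument, and the `X-Note` header.

```yaml
# Sketch only: slugs, names, paths, and arguments are hypothetical.
environment_variables:
  - name: GITHUB_TOKEN
    value: axp://secrets/prod-gh
  - name: SEARCH_API_KEY
    value: axp://secrets/search-api

products:
  - name: repo-tools
    type: MCP
    setup:
      name: expose-repo-tools
      script: "true"
      mcp_servers:
        - name: repo-stdio
          type: stdio                   # stdio: command, args, env only
          command: /workspace/repo-mcp.py
          args: ["--read-only"]
          env:
            - GITHUB_TOKEN              # bare secret name
            - name: GH_AUTH             # the process sees GH_AUTH
              from: GITHUB_TOKEN        # must be a name visible to the variant
        - name: search-http
          type: http                    # http / sse: url, headers only
          url: http://localhost:3001/mcp
          headers:
            - name: Authorization
              value: "Bearer ${SEARCH_API_KEY}"    # ${NAME} is a placeholder
            - name: X-Note
              value: "literal $SEARCH_API_KEY"     # bare $NAME stays literal
```

Adding `url` to `repo-stdio` or `command` to `search-http` would mix transport fields and should be rejected at validate time.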

Resolved secret values are written into the agent session frame and can appear in run artifacts. Treat artifacts as sensitive whenever an experiment forwards secrets to MCP servers.

Extensions

extensions refine the variant set when the base axes do not express the combinations you need. Use extensions to narrow an axis, swap products or environments for a subtree, or append prompt guidance to a slice of variants.

If an experiment declares any extensions, AXP creates only extension-derived variants.

Extension object

An extension object is one node in the refinement tree. Nested extensions are recursively cross-multiplied with their parents.

Field	Required	Type / values	Notes
`id`	Yes	Kebab-case string	Must be unique among sibling extensions.
`description`	No	String	Human-readable notes.
`tags`	No	String list	Added to resolved variant tags.
`agents`	No	Agent list	Replaces inherited agents for this extension subtree.
`prompts`	No	Prompt list	Appended to inherited prompt text. Does not replace the prompt axis.
`environments`	No	Environment list	Replaces inherited environments for this extension subtree.
`products`	No	Product list	Replaces inherited products for this extension subtree.
`extensions`	No	Extension list	Nested extensions.

prompts:
  - id: analyze
    prompt: "Read /workspace/task.md and write /workspace/report.json."

extensions:
  - id: with-cli
    products:
      - name: cli
        type: CLI
        setup: "curl -fsSL https://clickhouse.com/ | sh"
    prompts: ["Use the ClickHouse CLI for the analysis."]
  - id: without-cli
    products:
      - name: no-cli
        setup: "true"
    prompts: ["Do not use the ClickHouse CLI; use another local method."]
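
Extensions can also nest. The sketch below narrows the agent axis inside one branch; the ids and the `my-cli` package are hypothetical, and the comments only restate the field rules from the table above. How many variants AXP derives from the parent node versus the nested node is not spelled out here, so treat this as a shape example and not as a statement of the resolved variant set.

```yaml
# Sketch only: ids and the npm package are hypothetical.
agents:
  - name: claude
  - name: codex

prompts:
  - id: analyze
    prompt: "Read /workspace/task.md and write /workspace/report.json."

extensions:
  - id: with-cli
    tags: [cli]                     # added to resolved variant tags
    products:
      - name: cli
        type: CLI
        setup: "npm install -g my-cli"
    prompts: ["Use the CLI for the analysis."]        # appended to `analyze`
    extensions:
      - id: claude-only             # unique among its siblings
        agents:
          - name: claude            # replaces inherited agents in this subtree
        prompts: ["Keep the report under 200 lines."] # appended as well
```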

Environment variables

environment_variables injects environment variables into variants at runtime. Use it for non-secret runtime configuration and supported secret references.

environment_variables:
  - name: LOG_LEVEL
    value: debug
  - name: GITHUB_TOKEN
    value: axp://secrets/prod-gh
  - name: DATABASE_URL
    value: axp://secrets/staging-db

Environment variable object

Field	Required	Type / values	Notes
`name`	Yes	Env-var name	Must match `^[A-Z_][A-Z0-9_]*$`. Harness-reserved names and prefixes are rejected.
`value`	Yes	String	Literal value or `axp://secrets/<slug>` reference. Other values beginning with `axp://` are rejected.

Store secret values once with axp secrets set <slug>, then reference them with axp://secrets/<slug>. Both axp run and axp local run resolve these references before injecting env vars into the sandbox.

A referenced slug that does not exist in your org fails at preflight.

Reserved names:

ANTHROPIC_API_KEY
ANTHROPIC_BASE_URL
OPENAI_API_KEY
OPENAI_BASE_URL
CURSOR_API_KEY
MODEL
MAX_TURNS
IS_SANDBOX
TRACEPARENT
any name beginning with AXP_, CLAUDE_CODE_, CODEX_, CURSOR_, or OTEL_
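
As a quick illustration of the naming pattern and the reserved list, the fragment below shows entries that should pass and, in comments, entries that should be rejected. The specific names and the `axp://config/base` value are made up for the example.

```yaml
# Sketch only: names and values are illustrative.
environment_variables:
  - name: LOG_LEVEL               # uppercase letters and underscores
    value: debug
  - name: _FEATURE_FLAG_2         # may start with an underscore; digits allowed later
    value: "1"                    # quoted, because `value` is a string
  - name: DATABASE_URL
    value: axp://secrets/staging-db

# Each of the following entries would be rejected:
#   - name: log_level             # lowercase does not match ^[A-Z_][A-Z0-9_]*$
#   - name: 2FA_TOKEN             # cannot start with a digit
#   - name: MODEL                 # reserved name
#   - name: OTEL_ENDPOINT         # reserved prefix OTEL_
#   - name: API_BASE
#     value: axp://config/base    # only axp://secrets/<slug> is accepted after axp://
```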

Files

Top-level files stages host files or directories into every variant's /workspace before setup runs.

setup.files is accepted in experiment YAML, but current runs only stage top-level files. Put host file staging at the top level when you need the files delivered during a run.

files:
  - name: my-cli
    source: ../build/mycli
    dest: tools/mycli
  - source: https://example.com/fixtures/data.bin
    sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
    dest: fixtures/data.bin

File entry object

Field	Required	Type / values	Notes
`name`	Sometimes	Kebab-case string	Required when `source` is omitted. Used as the `--file NAME=SOURCE` handle.
`source`	Sometimes	Host path or `http(s)` URL	Required when `name` is omitted. Relative paths resolve against the YAML file's directory.
`sha256`	No	64-char hex string	Valid for file sources and URL downloads. Invalid for directory sources.
`dest`	Yes	Workspace-relative path	Absolute paths, `..`, `.axp-bridge`, and `::` are rejected.

Notes:

Directory sources copy their contents under dest/.
File sources land as a single file at dest.
URL sources must be publicly fetchable and land as a single file at dest; unpack archives in setup if needed.
Sandboxes are Linux containers; macOS binaries staged from a laptop will not run there.
Directory walks honor .axpignore, not .gitignore.
--file NAME=SOURCE binds or overrides the source of a named entry.
--file SOURCE::DEST stages an ad-hoc entry into every variant.
Missing or unbound sources abort real runs at preflight. --resolve-variants renders MISSING / UNBOUND annotations without failing.
Staging failures roll the variant up as status=error / exit_reason=staging_failed; the rest of the run keeps going.
Platform and local runs both stage top-level files.
Treat experiment YAML like a script you run: sources are read with your permissions and may point anywhere on the host.
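
The notes above can be sketched as one `files` list with three kinds of entry. The entry name `fixture-db`, the paths, and the URL are hypothetical. The digest is the placeholder from the earlier example, which happens to be the SHA-256 of empty input, so replace it with the real digest of your file.

```yaml
# Sketch only: names, paths, and the URL are hypothetical.
files:
  # Named entry with no source: left unbound here and bound at run time
  # with --file fixture-db=SOURCE.
  - name: fixture-db
    dest: fixtures/db.sqlite

  # Directory source: its contents are copied under fixtures/docs/.
  # Relative paths resolve against the YAML file's directory.
  - source: ./testdata/docs
    dest: fixtures/docs

  # URL source pinned by checksum; lands as the single file fixtures/data.bin.
  - source: https://example.com/fixtures/data.bin
    sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
    dest: fixtures/data.bin

# Rejected `dest` values include /opt/data (absolute), ../outside (contains ..),
# .axp-bridge/cache, and any path containing ::
```

If `fixture-db` is never bound, a real run should abort at preflight, while `--resolve-variants` would only annotate it as UNBOUND.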

Tests

tests defines how AXP scores each run. Application tests check the output state; introspection tests check how the agent got there.

tests:
  application:
    - name: report-exists
      script: test -f /workspace/report.json
  introspection:
    - name: under-thirty-tool-calls
      script: '[ "$(jq ".tool_calls | length" "$AXP_TRACE_PATH")" -lt 30 ]'

Field	Required	Type / values	Notes
`application`	No	Test list	Checks resulting application state, files, commands, or endpoints.
`introspection`	No	Test list	Checks agent behavior through trace artifacts such as `AXP_TRACE_PATH`.

Test object

Field	Required	Type / values	Notes
`name`	Yes	Kebab-case string	Must be globally unique across application and introspection tests.
`script`	Yes	String	Bash script. Streamed over stdin and not shown to the agent.
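
Longer test scripts fit naturally in a YAML block scalar. The sketch below assumes that a non-zero exit marks the test as failed (the document states this explicitly only for setup checks), that `jq` is available where tests run, and that the report is a JSON object with a `summary` key. The test names and the key are hypothetical.

```yaml
# Sketch only: test names and the `summary` key are hypothetical.
tests:
  application:
    - name: report-has-summary
      script: |
        set -euo pipefail
        # The report must exist and be non-empty.
        test -s /workspace/report.json
        # jq -e exits non-zero when the expression yields false or null.
        jq -e 'has("summary")' /workspace/report.json > /dev/null
  introspection:
    - name: trace-recorded
      script: test -s "$AXP_TRACE_PATH"
```

The two names differ because test names must be unique across both lists, not just within one.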

Limits

limits sets the run caps for each variant execution.

Limits object

Field	Required	Type / values	Notes
`max_turns`	Yes	Integer greater than `0`	Agent turn cap.
`max_time_seconds`	Yes	Integer greater than `0`	Wall-clock timeout in seconds.
`max_cost_usd`	Yes	Number greater than `0`	Enforced when the agent reports cumulative cost during the run; if no cost is reported, AXP cannot stop on cost.

Secrets (deprecated)

secrets is a deprecated alias for host environment variable names injected into variants at runtime.

Prefer environment_variables, which accepts both literal values and axp://secrets/<slug> references. Some MCP secret-forwarding fields still reference declared secret names.

secrets:
  - GITHUB_TOKEN

Secret names must match ^[A-Z_][A-Z0-9_]*$.

Validation rules

An experiment is invalid if:

the YAML contains a field not defined by the schema
schema_version is not 2
any required field is missing
an id that must be kebab-case is not kebab-case
an agent model id contains ::
a declared axis is empty
duplicate ids or names appear where uniqueness is required
the resolved variant set is empty
a resolved variant has an empty prompt
two resolved variants collide on variant_id
no tests are defined
test names are duplicated
an env var or secret name is invalid or reserved
a file entry has an invalid name, source, sha256, or dest
an MCP server mixes transport-specific fields
an MCP server references a secret not visible to the variant
any limit is not greater than zero
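
For contrast with the minimal example, the following sketch is deliberately invalid, with each comment pointing at the rule it breaks. The `owner` field and the test names are hypothetical; the actual error messages AXP prints are not shown because they are not documented here.

```yaml
# Deliberately INVALID sketch: do not copy as a starting point.
schema_version: 1              # must be 2
id: CLI_Install                # not kebab-case
name: "CLI install"
owner: docs-team               # unknown field (hypothetical), rejected

agents: claude
prompts: []                    # a declared axis is empty

tests:
  application:
    - name: output-check
      script: test -f /workspace/out.txt
  introspection:
    - name: output-check       # duplicates a test name across the two lists
      script: "true"

limits:
  max_turns: 0                 # every limit must be greater than zero
  max_time_seconds: 300
  max_cost_usd: 0.50
```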

YAML syntax boundaries

The experiment data model is JSON-compatible even though the authoring file is YAML.

YAML comments are allowed.
YAML anchors and aliases are allowed when the resolved value is JSON-compatible.
Custom YAML tags are unsupported.
Non-string mapping keys are unsupported.
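
Anchors and aliases can remove repetition while still resolving to plain JSON-compatible data. In the sketch below, two products share one list of setup checks. The product names, versions, and `my-cli` package are hypothetical. The anchor is defined inside a schema field at its first use, because unknown fields are rejected and there is no documented place for free-standing anchor definitions. YAML merge keys (`<<`) are avoided because the document does not say whether they are supported.

```yaml
# Sketch only: names, versions, and the npm package are hypothetical.
products:
  - name: cli-stable
    type: CLI
    version: "1.2.0"
    setup:
      name: install-stable
      script: "npm install -g my-cli@1.2.0"
      setup_checks: &cli-checks        # anchor defined at first use
        - name: cli-on-path
          script: my-cli --version
  - name: cli-next
    type: CLI
    version: "1.3.0"
    setup:
      name: install-next
      script: "npm install -g my-cli@1.3.0"
      setup_checks: *cli-checks        # alias resolves to the same list
```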
