Experiment YAML

Lookup reference for the AXP experiment YAML schema: fields, object shapes, shorthand forms, constraints, and validation rules.

Experiment YAML is the file format passed to axp run and axp local run.

The current parser accepts only schema_version: 2 and rejects unknown fields at every level. For the canonical machine-readable schema, use experiment.v2.schema.yaml.

Schema URL

Use the version-pinned schema URL for editor validation:

# yaml-language-server: $schema=https://docs.514.ai/schema/experiment.v2.schema.yaml

Published schema URLs:

| URL | Use |
| --- | --- |
| https://docs.514.ai/schema/experiment.schema.yaml | Latest supported schema |
| https://docs.514.ai/schema/experiment.v2.schema.yaml | Version-pinned v2 schema |
| https://docs.514.ai/schema/experiment.v1.schema.yaml | Legacy v1 schema. v1 experiments no longer run; this remains available so older files that pin it still validate in-editor while migrating to v2. |

axp experiment schema prints the latest supported schema to stdout.

Field index

| Object | Fields |
| --- | --- |
| Top-level experiment | schema_version, id, name, description, agents, prompts, environments, products, extensions, environment_variables, secrets, files, tests, limits |
| Agent | name, model |
| Model | name, effort, context_window_size, thinking, fast |
| Prompt | id, prompt, description, tags |
| Environment | name, setup, description, tags, commit |
| Product | name, type, setup, version, commit, description, tags |
| Setup | name, script, description, tags, files, environment_variables, secrets, mcp_servers, setup_checks |
| Extension | id, description, tags, agents, prompts, environments, products, extensions |
| Environment variable | name, value |
| File entry | name, source, sha256, dest |
| Test | name, script |
| Setup check | name, script |
| MCP server | name, type, command, args, url, env, headers |
| MCP stdio env entry | name, from |
| MCP HTTP and SSE header | name, value |
| Limits | max_turns, max_time_seconds, max_cost_usd |

Minimal example

# yaml-language-server: $schema=https://docs.514.ai/schema/experiment.v2.schema.yaml
schema_version: 2
id: cli-install
name: "CLI install"

agents:
  - name: claude
    model: anthropic/claude-sonnet-4.6

prompts:
  - id: install
    prompt: |
      Install the CLI and write its version to /workspace/version.txt.

products:
  - name: cli
    type: CLI
    setup: npm pack

tests:
  application:
    - name: version-file-exists
      script: test -f /workspace/version.txt

limits:
  max_turns: 25
  max_time_seconds: 300
  max_cost_usd: 0.50

Top-level experiment

An experiment defines what you want to learn about agents using a product surface. It names the agents, prompts, products, environments, and tests AXP uses to compute and run variants.

| Field | Required | Type / shape | Notes |
| --- | --- | --- | --- |
| schema_version | Yes | Integer | Must be 2. |
| id | Yes | Kebab-case string | Stable experiment id. Should match the YAML file name without .yaml. |
| name | Yes | String | Human-readable experiment name. |
| description | No | String | Optional context used when analyzing outcomes. |
| agents | No | Agent axis | Optional at the top level only if an extension supplies agents. Every resolved variant must have an agent. |
| prompts | No | Prompt axis | Optional at the top level only if an extension supplies prompts. Every resolved variant must have a non-empty prompt. |
| environments | No | Environment axis | Optional. |
| products | No | Product axis | Optional. |
| extensions | No | Extension list | Optional. If present, only extension-derived variants are created. |
| environment_variables | No | Environment variable list | {name, value} entries injected into every variant. Values can be literals or axp://secrets/<slug> references resolved from your org secret store. |
| secrets | No | String list | Deprecated alias for host environment-variable names. Prefer environment_variables. |
| files | No | File entry list | Host files staged into every variant before setup runs. |
| tests | Yes | Tests object | Must contain at least one application or introspection test. |
| limits | Yes | Limits object | Run caps. |

Top-level environment_variables, secrets, and files apply to every variant. MCP servers and setup checks are setup-owned fields.
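
As a rough illustration of that scoping, the fragment below sets one variable for every variant at the top level and another only for variants that use a particular product setup. The names LOG_LEVEL, MY_CLI_CHANNEL, my-cli, and install-cli are hypothetical, and the other required top-level fields are left out for brevity.

```yaml
# Fragment only: schema_version, id, name, agents, prompts, tests, and limits are omitted.

# Top-level: injected into every variant.
environment_variables:
  - name: LOG_LEVEL
    value: debug

products:
  - name: my-cli
    type: CLI
    setup:
      name: install-cli
      script: "npm install -g my-cli"
      # Setup-scoped: only variants that use this setup see MY_CLI_CHANNEL.
      environment_variables:
        - name: MY_CLI_CHANNEL
          value: beta
      # Setup-owned fields such as setup_checks live on the setup object,
      # not at the top level.
      setup_checks:
        - name: cli-on-path
          script: my-cli --version
```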

Variant axes

Variant axes are the experiment inputs AXP combines into runnable variants. This lets you define the dimensions you care about once, then compare results by agent, prompt, environment, and product.

variants = agents × prompts × environments × products

environments and products are optional. If an axis is absent, it is omitted from the variant coordinate.
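
A small worked example of the product of axes, using hypothetical prompt ids and a hypothetical product. It is a sketch of how the count falls out of the formula, not a complete experiment file.

```yaml
# Fragment only: schema_version, id, name, tests, and limits are omitted.
agents:
  - claude
  - codex

prompts:
  - id: terse
    prompt: "Build the report."
  - id: detailed
    prompt: "Build the report and explain your steps."

products:
  - name: my-cli
    type: CLI
    setup: "npm install -g my-cli"

# No environments axis is declared, so it is left out of the variant coordinate.
# Per the formula above: 2 agents × 2 prompts × 1 product = 4 variants.
```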

Agents

agents defines which coding agents run the experiment. Each agent can use its default model or pin an explicit model and model controls.

Accepted shapes:

# String form: one agent, provider-default model.
agents: claude

# List of strings: multiple agents, provider-default models.
agents:
  - claude
  - codex

# Object form: one agent with an explicit model.
agents:
  name: claude
  model: anthropic/claude-sonnet-4.6

# List of objects: multiple explicit agent/model pairs.
agents:
  - name: claude
    model: anthropic/claude-sonnet-4.6
  - name: codex
    model: openai/gpt-5

model is optional but recommended. If omitted, AXP uses the provider-default model for that agent.

Agent object

| Field | Required | Type / values | Notes |
| --- | --- | --- | --- |
| name | Yes | claude, codex, or cursor | Coding agent to run. |
| model | No | Model id string or model object | Bare agent names use provider defaults. |

Model object

A model object configures the model used by an agent. Optional controls are passed through when the selected agent supports them.

| Field | Required | Type / values | Notes |
| --- | --- | --- | --- |
| name | Yes | String | Provider/model id, such as anthropic/claude-opus-4.8 or openai/gpt-5. |
| effort | No | low, medium, high, x-high, max | Reasoning effort control. |
| context_window_size | No | String | Context window hint, such as 1M or 200k. |
| thinking | No | Boolean | Enables thinking mode when supported. |
| fast | No | Boolean | Enables fast mode when supported. |

agents:
  - name: claude
    model:
      name: anthropic/claude-opus-4.8
      effort: high
      context_window_size: 1M
      thinking: true
  - name: codex
    model: openai/gpt-5

Prompts

prompts defines the tasks agents try to complete. Each resolved variant receives one final prompt.

Accepted shapes:

# String form: one prompt.
prompts: "Build the report."

# List of strings: multiple prompts.
prompts:
  - "Build the report."
  - "Build the report and explain your steps."

# List of objects: named prompts with stable ids.
prompts:
  - id: detailed
    prompt: "Build the report and explain your steps."

Prompt object

A prompt object gives a task stable metadata, so Results can filter and group by the prompt that produced each run.

| Field | Required | Type / values | Notes |
| --- | --- | --- | --- |
| id | Yes | Kebab-case string | Stable prompt id. Recorded for filtering in Results. |
| prompt | Yes | String | Task text given to the agent. |
| description | No | String | Human-readable notes. |
| tags | No | String list | Free-form labels. |

Bare prompt strings get positional ids: p0, p1, and so on.
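
To make the positional ids concrete, here is a short sketch with two YAML documents: the first uses bare strings, the second pins explicit ids. The ids terse and detailed are hypothetical.

```yaml
# Bare strings: ids are assigned by position.
prompts:
  - "Build the report."                         # id: p0
  - "Build the report and explain your steps."  # id: p1
---
# Object form: ids stay the same even if the list is reordered.
prompts:
  - id: terse
    prompt: "Build the report."
  - id: detailed
    prompt: "Build the report and explain your steps."
```

Because positional ids depend on list order, the object form is likely the safer choice when you plan to filter Results by prompt across edits to the file.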

Environments

environments defines the sandbox conditions around the agent, such as installed tools, fixtures, or external integrations. Use environments when you want to compare how agents perform under different surrounding conditions.

Accepted shapes:

# String form: one setup script with a generated environment name.
environments: "pip install -r requirements.txt"

# Object form: one named environment.
environments:
  name: workspace
  setup: "pip install -r requirements.txt"

# List form: one or more named environments.
environments:
  - name: workspace
    setup: "pip install -r requirements.txt"

Environment object

An environment object names one sandbox setup condition. Its setup prepares the sandbox before the agent receives the prompt.

| Field | Required | Type / values | Notes |
| --- | --- | --- | --- |
| name | Yes | Kebab-case string | Environment coordinate. Recorded for filtering in Results. |
| setup | Yes | Setup | Prepares the sandbox before the agent runs. |
| description | No | String | Human-readable notes. |
| tags | No | String list | Free-form labels. |
| commit | No | String | Source commit of the environment under test. |

Bare environment strings are shorthand for an environment whose setup is that string.
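
A sketch of an environments axis that compares two surrounding conditions. The environment names, descriptions, and tags are hypothetical.

```yaml
environments:
  # Baseline: nothing extra installed.
  - name: bare-workspace
    description: "No extra tooling installed."
    setup: "true"
  # Same sandbox with project requirements preinstalled.
  - name: python-deps
    description: "Project requirements preinstalled."
    tags: [python]
    setup: "pip install -r requirements.txt"
```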

Products

products defines the agent-facing surface under test. A product can be a CLI, API, MCP server, SDK, docs surface, or other tool the agent uses to complete the prompt.

Accepted shapes:

# String form: one setup script with a generated product name.
products: "npm install -g my-cli"

# Object form: one named product.
products:
  name: my-cli
  type: CLI
  version: "1.2.0"
  setup: "npm install -g my-cli"

# List form: one or more named products.
products:
  - name: my-cli
    type: CLI
    version: "1.2.0"
    setup: "npm install -g my-cli"

Product object

A product object names what you want to compare. Product metadata is recorded so Results can filter and group by product and product version.

| Field | Required | Type / values | Notes |
| --- | --- | --- | --- |
| name | Yes | Kebab-case string | Product coordinate. Recorded for filtering in Results. |
| type | No | Product type | Defaults to Other. |
| setup | Yes | Setup | Prepares the product before the agent runs. |
| version | No | String | Product version. Quote numeric versions, such as "25.3". |
| commit | No | String | Source commit of the product under test. |
| description | No | String | Human-readable notes. |
| tags | No | String list | Free-form labels. |

Product type

Allowed values: CLI, MCP, API, Skill, SDK, Schema, Docs, Marketing, Agents.md, Other.

Setup

setup prepares the sandbox for an environment or product. Use setup scripts to install tools, create fixtures, start services, or expose resources the agent needs.

Each variant has its own isolated /workspace. A setup script cannot rely on files or side effects created by another variant.

Accepted shapes:

# String form: one setup script.
setup: "npm install"

# List of strings: multiple setup scripts, run in order.
setup:
  - "npm install"
  - "npm test -- --help"

# Object form: one named setup with optional scoped resources.
setup:
  name: install-cli
  script: "npm install"

# List of objects: multiple named setups, run in order.
setup:
  - name: install-cli
    script: "npm install"
  - name: smoke-cli
    script: "npm test -- --help"

# Mixed list: strings and setup objects can be combined.
setup:
  - "npm install"
  - name: smoke-cli
    script: "npm test -- --help"

Setup object

A setup object is the named form of setup. Use it when setup needs its own files, environment variables, MCP servers, or setup checks.

| Field | Required | Type / values | Notes |
| --- | --- | --- | --- |
| name | Yes | Kebab-case string | Required for object form. |
| script | Yes | String | Bash run before setup checks and before the agent. Secret values are not injected here. |
| description | No | String | Human-readable notes. |
| tags | No | String list | Free-form labels. |
| files | No | File entry list | Accepted on setup objects, but current runs only stage top-level files. Use top-level files when the run needs host files delivered. |
| environment_variables | No | Environment variable list | Runtime env vars scoped to variants that use this setup. |
| secrets | No | String list | Deprecated. |
| mcp_servers | No | MCP server list | MCP servers exposed to the agent. |
| setup_checks | No | Setup check list | Checks run after setup and before the agent. |

When both a product and environment contribute setup, product setup runs before environment setup.
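
The ordering can be read off a fragment like the following, where both a product and an environment contribute setup. All names and scripts are hypothetical, and the comments only restate the ordering rules given in this section.

```yaml
products:
  - name: my-cli
    type: CLI
    setup:
      name: install-cli
      script: "npm install -g my-cli"              # product setup runs first
      setup_checks:
        - name: cli-on-path
          script: my-cli --version

environments:
  - name: workspace
    setup:
      name: install-deps
      script: "pip install -r requirements.txt"    # environment setup runs second
      setup_checks:
        - name: deps-consistent
          script: pip check

# Setup checks run after setup and before the agent receives the prompt.
```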

Setup checks

setup_checks verify that setup produced a usable sandbox before the agent starts. A failed setup check stops that variant before any agent work happens.

setup_checks:
  - name: cli-on-path
    script: my-cli --version

Setup check object

| Field | Required | Type / values | Notes |
| --- | --- | --- | --- |
| name | Yes | Kebab-case string | Appears in setup-checks/<name>.json. |
| script | Yes | String | Bash check. A non-zero exit aborts the variant before the agent runs. |

MCP servers

mcp_servers exposes MCP servers to the agent for variants that use this setup. Use this field when the product surface or environment includes an MCP tool.

mcp_servers:
  - name: fixture-sentinel
    type: stdio
    command: /workspace/fixture-mcp.py
    args: []
  - name: axp
    type: http
    url: http://localhost:3001/mcp

MCP server object

An MCP server object defines one server and its transport. The required connection fields depend on type.

| Field | Required | Type / values | Notes |
| --- | --- | --- | --- |
| name | Yes | String | Must be unique within the setup's mcp_servers. |
| type | Yes | stdio, http, or sse | Transport. |
| command | For stdio | String | Executable path or command inside the sandbox. |
| args | No | String list | Stdio command arguments. |
| url | For http / sse | String | MCP endpoint URL. |
| env | No | MCP stdio env entries | Only valid for stdio. |
| headers | No | MCP header entries | Only valid for http / sse. |

Transport-specific rules:

  • stdio uses command, optional args, and optional env.
  • http and sse use url and optional headers.
  • Mixing stdio-only and endpoint-only fields is rejected.
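
A hedged sketch of the transport rules, with one valid server per field family and one commented-out entry that would be rejected. The server names, the port, the --verbose argument, and the X-Client header are hypothetical.

```yaml
mcp_servers:
  # stdio: command plus optional args (and optional env).
  - name: fixture-sentinel
    type: stdio
    command: /workspace/fixture-mcp.py
    args: ["--verbose"]   # hypothetical flag for a hypothetical script

  # sse: url plus optional headers.
  - name: events
    type: sse
    url: http://localhost:3002/sse
    headers:
      - name: X-Client
        value: axp-experiment

  # Rejected: url is an endpoint-only field and cannot appear on a stdio server.
  # - name: mixed
  #   type: stdio
  #   command: /workspace/fixture-mcp.py
  #   url: http://localhost:3001/mcp
```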

MCP stdio env entry

env forwards declared secret values to a stdio MCP process. Each entry is either a bare secret name or an object:

env:
  - GITHUB_TOKEN
  - name: GH_AUTH
    from: GITHUB_TOKEN

| Field | Required | Type / values | Notes |
| --- | --- | --- | --- |
| name | Yes | String | Env var name as seen by the MCP server process. |
| from | Yes | Secret name | Declared secret name whose value is forwarded. |

MCP HTTP and SSE header object

headers attaches HTTP headers to an http or sse MCP server. Use placeholders when a header needs a declared secret value.

headers:
  - name: Authorization
    value: "Bearer ${SUPABASE_SERVICE_ROLE_KEY}"

| Field | Required | Type / values | Notes |
| --- | --- | --- | --- |
| name | Yes | String | HTTP header name. Header names must be unique per server, case-insensitively. |
| value | Yes | String | May contain ${SECRET_NAME} placeholders. Bare $NAME is literal. |

MCP env entries and header placeholders reference environment variable names visible to the variant. Values can come from literal environment_variables, axp://secrets/<slug> references, or the deprecated secrets list.

Rules enforced at axp experiment validate time:

  • Every stdio env[*].from and every ${NAME} placeholder in a header value must reference a name visible to the variant.
  • Bare $NAME is treated as a literal; only ${NAME} is a placeholder.
  • HTTP header names are unique per server, case-insensitively.
  • command / args / env are only valid for type: stdio; url / headers are only valid for type: http / sse.
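
Putting the rules above together, a sketch of how a declared name might flow from environment_variables into both kinds of MCP server. The product, setup, and server names, the command path, and the X-Debug header are hypothetical; the prod-gh slug is borrowed from the examples in this reference.

```yaml
# Declared at the top level, so the name is visible to every variant.
environment_variables:
  - name: GITHUB_TOKEN
    value: axp://secrets/prod-gh

products:
  - name: github-mcp
    type: MCP
    setup:
      name: start-github-mcp
      script: "true"
      mcp_servers:
        - name: github-stdio
          type: stdio
          command: /workspace/github-mcp
          env:
            - GITHUB_TOKEN              # forwarded under the same name
            - name: GH_AUTH             # forwarded under a different name
              from: GITHUB_TOKEN
        - name: github-http
          type: http
          url: http://localhost:3001/mcp
          headers:
            - name: Authorization
              value: "Bearer ${GITHUB_TOKEN}"    # ${NAME} is a placeholder
            - name: X-Debug
              value: "literal $GITHUB_TOKEN"     # bare $NAME is not substituted
```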

Resolved secret values are written into the agent session frame and can appear in run artifacts. Treat artifacts as sensitive whenever an experiment forwards secrets to MCP servers.

Extensions

extensions refine the variant set when the base axes do not express the combinations you need. Use extensions to narrow an axis, swap products or environments for a subtree, or append prompt guidance to a slice of variants.

If an experiment declares any extensions, AXP creates only extension-derived variants.

Extension object

An extension object is one node in the refinement tree. Nested extensions are recursively cross-multiplied with their parents.

| Field | Required | Type / values | Notes |
| --- | --- | --- | --- |
| id | Yes | Kebab-case string | Must be unique among sibling extensions. |
| description | No | String | Human-readable notes. |
| tags | No | String list | Added to resolved variant tags. |
| agents | No | Agent list | Replaces inherited agents for this extension subtree. |
| prompts | No | Prompt list | Appended to inherited prompt text. Does not replace the prompt axis. |
| environments | No | Environment list | Replaces inherited environments for this extension subtree. |
| products | No | Product list | Replaces inherited products for this extension subtree. |
| extensions | No | Extension list | Nested extensions. |

prompts:
  - id: analyze
    prompt: "Read /workspace/task.md and write /workspace/report.json."

extensions:
  - id: with-cli
    products:
      - name: cli
        type: CLI
        setup: "curl -fsSL https://clickhouse.com/ | sh"
    prompts: ["Use the ClickHouse CLI for the analysis."]
  - id: without-cli
    products:
      - name: no-cli
        setup: "true"
    prompts: ["Do not use the ClickHouse CLI; use another local method."]
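
Nesting is described above only in general terms. As a rough sketch of the shape, the fragment below nests one extension inside another, with hypothetical ids, tags, and product names. The comments restate the replace and append semantics from the table; they do not assert the exact resolved variant set, which --resolve-variants can render.

```yaml
agents:
  - claude
  - codex

prompts:
  - id: analyze
    prompt: "Read /workspace/task.md and write /workspace/report.json."

extensions:
  - id: with-cli
    tags: [cli]                 # added to resolved variant tags
    products:                   # replaces inherited products for this subtree
      - name: cli
        type: CLI
        setup: "npm install -g my-cli"
    extensions:
      - id: claude-only
        agents: [claude]        # replaces inherited agents for this nested subtree
        prompts:                # appended to the inherited prompt text
          - "Explain each CLI command you run."
```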

Environment variables

environment_variables injects environment variables into variants at runtime. Use it for non-secret runtime configuration and supported secret references.

environment_variables:
  - name: LOG_LEVEL
    value: debug
  - name: GITHUB_TOKEN
    value: axp://secrets/prod-gh
  - name: DATABASE_URL
    value: axp://secrets/staging-db

Environment variable object

| Field | Required | Type / values | Notes |
| --- | --- | --- | --- |
| name | Yes | Env-var name | Must match ^[A-Z_][A-Z0-9_]*$. Harness-reserved names and prefixes are rejected. |
| value | Yes | String | Literal value or axp://secrets/<slug> reference. Other values beginning with axp:// are rejected. |

Store secret values once with axp secrets set <slug>, then reference them with axp://secrets/<slug>. Both axp run and axp local run resolve these references before injecting env vars into the sandbox.

A referenced slug that does not exist in your org fails at preflight.

Reserved names:

  • ANTHROPIC_API_KEY
  • ANTHROPIC_BASE_URL
  • OPENAI_API_KEY
  • OPENAI_BASE_URL
  • CURSOR_API_KEY
  • MODEL
  • MAX_TURNS
  • IS_SANDBOX
  • TRACEPARENT
  • any name beginning with AXP_, CLAUDE_CODE_, CODEX_, CURSOR_, or OTEL_
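
For a quick feel of the naming and value rules, the sketch below pairs accepted entries with commented-out entries that validation would reject. The rejected names and the axp://tokens/api value are made up for illustration.

```yaml
environment_variables:
  # Accepted: uppercase name, literal value.
  - name: LOG_LEVEL
    value: debug
  # Accepted: secret reference.
  - name: DATABASE_URL
    value: axp://secrets/staging-db

  # Rejected: lowercase letters do not match ^[A-Z_][A-Z0-9_]*$.
  # - name: log_level
  #   value: debug
  # Rejected: name starts with a digit.
  # - name: 2FA_CODE
  #   value: "000000"
  # Rejected: reserved name.
  # - name: OPENAI_API_KEY
  #   value: axp://secrets/openai
  # Rejected: reserved prefix AXP_.
  # - name: AXP_DEBUG
  #   value: "1"
  # Rejected: begins with axp:// but is not axp://secrets/<slug>.
  # - name: API_TOKEN
  #   value: axp://tokens/api
```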

Files

Top-level files stages host files or directories into every variant's /workspace before setup runs.

setup.files is accepted in experiment YAML, but current runs only stage top-level files. Put host file staging at the top level when you need the files delivered during a run.

files:
  - name: my-cli
    source: ../build/mycli
    dest: tools/mycli
  - source: https://example.com/fixtures/data.bin
    sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
    dest: fixtures/data.bin

File entry object

| Field | Required | Type / values | Notes |
| --- | --- | --- | --- |
| name | Sometimes | Kebab-case string | Required when source is omitted. Used as the --file NAME=SOURCE handle. |
| source | Sometimes | Host path or http(s) URL | Required when name is omitted. Relative paths resolve against the YAML file's directory. |
| sha256 | No | 64-char hex string | Valid for file sources and URL downloads. Invalid for directory sources. |
| dest | Yes | Workspace-relative path | Absolute paths, .., .axp-bridge, and :: are rejected. |

Notes:

  • Directory sources copy their contents under dest/.
  • File sources land as a single file at dest.
  • URL sources must be publicly fetchable and land as a single file at dest; unpack archives in setup if needed.
  • Sandboxes are Linux containers; macOS binaries staged from a laptop will not run there.
  • Directory walks honor .axpignore, not .gitignore.
  • --file NAME=SOURCE binds or overrides the source of a named entry.
  • --file SOURCE::DEST stages an ad-hoc entry into every variant.
  • Missing or unbound sources abort real runs at preflight. --resolve-variants renders MISSING / UNBOUND annotations without failing.
  • Staging failures roll the variant up as status=error / exit_reason=staging_failed; the rest of the run keeps going.
  • Platform and local runs both stage top-level files.
  • Treat experiment YAML like a script you run: sources are read with your permissions and may point anywhere on the host.
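
Two of the cases in the notes above, a directory source and a named entry bound at run time, might look like this. The entry names and paths are hypothetical.

```yaml
files:
  # Directory source: its contents are copied under fixtures/ in /workspace.
  # sha256 is not valid here because the source is a directory.
  - name: fixtures
    source: ./fixtures
    dest: fixtures

  # Named entry with no source: it stays unbound until the run supplies
  # --file linux-cli=<path>. The binary must be built for Linux,
  # since sandboxes are Linux containers.
  - name: linux-cli
    dest: tools/mycli
```

If the second entry is left unbound, a real run should abort at preflight, while --resolve-variants renders it with an UNBOUND annotation.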

Tests

tests defines how AXP scores each run. Application tests check the output state; introspection tests check how the agent got there.

tests:
  application:
    - name: report-exists
      script: test -f /workspace/report.json
  introspection:
    - name: under-thirty-tool-calls
      script: '[ "$(jq ".tool_calls | length" "$AXP_TRACE_PATH")" -lt 30 ]'

| Field | Required | Type / values | Notes |
| --- | --- | --- | --- |
| application | No | Test list | Checks resulting application state, files, commands, or endpoints. |
| introspection | No | Test list | Checks agent behavior through trace artifacts such as AXP_TRACE_PATH. |

Test object

| Field | Required | Type / values | Notes |
| --- | --- | --- | --- |
| name | Yes | Kebab-case string | Must be globally unique across application and introspection tests. |
| script | Yes | String | Bash script. Streamed over stdin and not shown to the agent. |
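
Longer checks fit naturally in a YAML block scalar. The sketch below assumes, as with setup checks, that a non-zero exit marks the test as failed, and that jq is available wherever tests run (the single-line example in this section already relies on it). The test names are hypothetical.

```yaml
tests:
  application:
    - name: report-is-valid-json
      script: |
        set -euo pipefail
        # The report must exist and parse as JSON.
        test -f /workspace/report.json
        jq empty /workspace/report.json
  introspection:
    # Names must not repeat across application and introspection tests.
    - name: trace-under-thirty-tool-calls
      script: '[ "$(jq ".tool_calls | length" "$AXP_TRACE_PATH")" -lt 30 ]'
```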

Limits

limits sets the run caps for each variant execution.

Limits object

| Field | Required | Type / values | Notes |
| --- | --- | --- | --- |
| max_turns | Yes | Integer greater than 0 | Agent turn cap. |
| max_time_seconds | Yes | Integer greater than 0 | Wall-clock timeout in seconds. |
| max_cost_usd | Yes | Number greater than 0 | Enforced when the agent reports cumulative cost during the run; if no cost is reported, AXP cannot stop on cost. |

Secrets (deprecated)

secrets is a deprecated alias for host environment variable names injected into variants at runtime.

Prefer environment_variables, which accepts both literal values and axp://secrets/<slug> references. Some MCP secret-forwarding fields still reference declared secret names.

secrets:
  - GITHUB_TOKEN

Secret names must match ^[A-Z_][A-Z0-9_]*$.
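
One possible migration path, sketched under the assumption that the value has already been stored in the org secret store with axp secrets set prod-gh (the slug is illustrative):

```yaml
# Before (deprecated): the entry names a host environment variable.
# secrets:
#   - GITHUB_TOKEN

# After: the same variable name, resolved from the org secret store.
environment_variables:
  - name: GITHUB_TOKEN
    value: axp://secrets/prod-gh
```

MCP env entries and header placeholders that referenced GITHUB_TOKEN can keep the same name, since they resolve against names visible to the variant.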

Validation rules

An experiment is invalid if:

  • the YAML contains a field not defined by the schema
  • schema_version is not 2
  • any required field is missing
  • an id that must be kebab-case is not kebab-case
  • an agent model id contains ::
  • a declared axis is empty
  • duplicate ids or names appear where uniqueness is required
  • the resolved variant set is empty
  • a resolved variant has an empty prompt
  • two resolved variants collide on variant_id
  • no tests are defined
  • test names are duplicated
  • an env var or secret name is invalid or reserved
  • a file entry has an invalid name, source, sha256, or dest
  • an MCP server mixes transport-specific fields
  • an MCP server references a secret not visible to the variant
  • any limit is not greater than zero

YAML syntax boundaries

The experiment data model is JSON-compatible even though the authoring file is YAML.

  • YAML comments are allowed.
  • YAML anchors and aliases are allowed when the resolved value is JSON-compatible.
  • Custom YAML tags are unsupported.
  • Non-string mapping keys are unsupported.
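
As an illustration of anchors and aliases that resolve to JSON-compatible values, the sketch below shares one tag list between two products. Product names, versions, and tags are hypothetical. Because unknown fields are rejected at every level, the anchor is defined inline on its first use rather than under a scratch top-level key.

```yaml
products:
  - name: my-cli-stable
    type: CLI
    version: "1.2.0"
    tags: &cli-tags [cli, smoke]     # anchor defined on first use
    setup: "npm install -g my-cli@1.2.0"
  - name: my-cli-next
    type: CLI
    version: "1.3.0"
    tags: *cli-tags                  # alias resolves to the same string list
    setup: "npm install -g my-cli@1.3.0"
```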