Tests¶
Agent Studio test cases are simulated conversations that run your agent end-to-end in the sandbox environment. They are managed locally as YAML files under test_suite/ and pushed to Agent Studio with poly push.
Each test case describes a scenario for a simulated user and a set of assertions to evaluate against the resulting conversation. The tests themselves run inside Agent Studio against the pushed branch, not on your machine.
Where tests fit in the workflow¶
Tests sit between validation and merge in the standard CLI working pattern. Edit locally, validate with poly validate, push, then run the suite from Agent Studio or chat against the branch with poly chat.
- Validation: Use poly validate to check project configuration before pushing.
- Simulated conversations: Define test cases under test_suite/ and run them in Agent Studio.
- Interactive review: Use poly chat and Agent Studio to spot-check behavior on the pushed branch.
Location¶
Test cases are defined as one YAML file per test under test_suite/.
The directory is optional. Create it only when you have test cases to define.
Filename must match the test name
The filename (without .yaml) must match the normalized form of the name field: lowercased, with spaces and punctuation replaced by underscores. For example, Greeting flow test becomes greeting_flow_test.yaml. poly push rejects mismatched names.
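The normalization rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code poly push uses: the helper names are hypothetical, and the handling of repeated separators, leading or trailing punctuation, and non-ASCII characters is a guess that goes beyond what this page specifies.

```python
import re


def normalize_test_name(name: str) -> str:
    """Lowercase the name and replace non-alphanumeric runs with one underscore.

    Assumption: runs of spaces/punctuation collapse to a single underscore and
    leading/trailing underscores are dropped. The real rule may differ.
    """
    lowered = name.strip().lower()
    return re.sub(r"[^a-z0-9]+", "_", lowered).strip("_")


def expected_filename(name: str) -> str:
    """Return the filename a test with this name would be expected to use."""
    return normalize_test_name(name) + ".yaml"


print(expected_filename("Greeting flow test"))
```

A helper like this can be handy when scaffolding new test files, but poly validate remains the authoritative check.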
What a test case contains¶
| Field | Required | Description |
|---|---|---|
| name | Yes | Human-readable test name. Must match the filename when normalized. |
| scenario | Yes | Natural-language description of what the simulated user does. Drives the simulator turn-by-turn. |
| channel | Yes | voice or webchat. |
| language | Yes | BCP 47 language tag (e.g. en-GB). Must be a configured language in the project. |
| variant | No | Name of a variant from config/variant_attributes.yaml. Defaults to the project default variant. |
| tags | No | List of strings used to group, filter, or schedule tests in Agent Studio. |
| prompt_assertions | No | List of natural-language statements that must hold about the agent's behavior. Each is evaluated by an LLM judge. |
| function_call_assertions | No | List of expected function calls and their argument values. |
At least one of prompt_assertions or function_call_assertions should be set — a test with no assertions runs but cannot pass or fail.
Prompt assertions¶
Each prompt assertion is a free-text statement evaluated against the full conversation by an LLM judge. Write them as observable behaviors, not internal reasoning.
```yaml
prompt_assertions:
  - The agent confirms the caller's booking reference before continuing
  - The agent does not ask for the caller's date of birth
```
Function call assertions¶
Each function call assertion checks that a global function was called and, optionally, with specific argument values.
| Field | Description |
|---|---|
| name | Global function name. Must match a function in functions/. |
| arguments | List of argument assertions. May be empty to check only that the function was called. |
Argument assertion fields:
| Field | Description |
|---|---|
| parameter_name | Parameter as defined on the function. |
| expected_value | Expected value, expressed as a string. |
| value_type | One of string, integer, number, boolean. |
```yaml
function_call_assertions:
  - name: lookup_booking
    arguments:
      - parameter_name: booking_reference
        expected_value: "ABC123"
        value_type: string
      - parameter_name: party_size
        expected_value: "4"
        value_type: integer
```
Only function name and argument values are asserted. The function does not have to be the only call in the conversation, and the order of calls is not checked.
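To make these semantics concrete, here is a rough Python sketch of how one such assertion could be checked against a list of recorded calls. It is not Agent Studio's implementation. The shape of the recorded calls, the helper names, and the way string values are coerced (in particular that booleans are written as true or false) are all assumptions made for illustration.

```python
def coerce(expected_value: str, value_type: str):
    """Convert the string expected_value into the declared value_type (assumed rules)."""
    if value_type == "string":
        return expected_value
    if value_type == "integer":
        return int(expected_value)
    if value_type == "number":
        return float(expected_value)
    if value_type == "boolean":
        lowered = expected_value.strip().lower()
        if lowered in ("true", "false"):
            return lowered == "true"
        raise ValueError(f"not a boolean: {expected_value!r}")
    raise ValueError(f"unknown value_type: {value_type!r}")


def assertion_holds(assertion: dict, calls: list) -> bool:
    """True if any recorded call has the asserted name and argument values.

    Other calls in the conversation and the order of calls are ignored.
    """
    expected = {
        arg["parameter_name"]: coerce(arg["expected_value"], arg["value_type"])
        for arg in assertion.get("arguments") or []
    }
    for call in calls:
        if call["name"] != assertion["name"]:
            continue
        actual = call.get("arguments", {})
        if all(key in actual and actual[key] == value for key, value in expected.items()):
            return True
    return False


# Hypothetical record of calls made during a simulated conversation.
calls = [
    {"name": "greet_caller", "arguments": {}},
    {"name": "lookup_booking", "arguments": {"booking_reference": "ABC123", "party_size": 4}},
]
assertion = {
    "name": "lookup_booking",
    "arguments": [
        {"parameter_name": "booking_reference", "expected_value": "ABC123", "value_type": "string"},
        {"parameter_name": "party_size", "expected_value": "4", "value_type": "integer"},
    ],
}
print(assertion_holds(assertion, calls))
```

Because the check scans all calls for any match, an unrelated call such as the hypothetical greet_caller does not affect the result.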
Example¶
```yaml
name: Greeting flow test
scenario: Ask for help with booking.
channel: voice
language: en-GB
tags:
  - booking
  - smoke
prompt_assertions:
  - The agent offers to help with booking
function_call_assertions:
  - name: lookup_booking
    arguments:
      - parameter_name: booking_reference
        expected_value: "ABC123"
        value_type: string
```
A minimal webchat smoke test with only a prompt assertion:
```yaml
name: Webchat smoke test
scenario: Say hello on webchat.
channel: webchat
language: en-GB
tags:
  - smoke
prompt_assertions:
  - The agent greets the user
```
Validation¶
poly validate checks each test case:
- channel must be voice or webchat
- scenario is required and non-empty
- language is required and must be one of the project's configured languages (default_language or additional_languages)
- variant, if set, must reference a variant declared in config/variant_attributes.yaml
- each function_call_assertions[*].name must match a global function under functions/
- each argument's value_type must be one of string, integer, number, boolean
- the filename must match the normalized name
Validation runs automatically as part of poly push.
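The rules above can be read as a simple checklist. The Python sketch below restates them over an already-parsed test case, purely as an illustration: poly validate is the authoritative check, and the function name, its parameters, the error messages, and the filename normalization used here are assumptions rather than the tool's actual behavior.

```python
import re

ALLOWED_CHANNELS = {"voice", "webchat"}
ALLOWED_VALUE_TYPES = {"string", "integer", "number", "boolean"}


def check_test_case(case, filename, languages, variants, functions):
    """Return a list of rule violations for one parsed test case (empty if none)."""
    problems = []
    if case.get("channel") not in ALLOWED_CHANNELS:
        problems.append("channel must be voice or webchat")
    if not str(case.get("scenario") or "").strip():
        problems.append("scenario is required and non-empty")
    if case.get("language") not in languages:
        problems.append("language must be a configured project language")
    variant = case.get("variant")
    if variant is not None and variant not in variants:
        problems.append("variant must be declared in config/variant_attributes.yaml")
    for assertion in case.get("function_call_assertions") or []:
        if assertion.get("name") not in functions:
            problems.append(f"unknown function: {assertion.get('name')}")
        for arg in assertion.get("arguments") or []:
            if arg.get("value_type") not in ALLOWED_VALUE_TYPES:
                problems.append(f"invalid value_type: {arg.get('value_type')}")
    # Assumed normalization: lowercase, non-alphanumeric runs become underscores.
    normalized = re.sub(r"[^a-z0-9]+", "_", str(case.get("name") or "").lower()).strip("_")
    if filename != normalized + ".yaml":
        problems.append("filename must match the normalized name")
    return problems


case = {
    "name": "Greeting flow test",
    "scenario": "Ask for help with booking.",
    "channel": "voice",
    "language": "en-GB",
    "function_call_assertions": [
        {
            "name": "lookup_booking",
            "arguments": [
                {"parameter_name": "booking_reference", "expected_value": "ABC123", "value_type": "string"}
            ],
        }
    ],
}
print(check_test_case(case, "greeting_flow_test.yaml", {"en-GB"}, set(), {"lookup_booking"}))
```

Returning a list of problems rather than stopping at the first one mirrors how a validator would typically report everything wrong with a file in one pass.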
Push and run¶
Test cases follow the standard ADK lifecycle:
- edit YAML files under test_suite/ locally
- validate with poly validate
- push with poly push to sync to Agent Studio
- run the suite from Agent Studio against the pushed branch
poly push creates, updates, or deletes test cases on Agent Studio to match local state, including prompt_assertions and tags. There is no poly test command — execution happens in Agent Studio.
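Syncing to match local state can be pictured as a three-way plan. The Python sketch below is only a conceptual illustration with hypothetical names and data shapes. It does not describe how poly push is implemented or how it talks to Agent Studio.

```python
def plan_sync(local: dict, remote: dict) -> dict:
    """Compare test cases keyed by name and decide what a sync would do.

    local and remote map test names to their parsed definitions (assumed shape).
    """
    local_names, remote_names = set(local), set(remote)
    return {
        # Present locally but not remotely: create.
        "create": sorted(local_names - remote_names),
        # Present on both sides with different content: update.
        "update": sorted(n for n in local_names & remote_names if local[n] != remote[n]),
        # Present remotely but removed locally: delete.
        "delete": sorted(remote_names - local_names),
    }


local = {
    "Greeting flow test": {"tags": ["booking", "smoke"]},
    "Webchat smoke test": {"tags": ["smoke"]},
}
remote = {
    "Webchat smoke test": {"tags": []},
    "Old cancellation test": {"tags": ["regression"]},
}
print(plan_sync(local, remote))
```

The practical consequence is that deleting a YAML file locally and pushing removes that test case from the branch in Agent Studio.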
Tests are branch-scoped
Tests are pushed to the current branch and run against that branch's agent. Use a branch per scenario when iterating on flows or topics so test results map cleanly to the change under review.
What to cover¶
Good coverage of a project usually includes:
- the happy path of every flow and major topic
- key error paths — missing booking, invalid input, unavailable slot
- function call shape — confirm the agent calls the right function with the right arguments for each branch of logic
- state transitions across turns — confirm later turns reference earlier user input
- behavior on the channels your project actually ships on (voice, webchat, or both)
Best practices¶
- write scenario as a short, concrete user goal (for example, "Ask to cancel a booking with reference ABC123"), not a script
- prefer prompt assertions for behavior, function call assertions for integration correctness
- keep each test case focused on one outcome; split combined scenarios into multiple files
- use tags consistently (smoke, regression, <flow_name>) so suites can be filtered in Agent Studio
- cover error paths, not only success cases
- add both a webchat and a voice test for any critical path that runs on both channels
- validate as part of the normal edit loop, not just before merge
- combine the suite with poly chat and interactive review in Agent Studio when behavior depends on the full conversation flow
Related pages¶
- CLI reference: poly validate, poly push, and poly chat, the commands used in the test workflow.
- Functions: Reference for the global functions named in function call assertions.
- Variants: Define the variants referenced by the variant field.
- Languages: Configure the languages a test case can target.
- Working locally: How tests fit into the daily edit / validate / push loop.