Agents¶
The core concept in pytest-aitest.
What is an Agent?¶
An Agent is a test configuration that bundles everything needed to run a test:
```python
from pytest_aitest import Agent, Provider, MCPServer, Skill

banking_server = MCPServer(command=["python", "banking_mcp.py"])

agent = Agent(
    provider=Provider(model="azure/gpt-5-mini"),
    mcp_servers=[banking_server],
    system_prompt="Be concise.",
    skill=Skill.from_path("skills/financial-advisor"),  # Optional
)
```
The Agent is NOT What You Test¶
You don't test agents. You USE agents to test:
| Target | Question |
|---|---|
| MCP Server | Can the LLM understand and use my tools? |
| System Prompt | Do these instructions produce the behavior I want? |
| Agent Skill | Does this domain knowledge improve performance? |
The Agent is the test harness that bundles an LLM with the configuration you want to evaluate.
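Concretely, the test body stays fixed while the agent carries the configuration under evaluation. A minimal sketch using the `aitest_run` fixture from the examples below (the test name is illustrative):

```python
async def test_balance_lookup(aitest_run):
    # The agent is the harness; the prompt and assertion are the test.
    # Swap in a different agent and the same test evaluates a
    # different configuration.
    result = await aitest_run(agent, "What's my checking balance?")
    assert result.success
```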
Agent Components¶
| Component | Required | Example |
|---|---|---|
| Provider | ✓ | Provider(model="azure/gpt-5-mini") |
| MCP Servers | Optional | MCPServer(command=["python", "server.py"]) |
| System Prompt | Optional | "Be concise and direct." |
| Skill | Optional | Skill.from_path("skills/financial-advisor") |
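Only the provider is required, so the smallest possible agent is just a model choice. A minimal sketch using the constructor shown above:

```python
from pytest_aitest import Agent, Provider

# The smallest valid agent: a model and nothing else.
agent = Agent(provider=Provider(model="azure/gpt-5-mini"))
```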
Agent Leaderboard¶
When you test multiple agents, the report shows an Agent Leaderboard.
This happens automatically, with no configuration needed. Just parametrize your tests:
```python
from pathlib import Path

import pytest

from pytest_aitest import Agent, Provider, MCPServer, load_system_prompts

banking_server = MCPServer(command=["python", "banking_mcp.py"])
PROMPTS = load_system_prompts(Path("prompts/"))

@pytest.mark.parametrize("prompt_name,system_prompt", PROMPTS.items())
async def test_banking(aitest_run, prompt_name, system_prompt):
    agent = Agent(
        provider=Provider(model="azure/gpt-5-mini"),
        mcp_servers=[banking_server],
        system_prompt=system_prompt,
    )
    result = await aitest_run(agent, "What's my checking balance?")
    assert result.success
```
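Here `load_system_prompts(Path("prompts/"))` builds the name→prompt mapping that `parametrize` unpacks. A plausible layout (file names, extension, and contents are assumptions) that would produce the two variants below:

```
prompts/
├── concise.md     # e.g. "Be concise."
└── detailed.md    # e.g. "Explain your reasoning step by step."
```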
The report shows:
| Agent | Pass Rate | Cost |
|---|---|---|
| gpt-5-mini (concise) | 100% | $0.002 |
| gpt-5-mini (detailed) | 100% | $0.004 |
Winning Criteria¶
Winning Agent = Highest pass rate → Lowest cost (tiebreaker)
- Pass rate (primary) — 100% beats 95%, always
- Cost (tiebreaker) — Among equal pass rates, cheaper wins
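The rule is just a sort key. An illustrative sketch (not the plugin's internals) using the leaderboard above:

```python
agents = [
    {"name": "gpt-5-mini (concise)", "pass_rate": 1.00, "cost": 0.002},
    {"name": "gpt-5-mini (detailed)", "pass_rate": 1.00, "cost": 0.004},
]

# Highest pass rate wins; among equal pass rates, the cheaper agent wins.
winner = min(agents, key=lambda a: (-a["pass_rate"], a["cost"]))
assert winner["name"] == "gpt-5-mini (concise)"
```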
Dimension Detection¶
The AI analysis detects what varies between agents to provide targeted feedback:
| What Varies | AI Feedback Focuses On |
|---|---|
| Model | Which model works best with your tools |
| System Prompt | Which instructions produce better behavior |
| Skill | Whether domain knowledge helps |
| Server | Which implementation is more reliable |
Dimension detection feeds only the AI analysis; the leaderboard always appears whenever multiple agents are tested.
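One way to picture dimension detection (a hypothetical sketch, not the plugin's implementation): compare the agent configurations field by field and report the fields that differ.

```python
def varying_dimensions(configs: list[dict]) -> set[str]:
    # A field "varies" when the agents disagree on its value.
    return {
        key
        for key in configs[0]
        if len({str(cfg[key]) for cfg in configs}) > 1
    }

configs = [
    {"model": "azure/gpt-5-mini", "system_prompt": "concise"},
    {"model": "azure/gpt-5-mini", "system_prompt": "detailed"},
]
assert varying_dimensions(configs) == {"system_prompt"}
```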
Examples¶
Compare Models¶
MODELS = ["azure/gpt-5-mini", "azure/gpt-4.1"]
banking_server = MCPServer(command=["python", "banking_mcp.py"])
@pytest.mark.parametrize("model", MODELS)
async def test_with_model(aitest_run, model):
agent = Agent(
provider=Provider(model=model),
mcp_servers=[banking_server],
)
result = await aitest_run(agent, "What's my checking balance?")
assert result.success
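Only the model varies here, so the AI feedback focuses on which model works best with your tools.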
Compare System Prompts¶
```python
from pathlib import Path

import pytest

from pytest_aitest import Agent, Provider, MCPServer, load_system_prompts

PROMPTS = load_system_prompts(Path("prompts/"))
banking_server = MCPServer(command=["python", "banking_mcp.py"])

@pytest.mark.parametrize("prompt_name,system_prompt", PROMPTS.items())
async def test_with_prompt(aitest_run, prompt_name, system_prompt):
    agent = Agent(
        provider=Provider(model="azure/gpt-5-mini"),
        mcp_servers=[banking_server],
        system_prompt=system_prompt,
    )
    result = await aitest_run(agent, "What's my checking balance?")
    assert result.success
```
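Here the model is fixed and only the prompt varies, so feedback focuses on which instructions produce better behavior.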
Compare Multiple Dimensions¶
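Stacked `parametrize` decorators run the full cross product, so every model × prompt pairing competes as its own agent on the leaderboard: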
MODELS = ["gpt-5-mini", "gpt-4.1"]
PROMPTS = load_system_prompts(Path("prompts/"))
banking_server = MCPServer(command=["python", "banking_mcp.py"])
@pytest.mark.parametrize("model", MODELS)
@pytest.mark.parametrize("prompt_name,system_prompt", PROMPTS.items())
async def test_combinations(aitest_run, model, prompt_name, system_prompt):
agent = Agent(
provider=Provider(model=f"azure/{model}"),
mcp_servers=[banking_server],
system_prompt=system_prompt,
)
result = await aitest_run(agent, "What's my checking balance?")
assert result.success
Next Steps¶
- Comparing Configurations — More comparison patterns
- A/B Testing Servers — Test server versions
- AI Analysis — What the AI evaluation produces