
Comparing Configurations

The power of pytest-aitest is comparing different configurations to find what works best.

Pattern 1: Explicit Configurations

Define agents with meaningful names when testing distinct approaches:

import pytest

from pytest_aitest import Agent, Provider, MCPServer, Skill

banking_server = MCPServer(command=["python", "banking_mcp.py"])

# Test different prompts with the same MCP server
agent_brief = Agent(
    name="brief-prompt",
    provider=Provider(model="azure/gpt-5-mini"),
    mcp_servers=[banking_server],
    system_prompt="Be concise. One sentence max.",
)

agent_detailed = Agent(
    name="detailed-prompt",
    provider=Provider(model="azure/gpt-5-mini"),
    mcp_servers=[banking_server],
    system_prompt="Be thorough. Explain your reasoning.",
)

agent_with_skill = Agent(
    name="with-skill",
    provider=Provider(model="azure/gpt-5-mini"),
    mcp_servers=[banking_server],
    skill=Skill.from_path("skills/financial-advisor"),
)

AGENTS = [agent_brief, agent_detailed, agent_with_skill]

@pytest.mark.parametrize("agent", AGENTS, ids=lambda a: a.name)
async def test_balance_query(aitest_run, agent):
    """Which configuration handles balance queries best?"""
    result = await aitest_run(agent, "What's my checking balance?")
    assert result.success

This runs 3 tests:

  • test_balance_query[brief-prompt]
  • test_balance_query[detailed-prompt]
  • test_balance_query[with-skill]
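
Each agent name becomes part of the pytest test ID, so pytest's standard -k filter can run a single configuration:

pytest -k "with-skill" -v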

Use explicit configurations when:

  • You're testing conceptually different approaches
  • Names carry meaning ("with-skill" vs. "without-skill"; see the baseline sketch below)
  • You want full control over each configuration
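
For the with/without comparison above, the baseline is the same configuration with the skill argument left off. A minimal sketch, reusing only constructor arguments already shown (the SKILL_AGENTS name is illustrative):

agent_without_skill = Agent(
    name="without-skill",
    provider=Provider(model="azure/gpt-5-mini"),
    mcp_servers=[banking_server],
)

# Pair with agent_with_skill so the only difference is the skill
SKILL_AGENTS = [agent_without_skill, agent_with_skill]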

Pattern 2: Generated Configurations

Generate configurations from all permutations for systematic testing:

import pytest

from pytest_aitest import Agent, MCPServer, Provider

MODELS = ["gpt-5-mini", "gpt-4.1"]
PROMPTS = {
    "brief": "Be concise.",
    "detailed": "Explain your reasoning step by step.",
}

banking_server = MCPServer(command=["python", "banking_mcp.py"])

# Generate all combinations
AGENTS = [
    Agent(
        name=f"{model}-{prompt_name}",
        provider=Provider(model=f"azure/{model}"),
        mcp_servers=[banking_server],
        system_prompt=prompt,
    )
    for model in MODELS
    for prompt_name, prompt in PROMPTS.items()
]

# 2 models × 2 prompts = 4 configurations
@pytest.mark.parametrize("agent", AGENTS, ids=lambda a: a.name)
async def test_balance_query(aitest_run, agent):
    """Test MCP server with different model/prompt combinations."""
    result = await aitest_run(agent, "What's my checking balance?")
    assert result.success

This runs 4 tests:

  • test_balance_query[gpt-5-mini-brief]
  • test_balance_query[gpt-5-mini-detailed]
  • test_balance_query[gpt-4.1-brief]
  • test_balance_query[gpt-4.1-detailed]

Use generated configurations when:

  • You want to test all combinations systematically
  • You're looking for interactions (e.g., "this MCP server works with gpt-4.1 but fails with gpt-5-mini")
  • You're comparing multiple dimensions at once (see the sketch below)
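
The same comprehension scales to more axes. Here is a sketch adding a skill dimension with itertools.product; the "advisor" skill path is hypothetical, and the sketch assumes skill may be omitted, as in Pattern 1:

from itertools import product

from pytest_aitest import Skill

SKILLS = {
    "no-skill": None,
    "advisor": Skill.from_path("skills/financial-advisor"),  # hypothetical path
}

# 2 models × 2 prompts × 2 skills = 8 configurations
AGENTS = [
    Agent(
        name=f"{model}-{prompt_name}-{skill_name}",
        provider=Provider(model=f"azure/{model}"),
        mcp_servers=[banking_server],
        system_prompt=prompt,
        # Omit the skill argument entirely when there is none
        **({"skill": skill} if skill is not None else {}),
    )
    for model, (prompt_name, prompt), (skill_name, skill) in product(
        MODELS, PROMPTS.items(), SKILLS.items()
    )
]

Because each name encodes every axis, the test IDs and the report's leaderboard stay readable as the grid grows.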

What the Report Shows

The report shows an Agent Leaderboard (auto-detected when multiple agents are tested):

Agent                 Pass Rate   Tokens   Cost
gpt-5-mini-brief      100%           747   $0.002
gpt-4.1-brief         100%           560   $0.008
gpt-5-mini-detailed   100%         1,203   $0.004
gpt-4.1-detailed      100%           892   $0.012

Winning agent: the highest pass rate wins; when pass rates tie, the lowest cost breaks the tie.
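
That ranking is equivalent to a simple sort key. A minimal sketch (not the plugin's code; the row dicts are illustrative):

# Highest pass rate first; lowest cost breaks ties
rows = [
    {"agent": "gpt-5-mini-brief", "pass_rate": 1.0, "cost": 0.002},
    {"agent": "gpt-4.1-brief", "pass_rate": 1.0, "cost": 0.008},
]
winner = min(rows, key=lambda r: (-r["pass_rate"], r["cost"]))
print(winner["agent"])  # gpt-5-mini-brief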

This helps you answer:

  • "Which configuration works best for my MCP server?"
  • "Can I use a cheaper model with my tools?"
  • "Does this prompt improve tool usage?"

Next Steps

📁 Real Examples:

  • test_basic_usage.py — Single agent workflows
  • test_dimension_detection.py — Multi-dimension comparison