AI Analysis¶
Every pytest-aitest report includes AI-generated analysis. An LLM reads your test results and produces actionable feedback — not just metrics.
Why It's Mandatory¶
Traditional test reports tell you what failed: a test ID, an assertion error, maybe a stack trace.
For AI tool testing, this is useless. A test might fail because your tool description is ambiguous, your parameter name is confusing, or your system prompt contradicts itself. Metrics can't diagnose these problems.
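For example, a plain pytest run ends with a summary like this (illustrative output; the test names are hypothetical):

```text
FAILED tests/test_weather.py::test_forecast_tool - AssertionError: expected 1 tool call, got 3
================= 1 failed, 9 passed in 4.12s =================
```

Nothing here says *why* the agent made three calls instead of one.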
pytest-aitest requires an AI model to generate reports: without --aitest-summary-model, report generation will error. See How to Generate Reports for full configuration options.
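A minimal sketch of such an invocation, assuming the flag takes a model identifier (the model name here is borrowed from the cost table below; substitute whatever model your setup supports):

```shell
# Run the suite and have the named model write the analysis section of the report
pytest --aitest-summary-model gpt-5.2-chat
```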
What the AI Produces¶
The analysis model receives your complete test results — tool calls, responses, timing, costs — and produces structured markdown covering these areas:
| Section | What It Tells You |
|---|---|
| 🎯 Recommendation | Which agent to deploy and why |
| ❌ Failure Analysis | Root cause + fix for each failed test |
| 🔧 MCP Tool Feedback | Specific tool description improvements |
| 📝 System Prompt Feedback | Instruction conflicts and rewrites |
| 📚 Skill Feedback | Domain knowledge gaps |
| ⚡ Optimizations | Ways to reduce turns, tokens, and cost |
| 📦 Tool Response Optimization | Reduce token waste in tool return values |
Not every section appears in every report — the AI only produces sections relevant to your test data.
Quality Rules¶
The analysis prompt enforces strict rules for consistent, useful output:
- No speculation — Only analyze what's in the test results
- No generic advice — Every suggestion references specific test data
- Exact rewrites — Don't say "make it clearer"; provide the exact new text
- Cite test IDs — Reference specific tests when discussing failures
- Concise — 3 good insights beat 10 vague ones
Cost¶
The summary model analyzes your test results, which are relatively small:
| Tests | Approx. Input Tokens | Cost (gpt-5.2-chat) |
|---|---|---|
| 10 | ~2,000 | $0.01 |
| 50 | ~8,000 | $0.04 |
| 200 | ~30,000 | $0.15 |
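All three rows in the table are consistent with a flat input price of $5 per million tokens. A small sketch of that arithmetic (the rate is inferred from the table, not quoted from any provider's price list):

```python
# Rate implied by the cost table above; real per-token pricing varies by provider.
PRICE_PER_MILLION_INPUT_TOKENS = 5.00

def summary_cost(input_tokens: int) -> float:
    """Estimated USD cost of feeding the test results to the summary model."""
    return round(input_tokens / 1_000_000 * PRICE_PER_MILLION_INPUT_TOKENS, 2)

for tokens in (2_000, 8_000, 30_000):
    print(f"{tokens:>6} tokens -> ${summary_cost(tokens):.2f}")
```

Even at 200 tests, the analysis step costs cents, not dollars, because only the structured results (not full transcripts of every turn) are sent to the model.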
Sample Reports¶
See these example reports to understand what pytest-aitest generates:
| Report | Scenario | What It Shows |
|---|---|---|
| Single Agent | One agent, multiple tests | Basic report structure, AI analysis |
| Multi-Agent Comparison | Two agents compared | Agent leaderboard, side-by-side results |
| Sessions | Multi-turn conversations | Session grouping, context flow |
| Agent Selector | 3+ agents | Agent selector UI, pick any 2 to compare |