Report Structure
The visual structure and components of pytest-aitest HTML reports.
Design Philosophy
Reports answer one question: "Which configuration should I deploy?"
Every visual element supports this goal through:
- Progressive disclosure — Summary first, details on demand
- Comparison-first — Winner highlighting, sorted rankings
- Scalability — Works for 2 agents or 20 agents
- Actionable insights — Not just metrics, but what to fix
Implementation
Reports are generated with htpy, a type-safe HTML generation library. Components are Python functions in src/pytest_aitest/reporting/components/.
Report Sections
```
┌─────────────────────────────────────────────────────────────────┐
│ 1. HEADER │
│ Suite name, status badge, metrics │
├─────────────────────────────────────────────────────────────────┤
│ 2. AI ANALYSIS │
│ LLM-generated markdown (insights.markdown_summary) │
├─────────────────────────────────────────────────────────────────┤
│ 3. AGENT LEADERBOARD (if > 1 agent) │
│ Ranked table of configurations │
├─────────────────────────────────────────────────────────────────┤
│ 4. AGENT SELECTOR (if > 2 agents) │
│ Pick 2 agents for side-by-side comparison │
├─────────────────────────────────────────────────────────────────┤
│ 5. TEST RESULTS │
│ Filter buttons + test cards with comparison columns │
├─────────────────────────────────────────────────────────────────┤
│ 6. OVERLAY (hidden by default) │
│ Fullscreen mermaid diagram viewer │
└─────────────────────────────────────────────────────────────────┘
```
1. Header
Suite identity and key metrics at the top of the report.
Component: report.py → _report_header()
Components
| Component | Content | Example |
|---|---|---|
| Suite Title | Module docstring or "Test Report" | "Banking API Integration Tests" |
| Status Badge | Pass/fail with visual styling | ✅ All Passed or ✗ 2 Failed |
| Metrics Bar | Key numbers | tests, duration, cost, AI analysis cost |
Layout
```
┌─────────────────────────────────────────────────────────────────┐
│ Banking API Integration Tests ✅ All Passed │
├─────────────────────────────────────────────────────────────────┤
│ 4 tests │ 12.3s │ $0.004 │ 🤖 $0.002 │
└─────────────────────────────────────────────────────────────────┘
```
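To make the header's inputs concrete, here is a minimal sketch of how the status badge and metrics bar could be derived from per-test outcomes. The CaseResult record, its fields, and the header_badge/metrics_bar helpers are hypothetical; the real _report_header() in report.py may compute these values differently.

```python
from dataclasses import dataclass


@dataclass
class CaseResult:
    # Hypothetical per-test record; field names are illustrative only.
    name: str
    passed: bool
    duration_s: float
    cost_usd: float


def header_badge(results: list[CaseResult]) -> str:
    """Status badge text shown next to the suite title."""
    failed = sum(1 for r in results if not r.passed)
    return "✅ All Passed" if failed == 0 else f"✗ {failed} Failed"


def metrics_bar(results: list[CaseResult], analysis_cost_usd: float) -> list[str]:
    """Metrics bar cells: test count, total duration, total cost, AI analysis cost."""
    total_duration = sum(r.duration_s for r in results)
    total_cost = sum(r.cost_usd for r in results)
    return [
        f"{len(results)} tests",
        f"{total_duration:.1f}s",
        f"${total_cost:.3f}",
        f"🤖 ${analysis_cost_usd:.3f}",
    ]
```

Keeping the AI analysis cost as a separate cell mirrors the layout above, where it is shown apart from the cost of the tests themselves.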
2. AI Analysis
LLM-generated markdown, rendered directly: the AI writes the analysis prose and the report displays it as-is.
Component: report.py → _ai_insights_section()
The insights.markdown_summary field contains the complete analysis as markdown, converted to HTML via the markdown library.
Features:
- Toggle button: collapse/expand the section
- Markdown styling: headers, lists, code blocks, etc.
For details on what the AI analyzes and how insights are generated, see AI Analysis.
3. Agent Leaderboard
Only shown when multiple agents are tested.
Component: agent_leaderboard.py → agent_leaderboard()
Answers: "Which configuration should I deploy?"
Layout
```
┌─────────────────────────────────────────────────────────────────┐
│ 🏆 Agent Leaderboard │
├─────────────────────────────────────────────────────────────────┤
│ Rank │ Agent │ Pass │ Tokens │ Cost │
├──────┼────────────────────────────────┼──────┼────────┼────────┤
│ 🥇 │ gpt-4.1-mini / concise │ 100% │ 561 ★ │ $0.001 │
│ 🥈 │ gpt-5-mini / concise │ 100% │ 743 │ $0.001 │
│ 🥉 │ gpt-4.1-mini / detailed │ 100% │ 764 │ $0.001 │
│ 4 │ gpt-5-mini / detailed │ 100% │ 973 │ $0.002 │
└──────┴────────────────────────────────┴──────┴────────┴────────┘
```
★ marks the best value in a column. Rows are sorted by pass rate, with cost as the tiebreaker.
Features
- Medals (🥇🥈🥉) for top 3
- Pass rate bar (visual progress)
- Star (★) on best-in-column values
- Winner row highlighting (green background)
- Full agent identity: Model + Prompt name + Skill name
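The ranking rules above can be sketched in a few lines. This is a minimal sketch, assuming a hypothetical AgentStats record with pass-rate, token, and cost fields, and assuming "best in column" means the lowest tokens and lowest cost; agent_leaderboard.py may structure its data differently.

```python
from dataclasses import dataclass

MEDALS = ["🥇", "🥈", "🥉"]


@dataclass
class AgentStats:
    # Hypothetical aggregate row per agent; field names are illustrative only.
    name: str          # e.g. "gpt-4.1-mini / concise"
    pass_rate: float   # 0.0 to 1.0
    tokens: int
    cost_usd: float


def rank_agents(agents: list[AgentStats]) -> list[dict]:
    """Sort by pass rate (descending), then cost (ascending), and decorate rows."""
    ordered = sorted(agents, key=lambda a: (-a.pass_rate, a.cost_usd))
    best_tokens = min(a.tokens for a in agents)
    best_cost = min(a.cost_usd for a in agents)
    rows = []
    for position, agent in enumerate(ordered, start=1):
        star_tokens = " ★" if agent.tokens == best_tokens else ""
        star_cost = " ★" if agent.cost_usd == best_cost else ""
        rows.append({
            # Medals for the top 3, plain rank numbers after that
            "rank": MEDALS[position - 1] if position <= 3 else str(position),
            "agent": agent.name,
            "pass": f"{agent.pass_rate:.0%}",
            "tokens": f"{agent.tokens}{star_tokens}",
            "cost": f"${agent.cost_usd:.3f}{star_cost}",
            "winner": position == 1,  # the winner row gets the green highlight
        })
    return rows
```

Sorting on the unrounded cost is what lets agents that all display as $0.001 still receive distinct ranks.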
4. Agent Selector
Only shown when more than 2 agents are tested.
Component: agent_selector.py → agent_selector()
Allows picking exactly 2 agents for side-by-side comparison in test details.
Layout
```
┌─────────────────────────────────────────────────────────────────┐
│ Compare agents: │
│ ┌────────────────┐ ┌────────────────┐ ┌────────────────┐ │
│ │ ☑ gpt-4.1-mini │ │ ☑ gpt-5-mini │ │ ☐ gpt-5-mini │ │
│ │ 100% ✓ │ │ 100% ✓ │ │ + skill │ │
│ └────────────────┘ └────────────────┘ └────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
```
Behavior
- Exactly 2 selected: the selector always keeps 2 agents selected
- Click to swap: clicking a third agent replaces the oldest selection
- Cannot deselect below 2: clicking an already-selected agent does nothing
- Visual feedback: selected chips have a highlighted border
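The selection rules behave like a two-slot queue. The real interaction presumably lives in the client-side scripts.js; the Python class below is only a hypothetical model of that behavior. It assumes the first two agents are selected by default.

```python
from collections import deque


class AgentSelection:
    """Hypothetical model of the 'exactly 2 selected' rule (not the real JS)."""

    def __init__(self, agents: list[str]):
        if len(agents) < 2:
            raise ValueError("the selector needs at least 2 agents")
        self.agents = agents
        # Assumed default: the first two agents; the oldest selection sits on the left
        self.selected = deque(agents[:2])

    def click(self, agent: str) -> None:
        if agent not in self.agents:
            raise KeyError(agent)
        if agent in self.selected:
            # Clicking an already-selected chip is a no-op, so we never drop below 2
            return
        # Swap: evict the oldest selection and add the clicked agent
        self.selected.popleft()
        self.selected.append(agent)
```

Because clicks only ever swap, the invariant "exactly 2 selected" holds after every interaction without any extra validation.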
5. Test Results
All test results with comparison columns for selected agents.
Components:
- test_grid.py → test_grid() (main container)
- test_comparison.py → test_comparison() (per-test details)
Filter Buttons
```
┌─────────────────────────────────────────────────────────────────┐
│ [All (4)] [Failed (0)] │
└─────────────────────────────────────────────────────────────────┘
```
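The filter buttons need only two pieces of logic: counting for the labels and selecting the visible cards. A minimal sketch, assuming each result is a dict with a hypothetical "status" key; in the report this interaction is presumably handled client-side in scripts.js.

```python
def filter_labels(results: list[dict]) -> list[str]:
    """Build the button labels, e.g. ["All (4)", "Failed (0)"]."""
    failed = sum(1 for r in results if r["status"] == "failed")
    return [f"All ({len(results)})", f"Failed ({failed})"]


def apply_filter(results: list[dict], mode: str) -> list[dict]:
    """Return the test cards visible under the chosen filter."""
    if mode == "all":
        return list(results)
    if mode == "failed":
        return [r for r in results if r["status"] == "failed"]
    raise ValueError(f"unknown filter: {mode}")
```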
Test Card (Collapsed)
```
┌─────────────────────────────────────────────────────────────────┐
│ ▶ Check account balance ✅ passed │ 4.6s │
└─────────────────────────────────────────────────────────────────┘
```
Test Card (Expanded)
Shows a side-by-side comparison of the selected agents:
```
┌─────────────────────────────────────────────────────────────────┐
│ ▼ Check account balance ✅ passed │ 4.6s │
├─────────────────────────────────────────────────────────────────┤
│ ┌─────────────────────────┐ ┌─────────────────────────┐ │
│ │ gpt-4.1-mini ✅│ │ gpt-5-mini ✅│ │
│ │ 561 tokens │ $0.001 │ │ 743 tokens │ $0.002 │ │
│ ├─────────────────────────┤ ├─────────────────────────┤ │
│ │ [Mermaid Diagram] │ │ [Mermaid Diagram] │ │
│ ├─────────────────────────┤ ├─────────────────────────┤ │
│ │ Final Response: │ │ Final Response: │ │
│ │ Balance: $1,500.00... │ │ Your checking balance...│ │
│ └─────────────────────────┘ └─────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
```
Session Grouping
Multi-turn sessions appear as grouped test cards with visual connectors:
```
┌─────────────────────────────────────────────────────────────────┐
│ 🔗 Session: banking-flow 3 tests │ all ✅ │
├─────────────────────────────────────────────────────────────────┤
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ ▼ Check account balance ✅ │ 2.1s │ │
│ │ [comparison columns...] │ │
│ └───────────────────────────────────────────────────────────┘ │
│ │ │
│ Context carried │
│ │ │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ ▼ Transfer to savings ✅ │ 3.4s │ │
│ │ [comparison columns...] │ │
│ └───────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
```
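Because sessions are just grouped test cards, the grouping step can be a plain pass over the ordered results. A minimal sketch, assuming a hypothetical CardInfo record whose session field is None for standalone tests, and assuming tests in one session run consecutively (itertools.groupby only groups adjacent items). The wording for a failing session is also an assumption.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import Optional


@dataclass
class CardInfo:
    # Hypothetical test-card record; session is None for standalone tests.
    name: str
    passed: bool
    session: Optional[str] = None


def group_cards(cards: list[CardInfo]) -> list[tuple[Optional[str], list[CardInfo]]]:
    """Group consecutive cards sharing a session; standalone cards stay ungrouped."""
    groups: list[tuple[Optional[str], list[CardInfo]]] = []
    for session, run in groupby(cards, key=lambda c: c.session):
        members = list(run)
        if session is None:
            # One entry per standalone card, so no session header is drawn
            groups.extend((None, [card]) for card in members)
        else:
            groups.append((session, members))
    return groups


def session_header(session: str, members: list[CardInfo]) -> str:
    """Header text with test count and aggregate status."""
    failed = sum(1 for c in members if not c.passed)
    status = "all ✅" if failed == 0 else f"{failed} failed"
    return f"🔗 Session: {session} {len(members)} tests │ {status}"
```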
6. Overlay
Fullscreen mermaid diagram viewer. Hidden by default, triggered by clicking a diagram.
Component: overlay.py → overlay()
Features
- Click diagram to enlarge — Opens in fullscreen overlay
- Click outside to close — Dismiss by clicking backdrop
- Re-renders at full size — Diagram redrawn for maximum readability
Adaptive Behavior
The report layout adapts based on what was tested:
| Scenario | Leaderboard | Agent Selector | Comparison Columns |
|---|---|---|---|
| 1 agent | ❌ | ❌ | ❌ (single column) |
| 2 agents | ✅ | ❌ | ✅ (both shown) |
| 3+ agents | ✅ | ✅ | ✅ (pick 2) |
| Sessions | Based on agent count | Based on agent count | ✅ |
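Detection Logic
```python
def layout_flags(agents: list) -> tuple[bool, bool]:
    """Return (show_leaderboard, show_selector) for the tested agents.

    Illustrative helper: the function name is hypothetical.
    """
    if len(agents) <= 1:
        # Simple mode: no comparison UI
        show_leaderboard = False
        show_selector = False
    elif len(agents) == 2:
        # Two-agent mode: comparison but no selector needed
        show_leaderboard = True
        show_selector = False
    else:
        # Multi-agent mode: full comparison UI
        show_leaderboard = True
        show_selector = True
    return show_leaderboard, show_selector
```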
Scalability Requirements
The design MUST work at these scales:
| Scale | Behavior |
|---|---|
| 2 agents | Leaderboard with 2 rows, no selector |
| 3-6 agents | Selector chips in single row |
| 8+ agents | Selector chips wrap to multiple rows |
| 20+ agents | Leaderboard with pagination |
| 50+ tests | All tests rendered, browser scroll |
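Leaderboard pagination can operate purely on rows that have already been ranked, so ranks and medals stay global across pages. A minimal sketch; the page size of 10 and the helper names are assumptions, not documented behavior.

```python
import math


def page_count(rows: list, page_size: int = 10) -> int:
    """Number of pages; an empty leaderboard still has one (empty) page."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    return max(1, math.ceil(len(rows) / page_size))


def paginate(rows: list, page: int, page_size: int = 10) -> list:
    """Return one 1-based page of already-ranked leaderboard rows."""
    if not 1 <= page <= page_count(rows, page_size):
        raise ValueError(f"page {page} out of range")
    start = (page - 1) * page_size
    return rows[start:start + page_size]
```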
Anti-Patterns (What NOT to Do)
❌ Don't show side-by-side cards that shrink with more items
❌ Don't truncate agent names — wrap or tooltip instead
❌ Don't show tiny unreadable diagrams
❌ Don't require horizontal scrolling for core content
❌ Don't select more than 2 agents for comparison
Visual Design Tokens
Styling is based on Material Design (indigo theme) and applied consistently:
| Token | Value | Usage |
|---|---|---|
| Primary | #4051b5 | Primary actions, highlights |
| Pass | #22c55e | Success states |
| Fail | #ef4444 | Error states |
| Card BG | #282c34 | Card backgrounds |
| Surface | #1e2129 | Page background |
| Border radius | 4px | Consistent Material feel |
| Font | Roboto | Body text |
| Mono font | Roboto Mono | Code, metrics |
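One way to keep these tokens in a single place is to emit them as CSS custom properties. This is a hypothetical sketch: the variable names are invented, the generic font fallbacks are added for illustration, and the real styles live in templates/partials/tailwind.css.

```python
# Token values from the table above; CSS variable names are hypothetical.
DESIGN_TOKENS = {
    "primary": "#4051b5",
    "pass": "#22c55e",
    "fail": "#ef4444",
    "card-bg": "#282c34",
    "surface": "#1e2129",
    "radius": "4px",
    # Generic fallbacks added for illustration only
    "font": "Roboto, sans-serif",
    "mono-font": "'Roboto Mono', monospace",
}


def tokens_to_css(tokens: dict[str, str]) -> str:
    """Render the tokens as a :root block of CSS custom properties."""
    lines = [f"  --{name}: {value};" for name, value in tokens.items()]
    return ":root {\n" + "\n".join(lines) + "\n}"
```

Components could then reference, for example, var(--pass) instead of repeating the hex value.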
Implementation Files
Components are Python functions that generate HTML via htpy:
| File | Purpose |
|---|---|
| components/report.py | Main report, header, AI analysis |
| components/agent_leaderboard.py | Ranked agent table |
| components/agent_selector.py | Agent comparison picker |
| components/test_grid.py | Test list with filter buttons |
| components/test_comparison.py | Side-by-side agent results |
| components/overlay.py | Fullscreen diagram viewer |
| components/types.py | Data types for components |
| templates/partials/tailwind.css | All CSS styles |
| templates/partials/scripts.js | Client-side interactions |
Key Principles
- Exactly 2 for comparison — Always compare exactly 2 agents, no more
- AI explains, components display — AI writes insights in markdown
- Sessions are grouping, not special — Same test cards, visual connectors
- Progressive disclosure — Click to expand details
- No redundancy — Each piece of information appears once
Testing Matrix
Visual tests use stable JSON fixtures in tests/fixtures/reports/:
| Fixture | Agents | Sessions | What to Test |
|---|---|---|---|
| 01_single_agent.json | 1 | No | Header, AI Analysis, Test grid (no comparison) |
| 02_multi_agent.json | 2 | No | Leaderboard, Comparison columns (no selector) |
| 03_multi_agent_sessions.json | 2 | Yes | Session grouping, Leaderboard (no selector) |
| 04_agent_selector.json | 3 | No | Agent selector, Leaderboard with medals, Selection behavior |
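The matrix can also be encoded as data so that a visual test can compare the sections it finds against the sections each fixture should produce. A minimal sketch: the section identifiers and the expected_sections helper are hypothetical, and locating the sections in the rendered HTML is left out.

```python
# (agent count, has sessions) per fixture, copied from the matrix above
FIXTURES = {
    "01_single_agent.json": (1, False),
    "02_multi_agent.json": (2, False),
    "03_multi_agent_sessions.json": (2, True),
    "04_agent_selector.json": (3, False),
}


def expected_sections(agent_count: int, has_sessions: bool) -> set[str]:
    """Sections a report should show; identifiers are hypothetical."""
    sections = {"header", "ai_analysis", "test_grid"}
    if agent_count >= 2:
        # Leaderboard and comparison columns appear once there is something to compare
        sections |= {"leaderboard", "comparison_columns"}
    if agent_count >= 3:
        sections.add("agent_selector")
    if has_sessions:
        sections.add("session_groups")
    return sections
```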
Test Checklist by Fixture
01_single_agent.json:
- [ ] Header shows suite name and status badge
- [ ] AI Analysis section renders markdown
- [ ] AI Analysis toggle button works
- [ ] Test cards expand/collapse
- [ ] Mermaid diagrams render
- [ ] Filter buttons work (all/failed)
- [ ] NO leaderboard shown
- [ ] NO agent selector shown
- [ ] NO comparison columns (single column only)
02_multi_agent.json:
- [ ] Leaderboard shows 2 agents
- [ ] Winner row highlighted
- [ ] Both comparison columns visible
- [ ] NO agent selector (only 2 agents)
- [ ] Mermaid overlay opens on click
- [ ] Overlay closes on backdrop click
03_multi_agent_sessions.json:
- [ ] Session grouping with visual connectors
- [ ] Session header shows test count and status
- [ ] Leaderboard shows 2 agents
- [ ] NO agent selector (only 2 agents)
- [ ] Both comparison columns visible
04_agent_selector.json:
- [ ] Leaderboard shows 3 agents with medals (🥇🥈🥉)
- [ ] Winner row highlighted
- [ ] Agent selector shows 3 chips
- [ ] Exactly 2 agents selected by default
- [ ] Clicking 3rd agent swaps selection
- [ ] Cannot deselect to fewer than 2
- [ ] Comparison columns show side-by-side
- [ ] Hidden columns update when selection changes