How to Generate Reports

Generate HTML and Markdown reports with AI-powered insights, plus raw JSON results for every test run.

Quick Start (Recommended)

Configure once in pyproject.toml:

[tool.pytest.ini_options]
addopts = """
--aitest-summary-model=azure/gpt-5.2-chat
--aitest-html=aitest-reports/report.html
"""

Then just run:

pytest tests/

Reports are generated automatically with AI insights. This approach is recommended because:

Version controlled — Team shares the same configuration
Less typing — No need to remember CLI flags
Consistent — Every run produces reports the same way

What Gets Generated

Output	When	AI Model Required?
JSON	Always (every test run)	No — raw test data, no AI analysis
HTML report	When `--aitest-html` is set	Yes — `--aitest-summary-model` required
Markdown report	When `--aitest-md` is set	Yes — `--aitest-summary-model` required

JSON results are always saved to aitest-reports/results.json (or a custom path via --aitest-json). This raw data can be used later to regenerate HTML/MD reports without re-running tests.
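Because the JSON file persists between runs, it can be inspected with a few lines of Python. The sketch below makes no assumption about the file's schema, which this page does not document: it only loads the file from the default path and lists its top-level keys. The helper names `load_results` and `top_level_keys` are hypothetical and not part of the plugin.

```python
import json
from pathlib import Path

# Default location stated above; pass another path if --aitest-json was used.
DEFAULT_RESULTS = Path("aitest-reports/results.json")


def load_results(path: Path = DEFAULT_RESULTS):
    """Load the saved results file and return the parsed JSON value.

    Hypothetical helper for illustration; not part of the plugin.
    """
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)


def top_level_keys(data) -> list[str]:
    """Return the sorted top-level keys if the document is a JSON object."""
    return sorted(data) if isinstance(data, dict) else []


if __name__ == "__main__":
    if DEFAULT_RESULTS.exists():
        print(top_level_keys(load_results()))
    else:
        print(f"No results found at {DEFAULT_RESULTS}; run pytest first.")
```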

Important

--aitest-summary-model is required for HTML and Markdown reports. Without it, report generation fails with an error. JSON output does not need a summary model.
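The rule above can be checked before a run starts. The following is a minimal sketch, not the plugin's own validation logic: the function name `missing_summary_model` is hypothetical, and it assumes options are written in the `--flag=value` form shown on this page.

```python
import shlex


def missing_summary_model(args: list[str]) -> bool:
    """Return True if an AI report is requested without a summary model.

    Hypothetical pre-flight check; the plugin performs its own validation.
    """
    wants_ai_report = any(
        a.startswith(("--aitest-html", "--aitest-md")) for a in args
    )
    has_model = any(a.startswith("--aitest-summary-model") for a in args)
    return wants_ai_report and not has_model


# An addopts value can be split into arguments like a shell command line.
addopts = """
--aitest-summary-model=azure/gpt-5.2-chat
--aitest-html=aitest-reports/report.html
"""
print(missing_summary_model(shlex.split(addopts)))           # False
print(missing_summary_model(["--aitest-html=report.html"]))  # True
```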

CLI Options (Alternative)

You can also use CLI flags directly:

# Run tests with AI-powered HTML report
pytest tests/ \
    --aitest-summary-model=azure/gpt-5.2-chat \
    --aitest-html=report.html

# Run tests without reports (JSON is still auto-generated)
pytest tests/

Option	Description
`--aitest-html=PATH`	Generate HTML report (requires `--aitest-summary-model`)
`--aitest-md=PATH`	Generate Markdown report (requires `--aitest-summary-model`)
`--aitest-json=PATH`	Custom JSON path (default: `aitest-reports/results.json`)
`--aitest-summary-model=MODEL`	Model for AI insights (required for HTML/MD). Accepts `azure/`, `openai/`, `copilot/`, etc.
`--aitest-min-pass-rate=N`	Fail if pass rate below N% (e.g., `80`)
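To make the `--aitest-min-pass-rate` threshold concrete, here is a small sketch of the arithmetic. It assumes the pass rate is passed tests divided by total tests. How the plugin counts skipped or errored tests is not stated here, so treat the helper names and the counting rule as assumptions.

```python
def pass_rate(passed: int, total: int) -> float:
    """Pass rate as a percentage; 0.0 when no tests ran (assumed convention)."""
    return 100.0 * passed / total if total else 0.0


def meets_threshold(passed: int, total: int, min_pass_rate: float) -> bool:
    """True when the pass rate is at or above the threshold N."""
    return pass_rate(passed, total) >= min_pass_rate


print(meets_threshold(41, 50, 80))  # 82% is not below 80% -> True
print(meets_threshold(39, 50, 80))  # 78% is below 80%     -> False
```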

Report Regeneration

Regenerate reports from saved JSON without re-running tests:

# Regenerate HTML from saved JSON (reuses existing AI insights)
pytest-skill-engineering-report aitest-reports/results.json \
    --html report.html

# Generate Markdown report
pytest-skill-engineering-report aitest-reports/results.json \
    --md report.md

# Generate both HTML and Markdown
pytest-skill-engineering-report results.json \
    --html report.html \
    --md report.md

# Regenerate with fresh AI insights from a different model
pytest-skill-engineering-report results.json \
    --html report.html \
    --summary --summary-model azure/gpt-4.1

This is useful for:

Iterating on report styling without re-running expensive LLM tests
Generating different formats from one test run
Experimenting with different AI summary models
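When regeneration is scripted, for example to try several summary models in a loop, the command line can be assembled in Python. This sketch uses only the flags shown above (`--html`, `--md`, `--summary`, `--summary-model`). The wrapper function `build_report_command` is hypothetical.

```python
import shlex


def build_report_command(results_json, html=None, md=None, summary_model=None):
    """Assemble a pytest-skill-engineering-report command line.

    Hypothetical wrapper; it only combines the flags documented above.
    """
    cmd = ["pytest-skill-engineering-report", results_json]
    if html:
        cmd += ["--html", html]
    if md:
        cmd += ["--md", md]
    if summary_model:
        # Fresh AI insights: --summary plus the model to use.
        cmd += ["--summary", "--summary-model", summary_model]
    return cmd


cmd = build_report_command(
    "aitest-reports/results.json", html="report.html", md="report.md"
)
print(shlex.join(cmd))
# To run it (requires the tool to be installed):
#   import subprocess; subprocess.run(cmd, check=True)
```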

Eval Leaderboard

When you test multiple evals, the report shows an Eval Leaderboard ranking all configurations:

Eval	Pass Rate	Cost
✓ gpt-4.1 (detailed)	100%	$0.15
✓ gpt-5-mini (detailed)	97%	$0.03
✗ gpt-5-mini (concise)	82%	$0.02

Winning Eval = Highest pass rate → Lowest cost (tiebreaker)

Dimension Detection

The AI detects what varies between evals to focus its analysis:

What Varies	AI Analysis Focuses On
Model	Which model works best
Custom Agent	Which custom agent instructions work best
Skill	Whether domain knowledge helps
Server	Which implementation is more reliable
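How the AI identifies the varying dimension is not described here. As a purely illustrative sketch, the idea can be expressed as finding the configuration fields whose values differ between evals. The dictionary keys and the function `varying_dimensions` are hypothetical and do not reflect the plugin's data model.

```python
def varying_dimensions(evals: list[dict]) -> list[str]:
    """Return the keys whose values differ across eval configurations.

    Illustrative only; values must be hashable.
    """
    keys = sorted({k for e in evals for k in e})
    return [k for k in keys if len({e.get(k) for e in evals}) > 1]


# Hypothetical configurations that differ only in the model.
evals = [
    {"model": "gpt-4.1", "skill": None, "server": "default"},
    {"model": "gpt-5-mini", "skill": None, "server": "default"},
]
print(varying_dimensions(evals))  # ['model']
```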


Leaderboard Ranking

When comparing evals, rankings are based on:

Pass rate (primary) — higher is better
Total cost (tiebreaker) — lower is better
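The two ranking criteria map naturally onto a compound sort key. The sketch below applies them to the sample leaderboard values from this page. The `EvalResult` class and `rank` function are hypothetical names for illustration, not the plugin's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalResult:
    """Hypothetical record holding the two values the ranking uses."""
    name: str
    pass_rate: float  # percentage, 0-100
    cost: float       # total cost in USD


def rank(results: list[EvalResult]) -> list[EvalResult]:
    """Sort by pass rate (descending), then total cost (ascending)."""
    return sorted(results, key=lambda r: (-r.pass_rate, r.cost))


results = [
    EvalResult("gpt-5-mini (concise)", 82, 0.02),
    EvalResult("gpt-4.1 (detailed)", 100, 0.15),
    EvalResult("gpt-5-mini (detailed)", 97, 0.03),
]
for r in rank(results):
    print(f"{r.name}: {r.pass_rate}% ${r.cost:.2f}")
```

The first element of the sorted list is the winning eval. Cost only matters when two evals share the same pass rate.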

AI Insights

Reports include AI analysis with actionable recommendations. For a detailed explanation of each insight section, see AI Analysis.

Recommended Models

Use the most capable model you can afford for quality analysis:

Provider	Recommended Models
Azure OpenAI	`azure/gpt-5.2-chat` (best), `azure/gpt-4.1`
OpenAI	`openai/gpt-4.1`, `openai/gpt-4o`
Anthropic	`anthropic/claude-opus-4`, `anthropic/claude-sonnet-4`

Don't Use Cheap Models for Analysis

Smaller models (gpt-4o-mini, gpt-5-mini) produce generic, low-quality insights. The summary model analyzes your test results and generates actionable feedback, so use your most capable model here. This is a one-time cost per test run.

Report Structure

For details on the HTML report layout including header, leaderboard, and test details, see Report Structure.

Next Steps

CI/CD Integration — JUnit XML, GitHub Actions, Azure Pipelines