pytest-codingagents
Combating cargo cult programming in Agent Instructions, Skills, and Custom Agents for GitHub Copilot and other coding agents since 2026.
Everyone's copying instruction files from blog posts, pasting "you are a senior engineer" into agent configs, and adding skills they found on Reddit. But does any of it actually work? Are your instructions making your coding agent better — or just longer? Is that skill helping, or is the agent ignoring it entirely?
You don't know, because you're not testing it.
pytest-codingagents is a pytest plugin that runs your actual coding agent configuration against real tasks — then uses AI analysis to tell you why things failed and what to fix.
Currently supports GitHub Copilot via copilot-sdk. More agents (Claude Code, etc.) coming soon.
```python
from pytest_codingagents import CopilotAgent


async def test_create_file(copilot_run, tmp_path):
    agent = CopilotAgent(
        instructions="Create files as requested.",
        working_directory=str(tmp_path),
    )
    result = await copilot_run(agent, "Create hello.py with print('hello')")
    assert result.success
    assert result.tool_was_called("create_file")
```
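Because the agent operates in a real temporary directory, you can also assert on its side effects with plain pathlib. The sketch below uses nothing beyond the API already shown, though it assumes the agent writes files relative to working_directory:

```python
from pytest_codingagents import CopilotAgent


async def test_created_file_contents(copilot_run, tmp_path):
    agent = CopilotAgent(
        instructions="Create files as requested.",
        working_directory=str(tmp_path),
    )
    result = await copilot_run(agent, "Create hello.py with print('hello')")
    assert result.success
    # Assumption: the agent creates files under working_directory (tmp_path here).
    created = tmp_path / "hello.py"
    assert created.exists()
    assert "hello" in created.read_text()
```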
Install

`pip install pytest-codingagents`

Authenticate via the GITHUB_TOKEN environment variable (in CI) or a logged-in GitHub CLI session locally (verify with gh auth status).
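In CI, it helps to fail fast when credentials are missing. One pattern is a conftest.py hook that skips agent tests when no token is set; the agent marker used to select them here is an illustrative convention you would define yourself, not something the plugin provides:

```python
# conftest.py
import os

import pytest


def pytest_collection_modifyitems(config, items):
    """Skip agent tests when no GitHub token is available."""
    if os.environ.get("GITHUB_TOKEN"):
        return
    skip = pytest.mark.skip(reason="GITHUB_TOKEN not set; agent tests need GitHub auth")
    for item in items:
        # Assumption: you tag your agent tests with a custom `agent` marker.
        if "agent" in item.keywords:
            item.add_marker(skip)
```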
What You Can Test
| Capability | What it proves | Guide |
|---|---|---|
| Instructions | Your custom instructions actually produce the desired behavior — not just vibes | Getting Started |
| Skills | That domain knowledge file is helping, not being ignored (example below) | Skill Testing |
| Models | Which model works best for your use case and budget | Model Comparison |
| Custom Agents | Your custom agent configurations actually work as intended | Getting Started |
| MCP Servers | The agent discovers and uses your custom tools | MCP Server Testing |
| CLI Tools | The agent operates command-line interfaces correctly | CLI Tool Testing |
| Tool Control | Allowlists and blocklists restrict tool usage | Tool Control |
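For example, a skills test pairs an agent configured with a skill file against a task that should exercise it. In the sketch below, the skills parameter is a hypothetical name for illustration; the quick-start only confirms instructions and working_directory, so check the API reference for the real signature:

```python
from pytest_codingagents import CopilotAgent


async def test_skill_is_applied(copilot_run, tmp_path):
    agent = CopilotAgent(
        instructions="Apply the SQL style guide skill when writing queries.",
        skills=["skills/sql-style-guide.md"],  # hypothetical parameter, see API reference
        working_directory=str(tmp_path),
    )
    result = await copilot_run(agent, "Create users.sql that selects all active users")
    assert result.success
    assert result.tool_was_called("create_file")
```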
AI Analysis
See it in action: Basic Report · Model Comparison · Instruction Testing
Every test run produces an HTML report with AI-powered insights:
- Diagnoses failures — root cause analysis with suggested fixes
- Compares models — leaderboards ranked by pass rate and cost
- Evaluates instructions — which instructions produce better results (see the sketch after this list)
- Recommends improvements — actionable changes to tools, instructions, and skills
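To give that instruction comparison something to work with, parametrize a test over instruction variants. This sketch uses only standard pytest parametrization plus the constructor arguments already shown in the quick-start:

```python
import pytest

from pytest_codingagents import CopilotAgent


@pytest.mark.parametrize(
    "instructions",
    [
        "Create files as requested.",
        "You are a senior engineer. Create exactly the files requested, nothing more.",
    ],
)
async def test_instruction_variants(copilot_run, tmp_path, instructions):
    agent = CopilotAgent(
        instructions=instructions,
        working_directory=str(tmp_path),
    )
    result = await copilot_run(agent, "Create hello.py with print('hello')")
    assert result.success
```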
Documentation
Full docs at sbroenne.github.io/pytest-codingagents — API reference, how-to guides, and demo reports.
- Getting Started — Install and write your first test
- How-To Guides — Skills, MCP servers, CLI tools, and more
- Demo Reports — See real HTML reports with AI analysis
- API Reference — Full API documentation