Getting Started
Write your first AI test in under 5 minutes using the real GitHub Copilot coding agent.
What You're Testing
pytest-skill-engineering tests whether GitHub Copilot can understand and use your tools:
- MCP Server Tools — Can Copilot discover and call your tools correctly?
- Agent Skills — Does domain knowledge improve performance?
- Custom Agents — Do your `.agent.md` instructions produce the right behavior and trigger subagent dispatch?
- MCP Server Prompts — Do your bundled prompt templates render and produce the right behavior?
- CLI Tools — Can Copilot effectively use command-line interfaces?
Installation
Authentication
Authenticate with GitHub Copilot (one-time):
This gives pytest-skill-engineering access to the real GitHub Copilot coding agent.
Your First Test
The simplest case: verify GitHub Copilot can use your MCP server correctly.
from pytest_skill_engineering.copilot import CopilotEval


async def test_balance_query(copilot_eval):
    """Verify Copilot can use get_balance correctly."""
    agent = CopilotEval(
        skill_directories=["skills/banking-advisor"],  # Optional skill
        max_turns=10,
    )
    result = await copilot_eval(agent, "What's my checking account balance?")
    assert result.success
    assert result.tool_was_called("get_balance")
What this tests:
- Tool discovery — Did Copilot find `get_balance`?
- Parameter inference — Did it pass `account="checking"` correctly?
- Response handling — Did it interpret the tool output?
- Skill integration — Did the banking skill improve performance?
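To see what these checks amount to without running Copilot, here is a small stand-in for the result object. `RecordedCall`, `StubResult`, and `arguments_for` are hypothetical names invented for this sketch; only `success` and `tool_was_called` mirror names used in the test above, and the real result object may behave differently.

```python
from dataclasses import dataclass, field


@dataclass
class RecordedCall:
    """One tool invocation captured during a run (hypothetical shape)."""
    name: str
    arguments: dict


@dataclass
class StubResult:
    """Stand-in for the object returned by copilot_eval (not the real class)."""
    success: bool
    calls: list = field(default_factory=list)

    def tool_was_called(self, name: str) -> bool:
        # Tool discovery: did any recorded call target this tool?
        return any(call.name == name for call in self.calls)

    def arguments_for(self, name: str) -> list:
        # Parameter inference: collect the arguments of every call to this tool.
        return [call.arguments for call in self.calls if call.name == name]


# A run in which the agent called get_balance once with the expected argument.
result = StubResult(
    success=True,
    calls=[RecordedCall("get_balance", {"account": "checking"})],
)

assert result.success
assert result.tool_was_called("get_balance")
assert {"account": "checking"} in result.arguments_for("get_balance")
assert not result.tool_was_called("transfer_funds")
```

The point of the stub is only to show that each bullet maps to something observable in the recorded run: which tools were called, and with which arguments.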
If this fails, your MCP server's tool descriptions, schemas, or skill content need work.
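Much of that work tends to happen in a tool's signature and docstring, since many Python MCP frameworks derive the tool description and input schema from them. A minimal sketch of a descriptive `get_balance`, written as a plain Python function that a server framework would register; the in-memory `BALANCES` data, the `savings` account, and the return format are hypothetical.

```python
from typing import Literal

# Hypothetical in-memory data standing in for a real banking backend.
BALANCES = {"checking": 1500.00, "savings": 3000.00}


def get_balance(account: Literal["checking", "savings"]) -> str:
    """Return the current balance of one of the user's accounts.

    Args:
        account: Which account to look up, either "checking" or "savings".

    Returns:
        A sentence stating the balance in USD, or an error message that
        lists the valid account names.
    """
    if account not in BALANCES:
        # Name the valid options so the agent can correct itself and retry.
        valid = ", ".join(sorted(BALANCES))
        return f"Unknown account '{account}'. Valid accounts: {valid}."
    return f"Your {account} balance is ${BALANCES[account]:,.2f}."
```

The constrained `Literal` type and the error message that lists valid values are the parts most likely to help with parameter inference: they tell the agent what to pass before it guesses and how to recover after a wrong guess.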
The Workflow
This is test-driven skill engineering — iterate on your AI interface the same way you iterate on code:
- Write a test — describe what a user would say
- Run it — GitHub Copilot tries to use your tools
- Fix the interface — improve tool descriptions, skills, or agent instructions until it passes
- Generate a report — AI analysis tells you what else to optimize
Red/Green/Refactor for the skill stack.
Running the Test
AI-Powered Reports
Configure reporting in pyproject.toml:
[tool.pytest.ini_options]
addopts = """
--aitest-summary-model=copilot/gpt-5-mini
--aitest-html=aitest-reports/report.html
"""
Run pytest:
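Because the reporting flags live in `addopts`, a plain pytest invocation from the project root picks them up without extra arguments. A minimal sketch of launching that run from Python, for example in a CI script; the helper names and the `tests/test_banking.py` path are hypothetical.

```python
import subprocess
import sys


def build_pytest_command(test_path: str) -> list[str]:
    """Build a pytest command line that runs in the current interpreter."""
    # "python -m pytest" uses the pytest installed in this environment.
    return [sys.executable, "-m", "pytest", test_path, "-v"]


def run_tests(test_path: str) -> int:
    """Run pytest on test_path and return its exit code (0 means all passed)."""
    completed = subprocess.run(build_pytest_command(test_path), check=False)
    return completed.returncode
```

Calling `run_tests("tests/test_banking.py")` returns pytest's exit code, so a wrapper script can fail the build when it is non-zero.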
The report includes:
- Eval Leaderboard — Which configurations work best (pass rate + cost)
- AI Analysis — Deployment recommendation, failure root causes, tool description improvements
- Tool Feedback — Specific suggestions with copy-to-clipboard buttons
- Cost Tracking — Premium requests and USD estimates
Next Steps
- Custom Agents — Test `.agent.md` files and validate subagent dispatch
- Agent Skills — Add domain knowledge (agentskills.io spec-compliant)
- Plugin Testing — Load complete plugin directories
- Test Coding Agents — Full `CopilotEval` reference