# Test Harness
pytest-skill-engineering uses the GitHub Copilot coding agent as its test harness.
## CopilotEval — The Only Harness
CopilotEval runs the actual GitHub Copilot coding agent via the `github-copilot-sdk`. This tests what your users experience when they use your MCP server, skill, or custom agent in GitHub Copilot.
```python
from pytest_skill_engineering.copilot import CopilotEval


async def test_skill_integration(copilot_eval):
    agent = CopilotEval(
        name="banking-test",
        skill_directories=["skills/banking-advisor"],
        max_turns=10,
    )
    result = await copilot_eval(agent, "What's my checking balance?")
    assert result.success
    assert result.tool_was_called("get_balance")
```
## Why Test with Real Copilot?
Traditional unit tests can't catch skill-stack failures:
- Tool discovery — Can Copilot find the right tool?
- Parameter inference — Does it pass correct arguments?
- Skill integration — Does domain knowledge improve performance?
- Subagent dispatch — Do custom `.agent.md` files trigger correctly?
- Multi-turn context — Does conversation state persist?
CopilotEval validates all of this using the exact same agent your users have.
## Features
| Feature | CopilotEval |
|---|---|
| What runs | Real GitHub Copilot coding agent (CLI SDK) |
| Model | Copilot's active model (or override with `model=`) |
| Model setup | None — `gh auth login` is all you need |
| MCP auth | Handled automatically by Copilot CLI |
| Skill loading | Native (`SKILL.md`, Agent Skills spec) |
| Custom agents | Native dispatch via `custom_agents=[]` |
| Cost tracking | Premium requests (Copilot billing) |
| Speed | ~5-10s CLI startup per test |
| Install | `uv add pytest-skill-engineering` |
| Auth | `gh auth login` (Copilot subscription required) |
## When to Use CopilotEval
Always. It's the only harness — designed to test the real user experience.
Best for:
- End-to-end validation — Test what actually ships to users
- Zero setup — No API keys, Azure deployments, or provider config
- MCP OAuth — Copilot CLI handles authentication automatically
- Native skills — Agent Skills loaded exactly as users experience them
- Subagent testing — Validate custom `.agent.md` agent selection
- Regression testing — Catch skill-stack regressions before publishing
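
For subagent testing, the thing under test is a custom agent file. As an illustrative sketch only (the frontmatter fields below are hypothetical and not taken from this library's documentation), a minimal `.agent.md` might look like:

```markdown
---
name: banking-advisor
description: Answers questions about account balances and transactions
---

You are a banking advisor. Look up balances with the available banking
tools before answering, and never guess account figures.
```

A test would then point `custom_agents=[]` at a directory containing files like this and assert that Copilot dispatched to the expected agent.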
## Quick Start
```shell
# Install
uv add pytest-skill-engineering

# Authenticate (one-time)
gh auth login
```

```python
# Write a test
from pytest_skill_engineering.copilot import CopilotEval


async def test_balance(copilot_eval):
    agent = CopilotEval(skill_directories=["skills/banking"])
    result = await copilot_eval(agent, "What's my checking balance?")
    assert result.success
```
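
The tests above are `async def` functions. If the plugin does not manage the event loop for you, a common way to run them is `pytest-asyncio` in auto mode — an assumption about your project setup, not a requirement stated by this library:

```toml
# pyproject.toml — assumes pytest-asyncio is installed
[tool.pytest.ini_options]
asyncio_mode = "auto"  # run async def tests without per-test markers
```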
## Requirements
- Python 3.11+
- pytest 9.0+
- GitHub Copilot subscription (`gh auth login`)
## Next Steps
- Test MCP Servers — Test tool discovery and usage
- Test Coding Agents — Full CopilotEval reference
- Agent Skills — Add domain knowledge
- Custom Agents — Test `.agent.md` files