Custom Agents¶
A custom agent is a specialized AI sub-agent defined in a .agent.md file (VS Code format) or .md file (Claude Code format). These files describe the agent's purpose, instructions, and optional tool restrictions using YAML frontmatter and a markdown prompt body.
pytest-skill-engineering supports custom agent files as a first-class concept. For end-to-end testing, use load_custom_agent() + CopilotEval — it tests real Copilot subagent dispatch exactly as users experience it. Use Eval.from_agent_file() for fast synthetic iteration on agent instructions without a Copilot subscription.
Custom Agent File Format¶
Custom agent files use YAML frontmatter for metadata and a markdown body for the agent's instructions:
---
name: reviewer
description: 'Code review specialist — identifies bugs and code quality issues'
tools:
- read_file
- list_directory
---
# Code Reviewer
You are a code review specialist. When asked to review code:
1. Read the relevant files using `read_file`
2. Check for bugs, security issues, and code quality problems
3. Provide actionable feedback with specific line references
Focus on correctness first, then maintainability.
File Locations¶
| Format | Location | Description |
|---|---|---|
| VS Code | `.github/agents/*.agent.md` | VS Code Copilot custom agents |
| Claude Code | `.claude/agents/*.md` | Claude Code custom agents |
Frontmatter Fields¶
| Field | Type | Description |
|---|---|---|
| `name` | string | Eval display name (optional; derived from filename if absent) |
| `description` | string | Short description of the agent's purpose |
| `tools` | list | Tool names this agent is restricted to (optional) |
Any additional frontmatter fields (e.g. maturity, handoffs) are preserved in metadata and can be accessed programmatically.
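For intuition, the frontmatter/body split can be sketched in a few lines. This is a minimal illustration, not library code: `split_agent_file` is hypothetical, and the real parser also YAML-decodes the frontmatter into typed fields.

```python
# Illustrative sketch of how an .agent.md file divides into frontmatter and body.
# split_agent_file is hypothetical -- the library does this parsing for you.
AGENT_FILE = """\
---
name: reviewer
description: Code review specialist
tools:
  - read_file
maturity: beta
---
# Code Reviewer
You are a code review specialist.
"""

def split_agent_file(text: str) -> tuple[str, str]:
    """Split on the '---' delimiters: returns (frontmatter, markdown body)."""
    _, frontmatter, body = text.split("---\n", 2)
    return frontmatter.strip(), body.strip()

frontmatter, body = split_agent_file(AGENT_FILE)
print(frontmatter.splitlines()[0])  # name: reviewer
```

Note how the extra `maturity` field simply travels along in the frontmatter; that is the data the library surfaces via `metadata`.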
Using with Eval.from_agent_file() (synthetic testing)¶
Eval.from_agent_file() loads a custom agent file and uses the prompt body as the agent's custom instructions. This lets you test whether the agent's instructions produce the expected behaviour using any LLM provider — no Copilot subscription required.
DEPRECATED: `Eval.from_agent_file()` is legacy synthetic testing. For new tests, use `load_custom_agent()` + `CopilotEval` instead; it tests real Copilot subagent dispatch exactly as users experience it.
import pytest
from pytest_skill_engineering.copilot import CopilotEval
from pytest_skill_engineering.core.evals import load_custom_agent
reviewer = load_custom_agent(".github/agents/reviewer.agent.md")
@pytest.mark.copilot
async def test_reviewer_reads_files(copilot_eval):
"""Reviewer should read files before giving feedback."""
agent = CopilotEval(
name="test-reviewer",
custom_agents=[reviewer],
instructions="Delegate code review to the reviewer agent.",
)
result = await copilot_eval(agent, "Review the authentication module in src/auth.py")
assert result.success
What from_agent_file() does (legacy)¶
Note: This section describes `Eval.from_agent_file()` for backward compatibility. Use `load_custom_agent()` + `CopilotEval` for new tests.
- Sets the agent's custom instructions from the agent file's markdown body
- Sets `name` from the filename (e.g. `reviewer.agent.md` → `reviewer`)
- Maps the `tools` frontmatter field to `allowed_tools` (restricts which tools the agent can call)
- Any kwarg you pass (e.g. `name=`, `max_turns=`) overrides the file values
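The override rule amounts to a dict merge where kwargs win. A tiny illustrative sketch, not library internals; the kwarg names shown (`max_turns`) follow the mapping described above:

```python
# Sketch of the precedence rule: values from the .agent.md file are the base,
# and any kwargs passed to Eval.from_agent_file() override them.
file_values = {"name": "reviewer", "allowed_tools": ["read_file", "list_directory"]}
overrides = {"name": "reviewer-strict", "max_turns": 5}  # hypothetical kwargs

merged = {**file_values, **overrides}
print(merged["name"])  # reviewer-strict -- the kwarg wins
```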
Using with load_custom_agent() + CopilotEval (real dispatch)¶
load_custom_agent() and load_custom_agents() load agent files into dicts compatible with CopilotEval.custom_agents. This tests real subagent dispatch — Copilot natively loads and routes tasks to your sub-agents, exactly as end users experience it.
from pytest_skill_engineering import load_custom_agent, load_custom_agents
from pytest_skill_engineering.copilot import CopilotEval
# Single agent
reviewer = load_custom_agent(".github/agents/reviewer.agent.md")
@pytest.mark.copilot
async def test_orchestrator_dispatches_to_reviewer(copilot_eval):
agent = CopilotEval(
name="orchestrator",
instructions="Delegate code reviews to the reviewer agent.",
custom_agents=[reviewer],
)
result = await copilot_eval(agent, "Review src/auth.py for security issues.")
assert result.success
# Check the sub-agent was invoked
assert any(s.eval_name == "reviewer" for s in result.subagent_invocations)
Load all agents from a directory¶
from pytest_skill_engineering import load_custom_agents
agents = load_custom_agents(
".github/agents/",
exclude={"orchestrator"}, # don't load the orchestrator as a sub-agent
)
@pytest.mark.copilot
async def test_orchestrator_with_all_subagents(copilot_eval):
agent = CopilotEval(
name="orchestrator",
instructions="Delegate tasks to the appropriate specialist.",
custom_agents=agents,
)
result = await copilot_eval(agent, "Create and review a calculator module.")
assert result.success
Asserting on subagent_invocations¶
@pytest.mark.copilot
async def test_correct_agent_is_chosen(copilot_eval):
agents = load_custom_agents(".github/agents/")
agent = CopilotEval(
name="orchestrator",
instructions="Use specialist agents for each task.",
custom_agents=agents,
)
result = await copilot_eval(agent, "Write unit tests for the billing module.")
invoked = [s.eval_name for s in result.subagent_invocations]
assert "test-writer" in invoked
assert "reviewer" not in invoked # reviewer shouldn't be invoked for test writing
A/B Testing: with and without the agent¶
Compare behaviour with and without a custom agent to verify its instructions add value:
from pytest_skill_engineering.core.evals import load_custom_agent
from pytest_skill_engineering.copilot import CopilotEval
reviewer = load_custom_agent(".github/agents/reviewer.agent.md")
@pytest.mark.copilot
async def test_reviewer_improves_feedback_quality(copilot_eval):
without = CopilotEval(
name="no-reviewer",
instructions="Review code when asked.",
)
with_reviewer = CopilotEval(
name="with-reviewer",
instructions="Delegate code review to the reviewer agent.",
custom_agents=[reviewer],
)
r_without = await copilot_eval(without, "Review src/auth.py for security issues.")
r_with = await copilot_eval(with_reviewer, "Review src/auth.py for security issues.")
# Specialist agent should produce more specific findings
assert r_with.success
assert len(r_with.final_response) > len(r_without.final_response)
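Raw response length is a coarse proxy for quality. A sharper signal is to count concrete findings, for example explicit line references, since the reviewer's instructions demand them. `count_line_references` is an illustrative helper, not part of the library:

```python
import re

def count_line_references(text: str) -> int:
    """Count 'line <n>' mentions -- a rough proxy for specific, actionable feedback."""
    return len(re.findall(r"\bline\s+\d+\b", text, flags=re.IGNORECASE))

# In the A/B test above you could then assert:
#   assert count_line_references(r_with.final_response) > count_line_references(r_without.final_response)
print(count_line_references("SQL injection on line 42; weak hash, see Line 7."))  # 2
```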
Choosing the right approach¶
| | `Eval.from_agent_file()` (legacy) | `load_custom_agent()` + `CopilotEval` |
|---|---|---|
| What runs the agent | PydanticAI synthetic loop | Real GitHub Copilot (CLI SDK) |
| Tests | Agent's instructions (system prompt) | Real subagent dispatch and routing |
| LLM | Any provider (Azure, OpenAI, Copilot…) | GitHub Copilot only |
| Speed | Fast (in-process) | Slower (~5–10s CLI startup) |
| Requires Copilot | No | Yes (`gh auth login`) |
| Best for | Legacy CI tests | End-to-end dispatch validation |
Rule of thumb: Use `load_custom_agent()` + `CopilotEval` for end-to-end dispatch validation; it's the primary path and tests what users experience. `Eval.from_agent_file()` is legacy and not recommended for new tests.
See Choosing a Test Harness for a full comparison.
Prompt Files (Slash Commands)¶
Alongside custom agents, VS Code and Claude Code support prompt files — reusable prompts that users invoke as slash commands (e.g. /review, /explain). These are the user-invocation side of the bundle, as opposed to custom agents which are the agent-configuration side.
| File | Location | Invoked as |
|---|---|---|
| `review.prompt.md` | `.github/prompts/` | `/review` in Copilot Chat |
| `review.md` | `.claude/commands/` | `/review` in Claude Code |
Use load_prompt_file() to load the body of a prompt file and use it as a test input:
from pytest_skill_engineering.copilot import CopilotEval
from pytest_skill_engineering import load_prompt_file
agent = CopilotEval(
name="code-helper",
instructions="You are a code assistant.",
)
@pytest.mark.copilot
async def test_review_prompt(copilot_eval):
"""The /review slash command produces actionable feedback."""
prompt = load_prompt_file(".github/prompts/review.prompt.md")
result = await copilot_eval(agent, prompt["body"])
assert result.success
Test all prompt files at once¶
import pytest
from pytest_skill_engineering import load_prompt_files
from pytest_skill_engineering.copilot import CopilotEval
PROMPTS = load_prompt_files(".github/prompts/")
agent = CopilotEval(
    name="code-helper",
    instructions="You are a code assistant.",
)
@pytest.mark.copilot
@pytest.mark.parametrize("prompt", PROMPTS, ids=lambda p: p["name"])
async def test_prompt_files(copilot_eval, prompt):
    """All slash commands produce a successful response."""
    result = await copilot_eval(agent, prompt["body"])
    assert result.success
Format¶
Prompt files follow the same pattern as custom agent files — optional YAML frontmatter, markdown body:
---
description: Review code for quality and security issues
mode: agent
---
Review the current file for:
- Security vulnerabilities
- Performance issues
- Code style problems
Provide specific line numbers and suggested fixes.
load_prompt_file() returns {"name", "body", "description", "metadata"}. The body is what the user's slash command sends to the agent.
VS Code vs Claude Code: VS Code files use the `.prompt.md` extension in `.github/prompts/`. Claude Code files use plain `.md` in `.claude/commands/`. `load_prompt_files()` handles both; `.prompt.md` files take precedence if both exist with the same name.
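The name derivation presumably strips the format-specific extension so both formats map to the same slash command. A sketch of that rule; `prompt_name` is illustrative, not a library function:

```python
from pathlib import Path

def prompt_name(path: Path) -> str:
    """Derive the slash-command name: review.prompt.md and review.md both -> 'review'."""
    if path.name.endswith(".prompt.md"):
        return path.name[: -len(".prompt.md")]
    return path.stem

print(prompt_name(Path(".github/prompts/review.prompt.md")))  # review
print(prompt_name(Path(".claude/commands/review.md")))        # review
```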
A/B Testing Eval Instructions¶
Iterating on a custom agent file? Test multiple versions side-by-side and let the leaderboard pick the winner.
Store each version as a separate file and parametrize over them:
.github/agents/
├── reviewer-v1.agent.md # "Review code for any issues"
└── reviewer-v2.agent.md # Focused checklist: security → correctness → style
import pytest
from pathlib import Path
from pytest_skill_engineering.copilot import CopilotEval
from pytest_skill_engineering.core.evals import load_custom_agent
AGENT_VERSIONS = {
path.stem: path
for path in Path(".github/agents").glob("reviewer-*.agent.md")
}
@pytest.mark.parametrize("name,path", AGENT_VERSIONS.items())
async def test_reviewer_finds_security_issue(copilot_eval, name, path):
reviewer = load_custom_agent(path)
agent = CopilotEval(
name=name,
custom_agents=[reviewer],
instructions="Delegate code review tasks to the reviewer agent.",
)
result = await copilot_eval(agent, "Review src/auth.py for security vulnerabilities")
assert result.success
The AI analysis report auto-detects that the agent instructions vary and shows a leaderboard ranking each version by pass rate and cost.
Tip: This works exactly the same for skills: swap `load_custom_agent()` for `skill_directories=[...]` in `CopilotEval` and parametrize over skill versions.
Next Steps¶
- Comparing Configurations — A/B test agent variants systematically
- Test Coding Agents — Full Copilot agent testing guide
- Eval Skills — Add domain knowledge to agents