
Custom Agents

A custom agent is a specialized AI sub-agent defined in a .agent.md file (VS Code format) or .md file (Claude Code format). These files describe the agent's purpose, instructions, and optional tool restrictions using YAML frontmatter and a markdown prompt body.

pytest-skill-engineering supports custom agent files as a first-class concept. For end-to-end testing, use load_custom_agent() + CopilotEval — it tests real Copilot subagent dispatch exactly as users experience it. Use Eval.from_agent_file() for fast synthetic iteration on agent instructions without a Copilot subscription.

Custom Agent File Format

Custom agent files use YAML frontmatter for metadata and a markdown body for the agent's instructions:

.github/agents/reviewer.agent.md
---
name: reviewer
description: 'Code review specialist — identifies bugs and code quality issues'
tools:
  - read_file
  - list_directory
---

# Code Reviewer

You are a code review specialist. When asked to review code:

1. Read the relevant files using `read_file`
2. Check for bugs, security issues, and code quality problems
3. Provide actionable feedback with specific line references

Focus on correctness first, then maintainability.

File Locations

| Format      | Location                  | Description                   |
| ----------- | ------------------------- | ----------------------------- |
| VS Code     | .github/agents/*.agent.md | VS Code Copilot custom agents |
| Claude Code | .claude/agents/*.md       | Claude Code custom agents     |
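
Both layouts can be pointed at the same loading helpers covered below. A minimal sketch, assuming load_custom_agents() (introduced later on this page) accepts either directory:

from pytest_skill_engineering import load_custom_agents

# Assumption: the loader picks up *.agent.md under .github/agents/ and plain
# *.md under .claude/agents/; only the VS Code layout is shown elsewhere on
# this page.
vscode_agents = load_custom_agents(".github/agents/")
claude_agents = load_custom_agents(".claude/agents/")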

Frontmatter Fields

| Field       | Type   | Description                                                         |
| ----------- | ------ | ------------------------------------------------------------------- |
| name        | string | Agent display name (optional; derived from the filename if absent)  |
| description | string | Short description of the agent's purpose                            |
| tools       | list   | Tool names this agent is restricted to (optional)                   |

Any additional frontmatter fields (e.g. maturity, handoffs) are preserved in metadata and can be accessed programmatically.
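For example, a custom maturity field (hypothetical here) could be read back after loading the file. This sketch assumes load_custom_agent() exposes extra frontmatter under a metadata key, mirroring the return shape documented for load_prompt_file() further down:

from pytest_skill_engineering import load_custom_agent

reviewer = load_custom_agent(".github/agents/reviewer.agent.md")

# "maturity" is a made-up field for illustration; the "metadata" key is an
# assumption based on the load_prompt_file() return shape described below.
maturity = reviewer.get("metadata", {}).get("maturity")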

Using with Eval.from_agent_file() (synthetic testing)

Eval.from_agent_file() loads a custom agent file and uses the prompt body as the agent's custom instructions. This lets you test whether the agent's instructions produce the expected behaviour using any LLM provider — no Copilot subscription required.

DEPRECATED: Eval.from_agent_file() is legacy synthetic testing. For new tests, use load_custom_agent() + CopilotEval instead, as in the example below; it tests real Copilot subagent dispatch exactly as users experience it.

import pytest
from pytest_skill_engineering.copilot import CopilotEval
from pytest_skill_engineering.core.evals import load_custom_agent

reviewer = load_custom_agent(".github/agents/reviewer.agent.md")

@pytest.mark.copilot
async def test_reviewer_reads_files(copilot_eval):
    """Reviewer should read files before giving feedback."""
    agent = CopilotEval(
        name="test-reviewer",
        custom_agents=[reviewer],
        instructions="Delegate code review to the reviewer agent.",
    )
    result = await copilot_eval(agent, "Review the authentication module in src/auth.py")
    assert result.success

What from_agent_file() does (legacy)

Note: This section describes Eval.from_agent_file() for backward compatibility. Use load_custom_agent() + CopilotEval for new tests.

- Sets the agent's custom instructions from the agent file's markdown body
- Sets name from the filename (e.g. reviewer.agent.md → reviewer)
- Maps the tools frontmatter field to allowed_tools (restricts which tools the agent can call)
- Any kwarg you pass (e.g. name=, max_turns=) overrides the file values, as in the sketch below
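
A minimal sketch of the legacy call, for tests that still use it. The import path for Eval is an assumption (taken to match load_custom_agent in pytest_skill_engineering.core.evals), and max_turns is shown only to illustrate the kwarg override:

from pytest_skill_engineering.core.evals import Eval  # import path assumed

reviewer = Eval.from_agent_file(
    ".github/agents/reviewer.agent.md",
    max_turns=5,  # explicit kwargs override values read from the file
)
# name is derived from the filename ("reviewer"), instructions come from the
# markdown body, and the tools frontmatter maps to allowed_tools.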

Using with load_custom_agent() + CopilotEval (real dispatch)

load_custom_agent() and load_custom_agents() load agent files into dicts compatible with CopilotEval.custom_agents. This tests real subagent dispatch — Copilot natively loads and routes tasks to your sub-agents, exactly as end users experience it.

import pytest
from pytest_skill_engineering import load_custom_agent
from pytest_skill_engineering.copilot import CopilotEval

# Single agent
reviewer = load_custom_agent(".github/agents/reviewer.agent.md")

@pytest.mark.copilot
async def test_orchestrator_dispatches_to_reviewer(copilot_eval):
    agent = CopilotEval(
        name="orchestrator",
        instructions="Delegate code reviews to the reviewer agent.",
        custom_agents=[reviewer],
    )
    result = await copilot_eval(agent, "Review src/auth.py for security issues.")
    assert result.success
    # Check the sub-agent was invoked
    assert any(s.eval_name == "reviewer" for s in result.subagent_invocations)

Load all agents from a directory

import pytest
from pytest_skill_engineering import load_custom_agents
from pytest_skill_engineering.copilot import CopilotEval

agents = load_custom_agents(
    ".github/agents/",
    exclude={"orchestrator"},  # don't load the orchestrator as a sub-agent
)

@pytest.mark.copilot
async def test_orchestrator_with_all_subagents(copilot_eval):
    agent = CopilotEval(
        name="orchestrator",
        instructions="Delegate tasks to the appropriate specialist.",
        custom_agents=agents,
    )
    result = await copilot_eval(agent, "Create and review a calculator module.")
    assert result.success

Asserting on subagent_invocations

from pytest_skill_engineering import load_custom_agents
from pytest_skill_engineering.copilot import CopilotEval

async def test_correct_agent_is_chosen(copilot_eval):
    agents = load_custom_agents(".github/agents/")

    agent = CopilotEval(
        name="orchestrator",
        instructions="Use specialist agents for each task.",
        custom_agents=agents,
    )
    result = await copilot_eval(agent, "Write unit tests for the billing module.")

    invoked = [s.eval_name for s in result.subagent_invocations]
    assert "test-writer" in invoked
    assert "reviewer" not in invoked  # reviewer shouldn't be invoked for test writing

A/B Testing: with and without the agent

Compare behaviour with and without a custom agent to verify its instructions add value:

import pytest
from pytest_skill_engineering.core.evals import load_custom_agent
from pytest_skill_engineering.copilot import CopilotEval

reviewer = load_custom_agent(".github/agents/reviewer.agent.md")

@pytest.mark.copilot
async def test_reviewer_improves_feedback_quality(copilot_eval):
    without = CopilotEval(
        name="no-reviewer",
        instructions="Review code when asked.",
    )
    with_reviewer = CopilotEval(
        name="with-reviewer",
        instructions="Delegate code review to the reviewer agent.",
        custom_agents=[reviewer],
    )

    r_without = await copilot_eval(without, "Review src/auth.py for security issues.")
    r_with    = await copilot_eval(with_reviewer, "Review src/auth.py for security issues.")

    # Crude proxy: the specialist agent should produce longer, more specific feedback
    assert r_with.success
    assert len(r_with.final_response) > len(r_without.final_response)

Choosing the right approach

|                     | Eval.from_agent_file() (legacy)        | load_custom_agent() + CopilotEval  |
| ------------------- | -------------------------------------- | ---------------------------------- |
| What runs the agent | PydanticAI synthetic loop              | Real GitHub Copilot (CLI SDK)      |
| Tests               | Agent's instructions (system prompt)   | Real subagent dispatch and routing |
| LLM                 | Any provider (Azure, OpenAI, Copilot…) | GitHub Copilot only                |
| Speed               | Fast (in-process)                      | Slower (~5–10s CLI startup)        |
| Requires Copilot    | No                                     | Yes (gh auth login)                |
| Best for            | Legacy CI tests                        | End-to-end dispatch validation     |

Rule of thumb: Use load_custom_agent() + CopilotEval for end-to-end dispatch validation — it's the primary path and tests what users experience. Eval.from_agent_file() is legacy and not recommended for new tests.

See Choosing a Test Harness for a full comparison.

Prompt Files (Slash Commands)

Alongside custom agents, VS Code and Claude Code support prompt files — reusable prompts that users invoke as slash commands (e.g. /review, /explain). These are the user-invocation side of the bundle, as opposed to custom agents which are the agent-configuration side.

| File             | Location          | Invoked as              |
| ---------------- | ----------------- | ----------------------- |
| review.prompt.md | .github/prompts/  | /review in Copilot Chat |
| review.md        | .claude/commands/ | /review in Claude Code  |

Use load_prompt_file() to load the body of a prompt file and use it as a test input:

from pytest_skill_engineering.copilot import CopilotEval
from pytest_skill_engineering import load_prompt_file

agent = CopilotEval(
    name="code-helper",
    instructions="You are a code assistant.",
)

async def test_review_prompt(copilot_eval):
    """The /review slash command produces actionable feedback."""
    prompt = load_prompt_file(".github/prompts/review.prompt.md")
    result = await copilot_eval(agent, prompt["body"])
    assert result.success

Test all prompt files at once

import pytest
from pytest_skill_engineering import load_prompt_files
from pytest_skill_engineering.copilot import CopilotEval

PROMPTS = load_prompt_files(".github/prompts/")

@pytest.fixture
def agent():
    return CopilotEval(name="code-helper", instructions="You are a code assistant.")

@pytest.mark.parametrize("prompt", PROMPTS, ids=lambda p: p["name"])
async def test_prompt_files(copilot_eval, agent, prompt):
    """All slash commands produce a successful response."""
    result = await copilot_eval(agent, prompt["body"])
    assert result.success

Format

Prompt files follow the same pattern as custom agent files — optional YAML frontmatter, markdown body:

.github/prompts/review.prompt.md
---
description: Review code for quality and security issues
mode: agent
---

Review the current file for:
- Security vulnerabilities
- Performance issues
- Code style problems

Provide specific line numbers and suggested fixes.

load_prompt_file() returns {"name", "body", "description", "metadata"}. The body is what the user's slash command sends to the agent.
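The other keys are handy for structural checks on the prompt files themselves. A small sketch, assuming load_prompt_files() returns the same dict shape for each file:

from pytest_skill_engineering import load_prompt_files

def test_every_prompt_has_a_description():
    """Catch prompt files whose frontmatter is missing a description."""
    for prompt in load_prompt_files(".github/prompts/"):
        assert prompt["description"], f"{prompt['name']} is missing a description"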

VS Code vs Claude Code: VS Code files use .prompt.md extension in .github/prompts/. Claude Code files use plain .md in .claude/commands/. load_prompt_files() handles both — .prompt.md files take precedence if both exist with the same name.

A/B Testing Agent Instructions

Iterating on a custom agent file? Test multiple versions side-by-side and let the leaderboard pick the winner.

Store each version as a separate file and parametrize over them:

.github/agents/
├── reviewer-v1.agent.md   # "Review code for any issues"
└── reviewer-v2.agent.md   # Focused checklist: security → correctness → style

import pytest
from pathlib import Path
from pytest_skill_engineering.copilot import CopilotEval
from pytest_skill_engineering.core.evals import load_custom_agent

AGENT_VERSIONS = {
    path.name.removesuffix(".agent.md"): path  # "reviewer-v1.agent.md" -> "reviewer-v1"
    for path in Path(".github/agents").glob("reviewer-*.agent.md")
}

@pytest.mark.parametrize("name,path", AGENT_VERSIONS.items())
async def test_reviewer_finds_security_issue(copilot_eval, name, path):
    reviewer = load_custom_agent(path)
    agent = CopilotEval(
        name=name,
        custom_agents=[reviewer],
        instructions="Delegate code review tasks to the reviewer agent.",
    )
    result = await copilot_eval(agent, "Review src/auth.py for security vulnerabilities")
    assert result.success

The AI analysis report auto-detects that the agent instructions vary and shows a leaderboard ranking each version by pass rate and cost.

Tip: This works exactly the same for skills — swap load_custom_agent() for skill_directories=[...] in CopilotEval and parametrize over skill versions.
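
A sketch of that skill-version variant. The skills/ layout below is made up for illustration; skill_directories is the CopilotEval parameter mentioned in the tip:

import pytest
from pathlib import Path
from pytest_skill_engineering.copilot import CopilotEval

# Hypothetical layout: skills/reviewer-v1/, skills/reviewer-v2/, ...
SKILL_VERSIONS = sorted(Path("skills").glob("reviewer-v*"))

@pytest.mark.parametrize("skill_dir", SKILL_VERSIONS, ids=lambda p: p.name)
async def test_reviewer_skill_versions(copilot_eval, skill_dir):
    agent = CopilotEval(
        name=skill_dir.name,
        instructions="Use the available skills when reviewing code.",
        skill_directories=[skill_dir],
    )
    result = await copilot_eval(agent, "Review src/auth.py for security vulnerabilities")
    assert result.success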

Next Steps