How to Test Plugins

Test GitHub Copilot CLI plugins and Claude Code projects as a whole. Plugin testing validates the full composition — instructions, skills, custom agents, and MCP servers working together — not just individual pieces.

Why Plugin Testing?

Individual tool tests verify that one MCP server works in isolation. But real plugins combine multiple components:

  • Instructions that set agent behavior
  • Skills that inject domain knowledge
  • Custom agents that handle specialized tasks
  • MCP servers that provide tools

A plugin test loads everything from a directory and validates the combined behavior. If the instructions conflict with the skill, or a custom agent can't find the right tool, plugin testing catches it.

Plugin Directory Layouts

pytest-skill-engineering auto-detects three directory formats:

| Format | Detection | Typical Use |
| --- | --- | --- |
| plugin.json manifest | plugin.json exists at root | VS Code Copilot extensions |
| .github/ project | copilot-instructions.md exists in .github/ | GitHub repos with Copilot customization |
| .claude/ project | CLAUDE.md at root or .claude/ directory | Claude Code workspaces |
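
The detection rules above can be sketched with nothing but pathlib. This is a minimal illustration, not the library's implementation: the helper name detect_plugin_format and the precedence order (plugin.json first, then .github/, then .claude/) are assumptions.

```python
import tempfile
from pathlib import Path
from typing import Optional


def detect_plugin_format(root: Path) -> Optional[str]:
    """Hypothetical helper: classify a directory by the marker files it contains."""
    if (root / "plugin.json").is_file():
        return "plugin.json"
    if (root / ".github" / "copilot-instructions.md").is_file():
        return ".github"
    if (root / "CLAUDE.md").is_file() or (root / ".claude").is_dir():
        return ".claude"
    return None  # No known marker found


# Build a throwaway .github/ project and classify it.
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp)
    (project / ".github").mkdir()
    (project / ".github" / "copilot-instructions.md").write_text("Be concise.\n")
    detected = detect_plugin_format(project)

print(detected)
```

A directory that matches more than one rule is where precedence matters, so it may be worth checking how the real loader resolves that case before relying on it.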

plugin.json (VS Code Extensions)

my-plugin/
├── plugin.json              # Manifest with name, instructions, MCP config
├── agents/
│   ├── reviewer.agent.md
│   └── writer.agent.md
└── skills/
    └── code-standards/
        └── SKILL.md

.github/ Project (GitHub Repos)

my-project/
├── .github/
│   ├── copilot-instructions.md   # System prompt
│   └── agents/
│       ├── reviewer.agent.md
│       └── writer.agent.md
├── .mcp.json                      # MCP server configurations
└── skills/
    └── domain-knowledge/
        └── SKILL.md

.claude/ Project (Claude Code)

my-project/
├── CLAUDE.md                # System prompt
├── .claude/
│   └── settings.json        # Claude-specific config
├── .mcp.json                # MCP server configurations
└── skills/
    └── domain-knowledge/
        └── SKILL.md
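
For tests that need a disposable project on disk, a layout like the .github/ one above can be scaffolded in a temporary directory. This is a sketch under assumptions: the function name scaffold_github_project is hypothetical, the file contents are placeholders rather than valid agent or skill definitions, and the top-level "mcpServers" key in .mcp.json is assumed rather than confirmed by this page.

```python
import json
import tempfile
from pathlib import Path


def scaffold_github_project(root: Path) -> Path:
    """Create a minimal .github/ project tree with placeholder contents."""
    agents = root / ".github" / "agents"
    agents.mkdir(parents=True)
    (root / ".github" / "copilot-instructions.md").write_text("You are a careful assistant.\n")
    (agents / "reviewer.agent.md").write_text("Review code for defects.\n")

    skill = root / "skills" / "domain-knowledge"
    skill.mkdir(parents=True)
    (skill / "SKILL.md").write_text("Placeholder domain notes.\n")

    # Assumption: .mcp.json holds a top-level "mcpServers" mapping.
    (root / ".mcp.json").write_text(json.dumps({"mcpServers": {}}))
    return root


with tempfile.TemporaryDirectory() as tmp:
    root = scaffold_github_project(Path(tmp) / "my-project")
    # Collect the created files as POSIX-style relative paths.
    created = sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )

print(created)
```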

Loading Plugins

Use load_plugin() to inspect a plugin directory before testing:

from pytest_skill_engineering import load_plugin

plugin = load_plugin("path/to/my-plugin")

# Inspect what was discovered
print(plugin.metadata.name)      # Plugin name from manifest or directory
print(plugin.instructions)        # Merged instructions from all sources
print(plugin.agents)              # List of loaded .agent.md definitions
print(plugin.skills)              # Loaded Skill objects
print(plugin.mcp_configs)         # MCP server configurations from manifest

load_plugin() auto-detects the format based on directory contents and loads all components. Use this for programmatic inspection, or pass the path directly to CopilotEval.from_plugin().

Testing Plugins

Use CopilotEval.from_plugin() to test plugin behavior:

import pytest
from pytest_skill_engineering.copilot import CopilotEval

async def test_plugin_tool_usage(copilot_eval):
    agent = CopilotEval.from_plugin("path/to/my-plugin")
    result = await copilot_eval(agent, "What's my checking balance?")

    assert result.success
    assert result.tool_was_called("get_balance")

CopilotEval.from_plugin() loads instructions, skills, and custom agents from the directory. The Copilot SDK handles model selection and tool access.

Overriding Plugin Settings

# Override the model
agent = CopilotEval.from_plugin(
    "path/to/my-plugin",
    model="gpt-5.2-chat",
)

# Override instructions (ignores plugin instructions)
agent = CopilotEval.from_plugin(
    "path/to/my-plugin",
    instructions="Custom override instructions.",
)

Testing with Copilot SDK (CopilotEval)

Use CopilotEval.from_plugin() when testing with the real GitHub Copilot agent — including custom agent dispatch, skill loading, and file operations:

import pytest
from pytest_skill_engineering.copilot import CopilotEval

@pytest.mark.copilot
async def test_plugin_with_copilot(copilot_eval, tmp_path):
    agent = CopilotEval.from_plugin(
        "path/to/my-plugin",
        model="gpt-5-mini",
        working_directory=str(tmp_path),
    )
    result = await copilot_eval(agent, "Create a hello world script")

    assert result.success
    assert (tmp_path / "hello.py").exists()

Verifying Custom Agent Dispatch

When your plugin includes custom agents, verify that Copilot dispatches to the right one:

@pytest.mark.copilot
async def test_plugin_routes_to_reviewer(copilot_eval, tmp_path):
    agent = CopilotEval.from_plugin(
        "path/to/my-plugin",
        model="gpt-5-mini",
        working_directory=str(tmp_path),
    )
    result = await copilot_eval(agent, "Review this code for security issues")

    assert result.success
    invoked = [s.eval_name for s in result.subagent_invocations]
    assert "reviewer" in invoked

Claude Code Projects

Use CopilotEval.from_claude_config() to test Claude Code project directories. This loads CLAUDE.md as the system prompt and .claude/ settings:

import pytest
from pytest_skill_engineering.copilot import CopilotEval

@pytest.mark.copilot
async def test_claude_project(copilot_eval, tmp_path):
    agent = CopilotEval.from_claude_config(
        "path/to/claude-project",
        model="gpt-5-mini",
        working_directory=str(tmp_path),
    )
    result = await copilot_eval(agent, "Explain this project")

    assert result.success

Claude Config ≠ Claude Model

from_claude_config() loads the project's configuration format (CLAUDE.md, .claude/ settings). The agent still runs on whatever model you specify — typically a GitHub Copilot model.

MCP Config Loading

Load MCP server configurations from .mcp.json files independently:

from pytest_skill_engineering.copilot import load_mcp_config

configs = load_mcp_config("path/to/.mcp.json")
# Returns dict of server name → config

for name, config in configs.items():
    print(f"Server: {name}")
    print(f"  Command: {config.get('command')}")
    print(f"  Args: {config.get('args', [])}")

This is useful when you need to inspect MCP configurations without loading the full plugin, or when building custom test fixtures from a project's server definitions.
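
To make the "server name → config" shape concrete, here is a standard-library sketch that reads a sample file. It is a stand-in for illustration, not load_mcp_config() itself: the function read_mcp_servers, the top-level "mcpServers" key, and the banking_mcp module name are all assumptions.

```python
import json
import tempfile
from pathlib import Path


def read_mcp_servers(path: Path) -> dict:
    """Hypothetical stand-in: return a server name -> config mapping."""
    data = json.loads(path.read_text(encoding="utf-8"))
    # Assumption: servers live under a top-level "mcpServers" key.
    return data.get("mcpServers", {})


sample = {
    "mcpServers": {
        # "banking_mcp" is a made-up module name for illustration.
        "banking-server": {"command": "python", "args": ["-m", "banking_mcp"]},
    }
}

with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / ".mcp.json"
    config_path.write_text(json.dumps(sample), encoding="utf-8")
    servers = read_mcp_servers(config_path)

for name, config in servers.items():
    print(name, config.get("command"), config.get("args", []))
```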

Assertion Helpers

Basic Assertions

# Agent completed successfully
assert result.success

# Specific tool was called
assert result.tool_was_called("get_balance")

# Tool called at least N times
assert result.tool_call_count("transfer") >= 1

# Total tool calls
assert len(result.all_tool_calls) >= 2

Multi-Server Assertions

When plugins have multiple MCP servers, assert which server handled a tool call:

# Verify the tool came from the expected server
assert result.tool_was_called_from_server("get_balance", "banking-server")

# Useful when two servers expose similarly-named tools
assert result.tool_was_called_from_server("search", "document-server")
assert not result.tool_was_called_from_server("search", "web-server")
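
Conceptually, a server-scoped assertion matches on the (server, tool) pair rather than the tool name alone. The sketch below illustrates that idea with a hypothetical ToolCallRecord type; the library's real tool-call objects and its internal matching logic may differ.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ToolCallRecord:
    """Hypothetical record of one tool call and the server that served it."""
    server: str
    tool: str


def was_called_from_server(calls: Iterable[ToolCallRecord], tool: str, server: str) -> bool:
    """True if any call matches both the tool name and the server name."""
    return any(c.tool == tool and c.server == server for c in calls)


calls = [
    ToolCallRecord(server="document-server", tool="search"),
    ToolCallRecord(server="banking-server", tool="get_balance"),
]

# "search" exists on two servers in this scenario, but only one handled it.
print(was_called_from_server(calls, "search", "document-server"))
print(was_called_from_server(calls, "search", "web-server"))
```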

Semantic Assertions

Use llm_assert for AI-powered validation of response content:

from pytest_skill_engineering.copilot import CopilotEval

async def test_plugin_response_quality(copilot_eval, llm_assert):
    agent = CopilotEval.from_plugin("path/to/my-plugin")
    result = await copilot_eval(agent, "Summarize my financial situation")

    assert result.success
    assert llm_assert(
        result.final_response,
        "provides a summary of account balances with actionable advice",
    )

SDK Passthroughs

CopilotEval supports additional SDK configuration for advanced scenarios:

from pytest_skill_engineering.copilot import CopilotEval

agent = CopilotEval(
    name="my-test",
    model="gpt-5-mini",
    active_agent="banking-advisor",  # Route to a specific custom agent
    hooks={"onSessionStart": "path/to/hook.py"},  # Lifecycle hooks
    working_directory=str(tmp_path),  # tmp_path is pytest's built-in fixture; build this inside a test
)

| Parameter | Description |
| --- | --- |
| active_agent | Route the session to a specific custom agent by name |
| hooks | Lifecycle hook configuration (dict of event → handler path) |
| excluded_tools | List of tool names to block from the agent |
| skill_directories | Directories containing skills to load |
| reasoning_effort | Reasoning effort level ("low", "medium", "high") |
| custom_agents | List of custom agent definitions for subagent dispatch |

Comparing Plugin Configurations

Parametrize tests to compare different plugin versions or configurations:

from pathlib import Path

import pytest
from pytest_skill_engineering.copilot import CopilotEval

PLUGIN_DIRS = [
    "plugins/v1-minimal",
    "plugins/v2-with-skills",
    "plugins/v3-with-agents",
]

@pytest.mark.parametrize("plugin_path", PLUGIN_DIRS, ids=lambda p: Path(p).name)
async def test_plugin_versions(copilot_eval, plugin_path):
    agent = CopilotEval.from_plugin(plugin_path)
    result = await copilot_eval(agent, "What's my checking balance?")

    assert result.success
    assert result.tool_was_called("get_balance")

The report generates an eval leaderboard comparing pass rates, costs, and AI-analyzed differences across plugin versions.
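
To show what a pass-rate comparison across versions amounts to, here is a small sketch that aggregates made-up run records. Everything in it is illustrative: the rows are invented sample data, and the ranking rule (pass rate first, total cost as tiebreaker) is an assumption, not the report's documented behavior.

```python
from collections import defaultdict

# Invented sample rows: (plugin version, passed, cost in USD).
runs = [
    ("v1-minimal", True, 0.010),
    ("v1-minimal", False, 0.012),
    ("v2-with-skills", True, 0.015),
    ("v2-with-skills", True, 0.014),
    ("v3-with-agents", True, 0.030),
    ("v3-with-agents", True, 0.028),
]


def leaderboard(rows):
    """Return (plugin, pass_rate, total_cost) tuples, best first."""
    totals = defaultdict(lambda: {"runs": 0, "passed": 0, "cost": 0.0})
    for plugin, passed, cost in rows:
        entry = totals[plugin]
        entry["runs"] += 1
        entry["passed"] += int(passed)
        entry["cost"] += cost
    table = [
        (plugin, t["passed"] / t["runs"], t["cost"]) for plugin, t in totals.items()
    ]
    # Assumed ranking: highest pass rate first, cheaper total cost breaks ties.
    return sorted(table, key=lambda row: (-row[1], row[2]))


board = leaderboard(runs)
for plugin, pass_rate, total_cost in board:
    print(f"{plugin}: {pass_rate:.0%} passed, ${total_cost:.3f}")
```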

Supported Formats Reference

| Format | Detection | Use With |
| --- | --- | --- |
| plugin.json | File exists at root | CopilotEval.from_plugin() |
| .github/ project | copilot-instructions.md exists | CopilotEval.from_plugin() |
| .claude/ project | CLAUDE.md or .claude/ dir exists | CopilotEval.from_claude_config(), CopilotEval.from_plugin() |

Next Steps