# Skill Testing
Test whether domain knowledge injected via skill directories improves agent behavior.
## What Are Skills?
Skills are directories of markdown files containing domain-specific knowledge, such as coding standards, API conventions, or architectural guidelines. When a skill directory is attached to a CopilotAgent, its files are loaded into the agent's context.
## Basic Usage
Create a skill directory and test that the agent follows its guidance:
```python
from pytest_codingagents import CopilotAgent


async def test_coding_standards(copilot_run, tmp_path):
    # Create a skill with coding standards
    skill_dir = tmp_path / "skills"
    skill_dir.mkdir()
    (skill_dir / "coding-standards.md").write_text(
        "# Coding Standards\n\n"
        "All Python functions MUST have type hints and docstrings.\n"
        "Use snake_case for all function names.\n"
    )

    agent = CopilotAgent(
        name="with-standards",
        instructions="Follow all coding standards from your skills.",
        working_directory=str(tmp_path),
        skill_directories=[str(skill_dir)],
    )

    result = await copilot_run(agent, "Create math_utils.py with add and multiply")

    assert result.success
    content = (tmp_path / "math_utils.py").read_text()
    assert "def add" in content
```
## Measuring Skill Impact
Parametrize tests with and without skills to measure whether they improve results:
```python
import pytest

from pytest_codingagents import CopilotAgent


@pytest.mark.parametrize("use_skill", [True, False], ids=["with-skill", "without-skill"])
async def test_skill_impact(copilot_run, tmp_path, use_skill):
    # Set up skill directory
    skill_dir = tmp_path / "skills"
    skill_dir.mkdir()
    (skill_dir / "standards.md").write_text(
        "All functions MUST have type hints and docstrings."
    )

    skill_dirs = [str(skill_dir)] if use_skill else []
    agent = CopilotAgent(
        name=f"skill-{'on' if use_skill else 'off'}",
        instructions="Create clean Python code.",
        working_directory=str(tmp_path),
        skill_directories=skill_dirs,
    )

    result = await copilot_run(agent, "Create a utility module with 3 helper functions")
    assert result.success
```
The AI analysis report highlights differences in behavior and quality between skill-on and skill-off runs.
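If you also want a number to compare, one option (not part of the plugin's API) is to compute the same simple metric over the generated code in each run, for example docstring coverage:

```python
import ast
from pathlib import Path


def docstring_coverage(project_dir: Path) -> float:
    """Illustrative metric: fraction of generated functions that carry a docstring."""
    total = documented = 0
    for py_file in project_dir.glob("**/*.py"):
        for node in ast.walk(ast.parse(py_file.read_text())):
            if isinstance(node, ast.FunctionDef):
                total += 1
                if ast.get_docstring(node):
                    documented += 1
    return documented / total if total else 0.0


# At the end of test_skill_impact, record the score for each parametrized run,
# e.g. via pytest's built-in record_property fixture:
#     record_property("docstring_coverage", docstring_coverage(tmp_path))
```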
## Multiple Skill Directories
Load skills from several directories:
```python
agent = CopilotAgent(
    instructions="Follow all project guidelines.",
    working_directory=str(tmp_path),
    skill_directories=[
        "skills/coding-standards",
        "skills/api-conventions",
        "skills/testing-patterns",
    ],
)
```
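If the list grows with the project, it can also be built from the filesystem instead of hard-coded. A sketch, assuming one subdirectory per topic under a single `skills/` root:

```python
from pathlib import Path

from pytest_codingagents import CopilotAgent

# Assumed layout: skills/<topic>/*.md, one subdirectory per topic
skill_dirs = sorted(str(p) for p in Path("skills").iterdir() if p.is_dir())

agent = CopilotAgent(
    instructions="Follow all project guidelines.",
    working_directory=str(tmp_path),
    skill_directories=skill_dirs,
)
```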
## Disabling Specific Skills
Selectively disable skills to isolate their effects:
```python
agent = CopilotAgent(
    instructions="Follow project guidelines.",
    working_directory=str(tmp_path),
    skill_directories=["skills/"],
    disabled_skills=["deprecated-patterns"],
)
```
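To measure what a single skill contributes, parametrize over `disabled_skills` so the only difference between runs is whether that skill is active. A sketch, assuming a skill named `api-conventions` exists under `skills/` (the name is illustrative):

```python
import pytest

from pytest_codingagents import CopilotAgent


@pytest.mark.parametrize(
    "disabled",
    [[], ["api-conventions"]],
    ids=["all-skills", "without-api-conventions"],
)
async def test_api_conventions_effect(copilot_run, tmp_path, disabled):
    agent = CopilotAgent(
        name="skills-minus-" + "-".join(disabled or ["none"]),
        instructions="Follow project guidelines.",
        working_directory=str(tmp_path),
        skill_directories=["skills/"],
        disabled_skills=disabled,
    )

    result = await copilot_run(agent, "Create a small client for the orders API")
    assert result.success
```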
## Comparing Skill Versions
Test different versions of the same skill to find the most effective phrasing:
```python
import pytest

from pytest_codingagents import CopilotAgent

SKILL_VERSIONS = {
    "concise": "Use type hints. Use docstrings. Use snake_case.",
    "detailed": (
        "# Coding Standards\n\n"
        "## Type Hints\n"
        "Every function parameter and return value MUST have a type hint.\n\n"
        "## Docstrings\n"
        "Every public function MUST have a Google-style docstring.\n"
    ),
}


@pytest.mark.parametrize("style", SKILL_VERSIONS.keys())
async def test_skill_phrasing(copilot_run, tmp_path, style):
    skill_dir = tmp_path / "skills"
    skill_dir.mkdir()
    (skill_dir / "standards.md").write_text(SKILL_VERSIONS[style])

    agent = CopilotAgent(
        name=f"skill-{style}",
        instructions="Follow the coding standards.",
        working_directory=str(tmp_path),
        skill_directories=[str(skill_dir)],
    )

    result = await copilot_run(agent, "Create a data processing module")
    assert result.success
```
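`result.success` alone will not tell you which phrasing works better. One way to make the comparison concrete is to score each run with the same metric, for example the `docstring_coverage` sketch from "Measuring Skill Impact", and attach the value to the test report via pytest's built-in `record_property` fixture. A variant of the test above, assuming `SKILL_VERSIONS` and `docstring_coverage` are in scope:

```python
@pytest.mark.parametrize("style", SKILL_VERSIONS.keys())
async def test_skill_phrasing_scored(copilot_run, tmp_path, record_property, style):
    skill_dir = tmp_path / "skills"
    skill_dir.mkdir()
    (skill_dir / "standards.md").write_text(SKILL_VERSIONS[style])

    agent = CopilotAgent(
        name=f"skill-{style}-scored",
        instructions="Follow the coding standards.",
        working_directory=str(tmp_path),
        skill_directories=[str(skill_dir)],
    )

    result = await copilot_run(agent, "Create a data processing module")
    assert result.success

    # Same metric for every style, so the recorded values are directly comparable
    record_property("docstring_coverage", docstring_coverage(tmp_path))
```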