How to Test MCP Servers¶

Test your Model Context Protocol (MCP) servers by running evals against them.

How It Works¶

pytest-skill-engineering uses the official MCP Python SDK to connect to MCP servers:

Connects to the server via stdio, SSE, or Streamable HTTP transport
Discovers tools via MCP protocol
Routes tool calls from the LLM to the server
Returns results back to the LLM

Transports¶

pytest-skill-engineering supports all three MCP transports.

stdio (default)¶

Launches a local subprocess and communicates via stdin/stdout:

import pytest
from pytest_skill_engineering import MCPServer, Wait

@pytest.fixture(scope="module")
def banking_server():
    return MCPServer(
        command=["python", "-m", "my_banking_mcp"],
        wait=Wait.for_tools(["get_balance"]),
    )

SSE¶

Connects to a remote server using Server-Sent Events:

@pytest.fixture(scope="module")
def remote_server():
    return MCPServer(
        transport="sse",
        url="http://localhost:8000/sse",
    )

Streamable HTTP¶

Connects to a remote server using the Streamable HTTP transport (recommended for production):

@pytest.fixture(scope="module")
def remote_server():
    return MCPServer(
        transport="streamable-http",
        url="http://localhost:8000/mcp",
    )

Authentication Headers¶

Pass headers for authenticated endpoints. Headers support ${VAR} expansion:

@pytest.fixture(scope="module")
def authenticated_server():
    return MCPServer(
        transport="streamable-http",
        url="https://mcp.example.com/mcp",
        headers={"Authorization": "Bearer ${MCP_API_TOKEN}"},
    )

Configuration Options¶

# stdio transport
MCPServer(
    command=["python", "-m", "server"],  # Command to start server
    args=["--debug"],                     # Additional arguments
    env={"API_KEY": "xxx"},               # Environment variables
    cwd="/path/to/server",                # Working directory
    wait=Wait.for_tools(["tool1"]),       # Wait condition
)

# Remote transport (SSE or streamable-http)
MCPServer(
    transport="streamable-http",          # "sse" or "streamable-http"
    url="http://localhost:8000/mcp",      # Server URL
    headers={"Authorization": "Bearer ${TOKEN}"},  # Optional headers
    wait=Wait.for_tools(["tool1"]),       # Wait condition
)

Option	Transport	Description	Default
`transport`	All	`"stdio"`, `"sse"`, or `"streamable-http"`	`"stdio"`
`command`	stdio	Command to start the MCP server	Required for stdio
`args`	stdio	Additional command-line arguments	`[]`
`url`	sse, streamable-http	Server endpoint URL	Required for remote
`headers`	sse, streamable-http	HTTP headers (supports `${VAR}` expansion)	`{}`
`env`	stdio	Environment variables (supports `${VAR}` expansion)	`{}`
`cwd`	stdio	Working directory	Current directory
`wait`	All	Wait condition for server startup	`Wait.ready()`

Wait Strategies¶

Control how pytest-skill-engineering waits for the server to be ready.

Wait.ready() — Wait briefly for the process to start (default):

wait=Wait.ready()

Wait.for_tools() — Wait until specific tools are available (recommended):

wait=Wait.for_tools(["get_balance", "set_reminder"])

Wait.for_log() — Wait for a specific log pattern (regex):

wait=Wait.for_log(r"Server started on port \d+")

All wait strategies accept a timeout:

wait=Wait.for_tools(["tool1"], timeout_ms=60000)  # 60 seconds

NPX-based Servers¶

@pytest.fixture(scope="module")
def filesystem_server():
    return MCPServer(
        command=["npx", "-y", "@modelcontextprotocol/server-filesystem"],
        args=["/tmp/workspace"],
        wait=Wait.for_tools(["read_file", "write_file"]),
    )

Environment Variables¶

import os

@pytest.fixture(scope="module")
def api_server():
    return MCPServer(
        command=["python", "-m", "my_api_server"],
        env={
            "API_BASE_URL": "https://api.example.com",
            "API_KEY": os.environ["MY_API_KEY"],
        },
    )

Complete Example¶

import pytest
from pytest_skill_engineering import MCPServer, Wait
from pytest_skill_engineering.copilot import CopilotEval

@pytest.fixture(scope="module")
def banking_server():
    return MCPServer(
        command=["python", "-m", "my_banking_mcp"],
        wait=Wait.for_tools(["get_balance", "transfer"]),
    )

@pytest.fixture
def banking_agent():
    return CopilotEval(
        name="banking",
        instructions="You are a banking assistant.",
    )

async def test_balance_query(copilot_eval, banking_agent):
    result = await copilot_eval(banking_agent, "What's my checking balance?")

    assert result.success
    assert result.tool_was_called("get_balance")

Multiple Servers¶

Combine multiple MCP servers in a single eval:

@pytest.fixture(scope="module")
def banking_server():
    return MCPServer(
        command=["python", "-m", "banking_mcp"],
        wait=Wait.for_tools(["get_balance"]),
    )

@pytest.fixture(scope="module")
def calendar_server():
    return MCPServer(
        command=["python", "-m", "calendar_mcp"],
        wait=Wait.for_tools(["create_event", "list_events"]),
    )

@pytest.fixture
def assistant_agent():
    return CopilotEval(
        name="assistant",
        instructions="You can check balances and manage calendar.",
    )

Filtering Tools¶

Use allowed_tools on the Eval to limit which tools are exposed to the LLM. This reduces token usage and focuses the eval.

@pytest.fixture
def balance_agent():
    # banking_server has 16 tools, but this test only needs 2
    return CopilotEval(
        name="balance-checker",
        instructions="You check account balances.",
        allowed_tools=["get_balance", "get_all_balances"],
    )

MCP Server Prompts¶

MCP servers can bundle prompt templates alongside their tools — reusable message templates that surface in VS Code as slash commands (e.g. /mcp.servername.code_review). pytest-skill-engineering can discover and test these.

Use MCPServerProcess directly to interact with the MCP protocol:

import pytest
from pytest_skill_engineering import MCPPrompt, MCPServer
from pytest_skill_engineering.copilot import CopilotEval
from pytest_skill_engineering.execution.servers import MCPServerProcess

@pytest.fixture(scope="module")
async def server_process(banking_server):
    """Start the server and expose the raw MCP session."""
    proc = MCPServerProcess(banking_server)
    await proc.start()
    yield proc
    await proc.stop()

async def test_prompts_are_discoverable(server_process):
    """The server exposes the expected prompt templates."""
    prompts = await server_process.list_prompts()
    names = [p.name for p in prompts]
    assert "balance_summary" in names

async def test_balance_summary_prompt(copilot_eval, server_process):
    """The balance_summary prompt produces a coherent LLM response."""
    # Render the template (like VS Code does when user invokes the slash command)
    messages = await server_process.get_prompt(
        "balance_summary",
        {"account_type": "checking"},
    )
    assert messages, "Prompt returned no messages"

    # Run the rendered prompt through the LLM
    agent = CopilotEval(
        name="balance-summary",
        instructions="You are a banking assistant.",
    )
    result = await copilot_eval(agent, messages[0]["content"])
    assert result.success

What `get_prompt` Returns¶

get_prompt returns a list of {"role": str, "content": str} dicts — the assembled messages the MCP server produces for that template. Use messages[0]["content"] as the test prompt, or assert on the rendered content directly:

messages = await server_process.get_prompt("code_review", {"code": "def hello(): ..."})
# Structural assertion: prompt was rendered
assert len(messages) > 0
assert "hello" in messages[0]["content"]  # Template filled argument in

Troubleshooting¶

Server Doesn't Start¶

Check that the command works standalone:

python -m my_server

Tools Not Discovered¶

Use Wait.for_tools() and check server logs:

MCPServer(
    command=["python", "-m", "server"],
    wait=Wait.for_tools(["expected_tool"]),
)

Timeout During Startup¶

Increase the timeout:

MCPServer(
    command=["python", "-m", "slow_server"],
    wait=Wait.for_tools(["tool"], timeout_ms=120000),  # 2 minutes
)