How to Test CLI Tools
Wrap command-line tools as MCP-like servers for testing CLI-based interfaces.
Basic Setup

```python
import pytest
from pytest_aitest import CLIServer


@pytest.fixture(scope="module")
def git_server():
    return CLIServer(
        name="git-cli",
        command="git",
        tool_prefix="git",  # Creates the "git_execute" tool
    )
```
How It Works
The CLI server wraps any command-line tool and exposes it as a single tool that accepts arguments:

- Creates a tool: `{tool_prefix}_execute`, which accepts an `args` parameter
- LLM discovers usage: the LLM must run `--help` itself to discover the CLI's capabilities
- Returns structured output: JSON with `exit_code`, `stdout`, and `stderr`
Why no auto-discovery?
By default, help discovery is disabled. This tests that your skill or prompt properly instructs the LLM to discover the CLI's capabilities itself. Set `discover_help=True` to pre-populate the tool description with the help output.
```python
# The LLM calls the tool like this:
git_execute(args="status --porcelain")
git_execute(args="log -n 5 --oneline")
```
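To make the call shape concrete, here is a minimal sketch of what a generated `git_execute` tool could do with its `args` string. It is an illustration under stated assumptions, not pytest-aitest's implementation: the `execute` function is hypothetical, and where `CLIServer` runs commands through a configurable shell, this sketch splits `args` with `shlex` and runs the executable directly. The JSON keys match the structured output listed above.

```python
import json
import shlex
import subprocess
import sys
from typing import Optional


def execute(command: str, args: str, cwd: Optional[str] = None) -> str:
    """Hypothetical stand-in for a generated `{tool_prefix}_execute` tool.

    Splits the `args` string with POSIX shell rules, runs the wrapped
    command directly (no shell involved), and returns a JSON string with
    the exit code and the captured output.
    """
    argv = [command, *shlex.split(args)]
    proc = subprocess.run(argv, capture_output=True, text=True, cwd=cwd)
    return json.dumps(
        {"exit_code": proc.returncode, "stdout": proc.stdout, "stderr": proc.stderr}
    )


if __name__ == "__main__":
    # Same idea as git_execute(args="status --porcelain"), but it uses the
    # running Python interpreter so the demo works without git installed.
    print(execute(sys.executable, "-c \"print('hello')\""))
```

Keeping `args` as one string mirrors how the LLM calls the tool; the wrapper, not the model, is responsible for turning it into an argument list.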
Configuration Options

```python
CLIServer(
    name="git-cli",            # Server identifier (required)
    command="git",             # CLI executable (required)
    tool_prefix="git",         # Tool name prefix (default: command name)
    shell="bash",              # Shell to use (optional)
    cwd="/path/to/repo",       # Working directory (optional)
    env={"KEY": "value"},      # Environment variables (optional)
    discover_help=False,       # Default: LLM must discover CLI usage itself
    help_flag="--help",        # Flag to get help text (default: --help)
    description=None,          # Custom description (overrides help discovery)
)
```
| Option | Description | Default |
|---|---|---|
| `name` | Server identifier | Required |
| `command` | CLI executable to wrap | Required |
| `tool_prefix` | Prefix for the generated tool name | Command name |
| `shell` | Shell to run commands in | Auto-detect |
| `cwd` | Working directory | Current directory |
| `env` | Environment variables | `{}` |
| `discover_help` | Run the help flag to build the tool description | `False` |
| `help_flag` | Flag to get help text (when `discover_help=True`) | `--help` |
| `description` | Custom tool description | `None` |
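The defaults in the table can be summarized as a plain dataclass. `CLIServerOptions` below is a hypothetical illustration, not a class exported by pytest-aitest; in particular, falling back to the literal `command` string when `tool_prefix` is unset is an assumption based on the "Command name" default (the library might, for example, strip a directory path).

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class CLIServerOptions:
    """Hypothetical mirror of the CLIServer options table, for illustration only."""

    name: str                          # required
    command: str                       # required
    tool_prefix: Optional[str] = None  # falls back to the command (assumption)
    shell: Optional[str] = None        # None stands for "auto-detect"
    cwd: Optional[str] = None          # None stands for "current directory"
    env: Dict[str, str] = field(default_factory=dict)
    discover_help: bool = False
    help_flag: str = "--help"
    description: Optional[str] = None

    @property
    def tool_name(self) -> str:
        # The generated tool is named "{tool_prefix}_execute".
        prefix = self.tool_prefix or self.command
        return f"{prefix}_execute"


print(CLIServerOptions(name="git-cli", command="git").tool_name)  # git_execute
print(CLIServerOptions(name="kubectl", command="kubectl", tool_prefix="k8s").tool_name)  # k8s_execute
```

`field(default_factory=dict)` gives each instance its own `env` dictionary instead of one shared mutable default.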
Shell Selection
The shell is auto-detected based on the platform:

| Platform | Default | Available |
|---|---|---|
| Linux/macOS | `bash` | `bash`, `sh`, `zsh` |
| Windows | `powershell` | `powershell`, `pwsh`, `cmd` |
```python
# Explicit shell selection
CLIServer(
    name="dir-cli",
    command="dir",
    tool_prefix="dir",
    shell="cmd",  # Use cmd.exe on Windows
)
```
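Platform-based auto-detection could look roughly like the following. The `pick_shell` helper is hypothetical, and using `platform.system()` to detect the platform is an assumption; only the default and available shells come from the table above.

```python
import platform
from typing import Optional

# Defaults and allowed shells per platform family, as listed in the table above.
DEFAULT_SHELL = {"windows": "powershell", "posix": "bash"}
AVAILABLE_SHELLS = {
    "windows": {"powershell", "pwsh", "cmd"},
    "posix": {"bash", "sh", "zsh"},
}


def pick_shell(requested: Optional[str] = None, system: Optional[str] = None) -> str:
    """Hypothetical helper: return the requested shell or the platform default."""
    system = system or platform.system()  # e.g. "Linux", "Darwin" or "Windows"
    family = "windows" if system == "Windows" else "posix"
    shell = requested or DEFAULT_SHELL[family]
    if shell not in AVAILABLE_SHELLS[family]:
        raise ValueError(f"shell {shell!r} is not available on {system}")
    return shell


print(pick_shell(system="Linux"))           # bash
print(pick_shell("cmd", system="Windows"))  # cmd
```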
Help Discovery
By default, help discovery is disabled (`discover_help=False`). This tests whether your skill or system prompt properly instructs the LLM to discover CLI capabilities itself by running `--help`.

```python
# Default: LLM must discover help itself
CLIServer(
    name="kubectl",
    command="kubectl",
    tool_prefix="k8s",
)

# Enable auto-discovery (pre-populates tool description with --help output)
CLIServer(
    name="kubectl",
    command="kubectl",
    tool_prefix="k8s",
    discover_help=True,
)

# Auto-discovery with custom help flag
CLIServer(
    name="custom-cli",
    command="my-tool",
    tool_prefix="tool",
    discover_help=True,
    help_flag="-h",
)
```
Help text is truncated to 2000 characters to avoid token bloat.
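A rough sketch of what help discovery amounts to: run the command with its help flag and keep at most 2000 characters. `discover_help_text` is a hypothetical name, and falling back to stderr (for tools that print help there) and the 30-second timeout are assumptions, not documented behavior.

```python
import subprocess
import sys

MAX_HELP_CHARS = 2000  # truncation limit stated in this guide


def discover_help_text(command: str, help_flag: str = "--help") -> str:
    """Hypothetical sketch: run `<command> <help_flag>` and truncate the output."""
    proc = subprocess.run(
        [command, help_flag], capture_output=True, text=True, timeout=30
    )
    # Some tools print help on stderr instead of stdout, so fall back to it.
    text = proc.stdout or proc.stderr
    return text[:MAX_HELP_CHARS]


if __name__ == "__main__":
    # Uses the running Python interpreter as a CLI that is always available.
    help_text = discover_help_text(sys.executable)
    print(len(help_text) <= MAX_HELP_CHARS)  # True
```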
Custom Description
When help discovery is disabled (the default), you can provide a custom description:

```python
CLIServer(
    name="legacy-cli",
    command="legacy-tool",
    tool_prefix="legacy",
    description="""
    Manages legacy data files.
    Commands:
    - list: List all records
    - get <id>: Get a specific record
    - delete <id>: Delete a record
    - export <format>: Export data (json, csv)
    """,
)
```
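The options table states that a custom `description` overrides help discovery. One way to express that precedence is sketched below; `resolve_description` and its fallback text are hypothetical and only illustrate the order in which the sources are consulted.

```python
from typing import Callable, Optional

# Hypothetical text used when neither a description nor discovered help exists.
DEFAULT_DESCRIPTION = "Run the wrapped command-line tool; pass its arguments as `args`."


def resolve_description(
    description: Optional[str],
    discover_help: bool,
    fetch_help: Callable[[], str],
) -> str:
    """Hypothetical precedence: custom description, then help output, then a default."""
    if description is not None:
        return description  # a custom description overrides help discovery
    if discover_help:
        return fetch_help()  # e.g. the truncated output of `my-tool --help`
    return DEFAULT_DESCRIPTION  # the LLM has to run --help on its own


print(resolve_description("Manages legacy data files.", True, lambda: "usage: ..."))
# Manages legacy data files.
```

Passing `fetch_help` as a callable means the help command only runs when its output is actually needed.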
Tool Output Format
Tool results are returned as a JSON object with three fields: `exit_code`, `stdout`, and `stderr`.
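If you handle a raw tool result yourself, for example in a custom assertion, decoding it takes only `json.loads`. The helper below is a hypothetical sketch that assumes the three fields listed under How It Works; the sample payload is made up.

```python
import json


def stdout_or_raise(raw_result: str) -> str:
    """Hypothetical helper: decode a CLI tool result and return its stdout.

    Assumes a JSON object with `exit_code`, `stdout` and `stderr` keys.
    """
    result = json.loads(raw_result)
    if result["exit_code"] != 0:
        raise RuntimeError(
            f"command failed (exit code {result['exit_code']}): {result['stderr']}"
        )
    return result["stdout"]


# Made-up payload shaped like a `git_execute(args="log -n 1 --oneline")` result.
sample = json.dumps({"exit_code": 0, "stdout": "abc1234 Initial commit\n", "stderr": ""})
print(stdout_or_raise(sample), end="")  # abc1234 Initial commit
```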
Complete Example: Testing Git

```python
import pytest
from pytest_aitest import Agent, CLIServer, Provider


@pytest.fixture(scope="module")
def git_server():
    return CLIServer(
        name="git-cli",
        command="git",
        tool_prefix="git",
        cwd="/path/to/repo",
    )


@pytest.fixture
def git_agent(git_server):
    return Agent(
        name="git-assistant",
        provider=Provider(model="azure/gpt-5-mini"),
        cli_servers=[git_server],
        system_prompt="You are a git assistant.",
        max_turns=5,
    )


async def test_git_status(aitest_run, git_agent):
    result = await aitest_run(git_agent, "What's the repo status?")
    assert result.success
    assert result.tool_was_called("git_execute")


async def test_git_log(aitest_run, git_agent):
    result = await aitest_run(git_agent, "Show me the last 3 commits")
    assert result.success
    assert result.tool_was_called("git_execute")
```
Combining MCP and CLI Servers

```python
import pytest
from pytest_aitest import Agent, CLIServer, MCPServer, Provider, Wait


@pytest.fixture(scope="module")
def filesystem_server():
    return MCPServer(
        command=["npx", "-y", "@modelcontextprotocol/server-filesystem", "/tmp"],
        wait=Wait.for_tools(["read_file", "write_file"]),
    )


@pytest.fixture(scope="module")
def grep_server():
    return CLIServer(
        name="grep-cli",
        command="grep",
        tool_prefix="search",  # Exposed to the LLM as "search_execute"
    )


@pytest.fixture
def hybrid_agent(filesystem_server, grep_server):
    return Agent(
        name="hybrid",
        provider=Provider(model="azure/gpt-5-mini"),
        mcp_servers=[filesystem_server],
        cli_servers=[grep_server],
        system_prompt="You can read/write files and search content.",
        max_turns=10,
    )
```
Troubleshooting
Command Not Found
If the executable is not on the `PATH` of the test process, pass its full path as `command`.
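One way to get a full path is to resolve the executable with the standard library's `shutil.which` before constructing the server. `resolve_command` is a hypothetical helper, not part of pytest-aitest.

```python
import os
import shutil
import sys


def resolve_command(name: str) -> str:
    """Hypothetical helper: turn a command name into an absolute executable path.

    shutil.which searches PATH (or checks `name` directly if it contains a
    directory component) and returns None when nothing executable is found.
    """
    path = shutil.which(name)
    if path is None:
        raise FileNotFoundError(f"{name!r} was not found on PATH")
    return os.path.abspath(path)


if __name__ == "__main__":
    print(resolve_command(sys.executable))
```

In a fixture you could pass `command=resolve_command("my-tool")` to `CLIServer`, or catch the `FileNotFoundError` and call `pytest.skip()` so the test is skipped rather than failed when the tool is not installed.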
Help Discovery Fails
If you've enabled `discover_help=True` and discovery fails, either provide a custom description or leave discovery disabled and let the LLM discover help itself:

```python
# Option 1: Provide custom description
CLIServer(
    name="my-cli",
    command="my-tool",
    tool_prefix="tool",
    description="Tool for managing resources. Run --help for usage.",
)

# Option 2: Let LLM discover (default behavior)
CLIServer(
    name="my-cli",
    command="my-tool",
    tool_prefix="tool",
)
```
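Working Directory Issues
A relative `cwd` typically resolves against the directory pytest was started from, so it can break when tests run from elsewhere. Use an absolute path built from the test file's location: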
```python
from pathlib import Path

CLIServer(
    name="my-cli",
    command="my-tool",
    tool_prefix="tool",
    cwd=str(Path(__file__).parent / "workspace"),
)
```
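To catch a bad `cwd` before any test runs, a small guard can check that the path is absolute and exists. `checked_cwd` is a hypothetical helper sketched with `pathlib`; it does not ship with pytest-aitest.

```python
import tempfile
from pathlib import Path
from typing import Union


def checked_cwd(path: Union[str, Path]) -> str:
    """Hypothetical guard: fail early if a cwd is relative or missing."""
    p = Path(path)
    if not p.is_absolute():
        raise ValueError(f"cwd must be absolute, got {str(p)!r}")
    if not p.is_dir():
        raise NotADirectoryError(f"cwd does not exist or is not a directory: {p}")
    return str(p)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(checked_cwd(tmp) == str(Path(tmp)))  # True
```

For example, `cwd=checked_cwd(Path(__file__).resolve().parent / "workspace")`; calling `.resolve()` makes the path absolute even if `__file__` is relative.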
📁 Real Example: `test_cli_server.py` (CLI server testing with `ls` and `cat` commands)