API Reference¶
Auto-generated API documentation from source code.
Core Types¶
Agent(provider: Provider, name: str = '', id: str = (lambda: str(uuid.uuid4()))(), mcp_servers: list[MCPServer] = list(), cli_servers: list[CLIServer] = list(), system_prompt: str | None = None, max_turns: int = 10, skill: Skill | None = None, allowed_tools: list[str] | None = None, system_prompt_name: str | None = None, retries: int = 1, clarification_detection: ClarificationDetection = ClarificationDetection())
dataclass
¶
AI agent configuration combining provider and servers.
The Agent is the unit of comparison in pytest-aitest. Each agent has
a unique id (auto-generated UUID) that flows through the entire
pipeline — from test execution to report rendering.
Define agents at module level and parametrize tests with them so the same Agent object (same UUID) is reused across tests:
Example
    Agent(
        name="banking-fast",
        provider=Provider(model="azure/gpt-5-mini"),
        mcp_servers=[banking_server],
        system_prompt="Be concise.",
    )
Comparing agents
    agents = [agent_fast, agent_smart, agent_expert]

    @pytest.mark.parametrize("agent", agents, ids=lambda a: a.name)
    async def test_query(aitest_run, agent):
        result = await aitest_run(agent, "What's my balance?")
Filtering tools
    Agent(
        provider=Provider(model="azure/gpt-5-mini"),
        mcp_servers=[excel_server],
        allowed_tools=["read_cell", "write_cell"],  # Only expose these tools
    )
__post_init__() -> None
¶
Auto-construct name from dimensions if not explicitly set.
Provider(model: str, temperature: float | None = None, max_tokens: int | None = None, rpm: int | None = None, tpm: int | None = None)
dataclass
¶
LLM provider configuration.
Authentication is handled via standard environment variables:
- Azure: AZURE_API_BASE + az login (Entra ID)
- OpenAI: OPENAI_API_KEY
- Anthropic: ANTHROPIC_API_KEY
See https://ai.pydantic.dev/models/ for supported providers.
Example
Provider(model="openai/gpt-4o-mini") Provider(model="azure/gpt-5-mini", temperature=0.7) Provider(model="azure/gpt-5-mini", rpm=10, tpm=10000)
MCPServer(command: list[str] = list(), args: list[str] = list(), env: dict[str, str] = dict(), wait: Wait = Wait.ready(), cwd: str | None = None, transport: Transport = 'stdio', url: str | None = None, headers: dict[str, str] = dict())
dataclass
¶
MCP server configuration.
Supports three transports:

- stdio (default) — Launches a local subprocess and communicates via stdin/stdout. Requires command.
- sse — Connects to a remote server using Server-Sent Events. Requires url.
- streamable-http — Connects to a remote server using the Streamable HTTP transport (recommended for production). Requires url.
Example
    # stdio (default)
    MCPServer(
        command=["npx", "-y", "@modelcontextprotocol/server-filesystem"],
        args=["--directory", "/workspace"],
    )

    # SSE remote server
    MCPServer(
        transport="sse",
        url="http://localhost:8000/sse",
    )

    # Streamable HTTP remote server
    MCPServer(
        transport="streamable-http",
        url="http://localhost:8000/mcp",
    )

    # With custom headers (e.g. auth)
    MCPServer(
        transport="streamable-http",
        url="https://mcp.example.com/mcp",
        headers={"Authorization": "Bearer ${MCP_TOKEN}"},
    )
CLIServer(name: str, command: str, tool_prefix: str | None = None, shell: str | None = None, cwd: str | None = None, env: dict[str, str] = dict(), discover_help: bool = False, help_flag: str = '--help', description: str | None = None, timeout: float = 30.0)
dataclass
¶
CLI server that wraps a command-line tool as an MCP-like tool.
Wraps a single CLI command (like git, docker, echo) and exposes it
as a tool the LLM can call with arbitrary arguments.
By default, help discovery is DISABLED. The LLM must run command --help
itself to discover available subcommands. This tests that your skill/prompt
properly instructs the LLM to discover CLI capabilities.
Example
CLIServer( name="git-cli", command="git", tool_prefix="git", # Creates "git_execute" tool shell="bash", # Shell to use (default: auto-detect) )
    # Enable auto-discovery (pre-populates tool description with help output)
    CLIServer(
        name="my-cli",
        command="my-tool",
        discover_help=True,  # Runs --help and includes in tool description
    )
    # Custom description instead of discovery
    CLIServer(
        name="legacy-cli",
        command="old-tool",
        description="Manages legacy data. Use: list, get",
    )
The generated tool accepts an args parameter:

    git_execute(args="status --porcelain")
    git_execute(args="log -n 5 --oneline")
Wait(strategy: WaitStrategy, pattern: str | None = None, tools: tuple[str, ...] | None = None, timeout_ms: int = 30000)
dataclass
¶
Wait configuration for server startup.
Example
Wait.for_log("Server started") Wait.for_tools(["read_file", "write_file"]) Wait.ready()
ready(timeout_ms: int = 30000) -> Wait
classmethod
¶
Wait for server to signal ready (default).
for_log(pattern: str, timeout_ms: int = 30000) -> Wait
classmethod
¶
Wait for specific log pattern in stderr.
for_tools(tools: Sequence[str], timeout_ms: int = 30000) -> Wait
classmethod
¶
Wait until specific tools are available.
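These strategies are typically supplied through MCPServer's wait field. A minimal sketch, assuming a hypothetical local server launched over stdio:

    # Illustrative only: the command and log line are placeholders.
    MCPServer(
        command=["python", "my_mcp_server.py"],
        wait=Wait.for_log("Server started", timeout_ms=60000),
    )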
ClarificationDetection(enabled: bool = False, level: ClarificationLevel = ClarificationLevel.WARNING, judge_model: str | None = None)
dataclass
¶
Configuration for detecting when an agent asks for clarification.
When enabled, uses a judge LLM to detect if the agent is asking the user for clarification (e.g., "Would you like me to...?") instead of executing the requested task. This is important because agents being tested should act autonomously, not ask questions.
The judge model can be the same as the agent's model (default) or a separate, cheaper model.
Example
    # Use agent's own model as judge
    Agent(
        provider=Provider(model="azure/gpt-5-mini"),
        clarification_detection=ClarificationDetection(enabled=True),
    )

    # Use a separate cheaper model as judge
    Agent(
        provider=Provider(model="azure/gpt-4.1"),
        clarification_detection=ClarificationDetection(
            enabled=True,
            level=ClarificationLevel.ERROR,
            judge_model="azure/gpt-5-mini",
        ),
    )
ClarificationLevel
¶
Bases: Enum
Severity level when clarification is detected.
Result Types¶
AgentResult(turns: list[Turn], success: bool, error: str | None = None, duration_ms: float = 0.0, token_usage: dict[str, int] = dict(), cost_usd: float = 0.0, _messages: list[Any] = list(), session_context_count: int = 0, assertions: list[Assertion] = list(), available_tools: list[ToolInfo] = list(), skill_info: SkillInfo | None = None, effective_system_prompt: str = '', clarification_stats: ClarificationStats | None = None)
dataclass
¶
Result of running an agent with rich inspection capabilities.
Example
result = await aitest_run(agent, "Hello!") assert result.success assert "hello" in result.final_response.lower() assert result.tool_was_called("read_file")
Session continuity: pass messages to next test¶
next_result = await aitest_run(agent, "Follow up", messages=result.messages)
messages: list[Any]
property
¶
Get full conversation messages for session continuity.
Use this to pass conversation history to the next test in a session:
result = await aitest_run(agent, "First message")
next_result = await aitest_run(agent, "Continue", messages=result.messages)
is_session_continuation: bool
property
¶
Check if this result is part of a multi-turn session.
Returns True if prior messages were passed via the messages parameter.
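A short sketch of how this property pairs with the messages parameter (the task text is illustrative):

    first = await aitest_run(agent, "Open a support ticket")
    follow_up = await aitest_run(agent, "Add a note to it", messages=first.messages)

    assert not first.is_session_continuation
    assert follow_up.is_session_continuation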
final_response: str
property
¶
Get the last assistant response.
all_responses: list[str]
property
¶
Get all assistant responses.
all_tool_calls: list[ToolCall]
property
¶
Get all tool calls across all turns.
tool_names_called: set[str]
property
¶
Get set of all tool names that were called.
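A hedged sketch using these inspection properties together; the tool names are placeholders for whatever your MCP servers expose:

    result = await aitest_run(agent, "Summarise my recent transactions")

    # Which tools ran, regardless of how many turns it took
    assert result.tool_names_called <= {"list_transactions", "get_balance"}

    # Drill into individual calls when the set check is not enough
    for call in result.all_tool_calls:
        print(call.name, call.arguments)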
asked_for_clarification: bool
property
¶
Check if the agent asked for clarification instead of acting.
Returns True if clarification detection was enabled AND the agent asked at least one clarifying question.
Example
result = await aitest_run(agent, "Check my balance") assert not result.asked_for_clarification
clarification_count: int
property
¶
Number of times the agent asked for clarification.
tool_context: str
property
¶
Summarise tool calls and their results as plain text.
Use this as the context argument for llm_score so the judge
can see what tools were called and what data they returned.
Example::
    score = llm_score(
        result.final_response,
        TOOL_QUALITY_RUBRIC,
        context=result.tool_context,
    )
tool_was_called(name: str) -> bool
¶
Check if a specific tool was called.
tool_call_count(name: str) -> int
¶
Count how many times a specific tool was called.
tool_calls_for(name: str) -> list[ToolCall]
¶
Get all calls to a specific tool.
tool_call_arg(tool_name: str, arg_name: str) -> Any
¶
Get argument value from the first call to a tool.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| tool_name | str | Name of the tool | required |
| arg_name | str | Name of the argument | required |

Returns:

| Type | Description |
|---|---|
| Any | Argument value or None if not found |
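For example, to check what the agent passed on its first call to a tool (the tool and argument names are illustrative):

    # Returns None if the tool was never called or the argument is missing.
    cell = result.tool_call_arg("read_cell", "cell")
    assert cell == "A1"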
tool_images_for(name: str) -> list[ImageContent]
¶
Get all images returned by a specific tool.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| name | str | Name of the tool (e.g., "screenshot") | required |

Returns:

| Type | Description |
|---|---|
| list[ImageContent] | List of ImageContent objects from tool calls that returned images. |
Example
    screenshots = result.tool_images_for("screenshot")
    assert len(screenshots) > 0
    assert screenshots[-1].media_type == "image/png"
Turn(role: str, content: str, tool_calls: list[ToolCall] = list())
dataclass
¶
A single conversational turn.
text: str
property
¶
Get the text content of this turn.
ToolCall(name: str, arguments: dict[str, Any], result: str | None = None, error: str | None = None, duration_ms: float | None = None, image_content: bytes | None = None, image_media_type: str | None = None)
dataclass
¶
A tool call made by the agent.
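A sketch of inspecting these fields through AgentResult.tool_calls_for; the tool name is a placeholder:

    for call in result.tool_calls_for("write_cell"):
        assert call.error is None
        print(f"{call.name}({call.arguments}) -> {call.result} [{call.duration_ms} ms]")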
ClarificationStats(count: int = 0, turn_indices: list[int] = list(), examples: list[str] = list())
dataclass
¶
Statistics about clarification requests detected during execution.
Tracks when the agent asks for user input instead of executing the task. Only populated when clarification_detection is enabled on the agent.
Example
result = await aitest_run(agent, "Check my balance") if result.clarification_stats: print(f"Agent asked {result.clarification_stats.count} question(s)")
ToolInfo(name: str, description: str, input_schema: dict[str, Any], server_name: str)
dataclass
¶
Metadata about an MCP tool for AI analysis.
Captures the tool's description and schema as exposed to the LLM, enabling the AI to analyze whether tool descriptions are clear and suggest improvements.
SkillInfo(name: str, description: str, instruction_content: str, reference_names: list[str] = list())
dataclass
¶
Metadata about a skill for AI analysis.
Captures the skill's instruction content and references, enabling the AI to analyze skill effectiveness and suggest improvements.
SubagentInvocation(name: str, status: str, duration_ms: float | None = None)
dataclass
¶
A subagent invocation observed during agent execution.
Tracks when an orchestrator agent dispatches work to a named sub-agent, along with the final status and duration of that invocation.
Example
    result = await copilot_run(agent, "Build and test the project")
    assert any(s.name == "coder" for s in result.subagent_invocations)
    assert all(s.status == "completed" for s in result.subagent_invocations)
Scoring Types¶
ScoreResult(scores: dict[str, int], total: int, max_total: int, weighted_score: float, reasoning: str)
dataclass
¶
Structured result from a multi-dimension LLM evaluation.
Attributes:

| Name | Type | Description |
|---|---|---|
| scores | dict[str, int] | Per-dimension scores keyed by dimension name. |
| total | int | Sum of all dimension scores. |
| max_total | int | Maximum possible total score. |
| weighted_score | float | Weighted composite score (0.0 – 1.0). |
| reasoning | str | Free-text explanation from the judge. |
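A minimal sketch of reading these fields from an llm_score evaluation; the rubric and dimension name are placeholders:

    score = llm_score(result.final_response, RUBRIC)

    assert score.scores["accuracy"] >= 4
    assert score.weighted_score >= 0.8
    print(score.reasoning)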
assert_score(result: ScoreResult, *, min_total: int | None = None, min_pct: float | None = None, min_dimensions: dict[str, int] | None = None) -> None
¶
Assert that judge scores meet minimum thresholds.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| result | ScoreResult | ScoreResult from an LLMScore evaluation. | required |
| min_total | int \| None | Minimum total score (sum of all dimensions). | None |
| min_pct | float \| None | Minimum weighted percentage (0.0 – 1.0). | None |
| min_dimensions | dict[str, int] \| None | Per-dimension minimum scores keyed by name. | None |

Raises:

| Type | Description |
|---|---|
| AssertionError | If any threshold is not met. |
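A hedged sketch combining the llm_score fixture with assert_score; the rubric and dimension names are placeholders:

    score = llm_score(result.final_response, RUBRIC, context=result.tool_context)

    assert_score(
        score,
        min_pct=0.7,
        min_dimensions={"accuracy": 4},
    )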
LLMScore(model: Any)
¶
Callable that evaluates content against a multi-dimension rubric.
Uses pydantic-ai with structured output to extract per-dimension scores from a judge LLM.
Example::
    def test_plan_quality(llm_score):
        rubric = [
            ScoringDimension("accuracy", "Factually correct", max_score=5),
            ScoringDimension("completeness", "Covers all points", max_score=5),
        ]
        result = llm_score(plan_text, rubric)
        assert result.total >= 7
__call__(content: str, rubric: list[ScoringDimension], *, content_label: str = 'content', context: str | None = None) -> ScoreResult
¶
Evaluate content against a multi-dimension rubric.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| content | str | The text to evaluate. | required |
| rubric | list[ScoringDimension] | List of ScoringDimension definitions. | required |
| content_label | str | How to describe the content to the judge. | 'content' |
| context | str \| None | Optional background context for the judge (e.g. the original task prompt, source code). | None |

Returns:

| Type | Description |
|---|---|
| ScoreResult | ScoreResult with per-dimension scores and reasoning. |
async_score(content: str, rubric: list[ScoringDimension], *, content_label: str = 'content', context: str | None = None) -> ScoreResult
async
¶
Async variant for use in async test functions.
Same parameters as __call__.
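A sketch of the async variant inside an async test, assuming llm_score is the LLMScore fixture and RUBRIC is a list of ScoringDimension:

    async def test_summary_quality(aitest_run, llm_score, agent):
        result = await aitest_run(agent, "Summarise my account activity")
        score = await llm_score.async_score(result.final_response, RUBRIC)
        assert score.total >= 7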
Skill Types¶
Skill(path: Path, metadata: SkillMetadata, content: str, references: dict[str, str] = dict())
dataclass
¶
An Agent Skill loaded from a SKILL.md file.
Skills provide domain knowledge to agents by:

1. Prepending instructions to the system prompt
2. Optionally providing reference documents via virtual tools
Example
    skill = Skill.from_path(Path("skills/my-skill"))
    agent = Agent(provider=provider, skill=skill)
name: str
property
¶
Skill name from metadata.
description: str
property
¶
Skill description from metadata.
has_references: bool
property
¶
Whether this skill has reference documents.
from_path(path: Path | str) -> Skill
classmethod
¶
Load a skill from a directory containing SKILL.md.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| path | Path \| str | Path to skill directory or SKILL.md file | required |

Returns:

| Type | Description |
|---|---|
| Skill | Loaded Skill instance |

Raises:

| Type | Description |
|---|---|
| SkillError | If skill cannot be loaded or is invalid |
SkillMetadata(name: str, description: str, version: str | None = None, license: str | None = None, tags: tuple[str, ...] = ())
dataclass
¶
Metadata from SKILL.md frontmatter.
Required fields per agentskills.io spec:

- name: lowercase letters and hyphens only, 1-64 chars
- description: what the skill does, max 1024 chars

Optional fields:

- version: semantic version string
- license: SPDX license identifier
- tags: list of categorization tags
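Metadata is normally populated by Skill.from_path, but a direct construction illustrates the constraints (the values are illustrative):

    SkillMetadata(
        name="excel-reporting",  # lowercase letters and hyphens only
        description="Builds monthly Excel reports from raw transaction data.",
        version="1.0.0",
        tags=("excel", "reporting"),
    )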
__post_init__() -> None
¶
Validate metadata per agentskills.io spec.
load_skill(path: Path | str) -> Skill
¶
Load a skill from a path.
Convenience function wrapping Skill.from_path().
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| path | Path \| str | Path to skill directory or SKILL.md file | required |

Returns:

| Type | Description |
|---|---|
| Skill | Loaded Skill instance |
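A minimal sketch of the convenience function feeding an Agent; the skill path is a placeholder:

    skill = load_skill("skills/excel-reporting")
    agent = Agent(provider=Provider(model="azure/gpt-5-mini"), skill=skill)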
Prompt Types¶
Prompt(name: str, system_prompt: str = '', version: str = '1.0', description: str = '', metadata: dict[str, Any] = dict())
dataclass
¶
A loadable prompt configuration.
Example YAML
    name: banking-assistant
    version: "1.0"
    description: Concise banking responses
    system_prompt: |
      You are a banking assistant.
      Be brief and always include account balances.
Example usage
    prompt = Prompt.from_yaml("prompts/banking.yaml")
    agent = Agent(system_prompt=prompt.system_prompt, ...)
from_yaml(path: str | Path) -> Prompt
classmethod
¶
Load a prompt from a YAML file.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| path | str \| Path | Path to YAML file | required |

Returns:

| Type | Description |
|---|---|
| Prompt | Prompt instance |

Raises:

| Type | Description |
|---|---|
| FileNotFoundError | If file doesn't exist |
| ValueError | If required fields are missing |
from_dict(data: dict[str, Any]) -> Prompt
classmethod
¶
Create a Prompt from a dictionary.
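A hedged sketch, assuming the dictionary uses the same keys as the YAML schema shown above:

    prompt = Prompt.from_dict({
        "name": "banking-assistant",
        "version": "1.0",
        "system_prompt": "You are a banking assistant. Be brief.",
    })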
load_system_prompts(directory: str | Path) -> dict[str, str]
¶
Load all system prompts from a directory as a simple dict.
This is a convenience function for quick parametrization. For full Prompt metadata, use load_prompts() instead.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| directory | str \| Path | Path to directory containing .yaml/.yml or .md files | required |

Returns:

| Type | Description |
|---|---|
| dict[str, str] | Dict mapping prompt name to system_prompt content |
Example
    prompts = load_system_prompts(Path("prompts/"))
    # {"concise": "Be brief...", "detailed": "Explain..."}

    @pytest.mark.parametrize("prompt_name,system_prompt", prompts.items())
    async def test_with_prompt(aitest_run, prompt_name, system_prompt):
        agent = Agent(system_prompt=system_prompt, ...)
load_prompts(directory: str | Path) -> list[Prompt]
¶
Load all prompt YAML files from a directory.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| directory | str \| Path | Path to directory containing .yaml/.yml files | required |

Returns:

| Type | Description |
|---|---|
| list[Prompt] | List of Prompt instances sorted by name |
Example
    prompts = load_prompts("prompts/")

    @pytest.mark.parametrize("prompt", prompts, ids=lambda p: p.name)
    async def test_prompts(aitest_run, prompt):
        agent = Agent(system_prompt=prompt.system_prompt, ...)
load_prompt(path: str | Path) -> Prompt
¶
Load a single prompt from a YAML file.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| path | str \| Path | Path to YAML file | required |

Returns:

| Type | Description |
|---|---|
| Prompt | Prompt instance |
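A short sketch pairing load_prompt with an Agent; the path is a placeholder, and using system_prompt_name to carry the prompt's name is an assumption about that field's purpose:

    prompt = load_prompt("prompts/banking.yaml")
    agent = Agent(
        provider=Provider(model="azure/gpt-5-mini"),
        system_prompt=prompt.system_prompt,
        system_prompt_name=prompt.name,  # assumed: labels this prompt in reports
    )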
Optimizer¶
optimize_instruction(current_instruction: str, result: AgentResult, criterion: str, *, model: str | Model = 'azure/gpt-5.2-chat') -> InstructionSuggestion
async
¶
Analyze a result and suggest an improved instruction.
Uses pydantic-ai structured output to analyze the gap between a current instruction and the agent's observed behavior, returning a concrete, actionable improvement.
Designed to drop into pytest.fail() so the failure message
contains a ready-to-use fix.
Model strings follow the same provider/model format used by
pytest-aitest. Azure Entra ID auth is handled automatically
when AZURE_API_BASE or AZURE_OPENAI_ENDPOINT is set.
Example::
    result = await aitest_run(agent, task)

    if '"""' not in result.file("main.py"):
        suggestion = await optimize_instruction(
            agent.system_prompt or "",
            result,
            "Agent should add docstrings to all functions.",
        )
        pytest.fail(f"No docstrings found.\n\n{suggestion}")
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| current_instruction | str | The agent's current instruction / system prompt text. | required |
| result | AgentResult | The AgentResult from the agent run being analyzed. | required |
| criterion | str | What the agent should have done — the test expectation in plain English. | required |
| model | str \| Model | Provider/model string or a Model instance. | 'azure/gpt-5.2-chat' |

Returns:

| Type | Description |
|---|---|
| InstructionSuggestion | An InstructionSuggestion containing the improved instruction, reasoning, and a summary of changes. |
InstructionSuggestion(instruction: str, reasoning: str, changes: str)
dataclass
¶
A suggested improvement to an agent instruction.
Returned by optimize_instruction. Designed to drop into
pytest.fail() so the failure message includes an actionable fix.
Attributes:

| Name | Type | Description |
|---|---|---|
| instruction | str | The improved instruction text to use instead. |
| reasoning | str | Explanation of why this change would close the gap. |
| changes | str | Short description of what was changed (one sentence). |
Example::
    suggestion = await optimize_instruction(
        agent.system_prompt or "",
        result,
        "Agent should add docstrings to all functions.",
    )
    pytest.fail(f"No docstrings found.\n\n{suggestion}")