Core banking tests — parametrized across benchmark agents.

All Passed
February 15, 2026 at 12:03 PM · 📄 test_hero.py · 36 tests · 2750.9s · 828–3,496 tok · 🧪 N/A · 💰 N/A
🤖 AI Analysis

Recommended for Deploy: gpt-5-mini
Achieves a flawless 100% pass rate with perfect iteration consistency, reliable tool usage across single- and multi-turn workflows, and zero recorded cost. Strong balance of speed, correctness, and prompt-following makes it the safest default deployment.
100% Pass Rate · $0.000000 Total Cost · 15,318 Tokens · 36 Total Tests · 0 Failures · 10 Agents · 3.3 Avg Turns

Comparative Analysis

Why the winner wins

  • Perfect reliability: gpt-5-mini passed 100% of tests and 100% of iterations (10/10), including error handling and multi-turn context carryover.
  • Tool discipline: Consistently called the correct MCP tools first (get_balance, get_all_balances, transfer) and never hallucinated balances, exactly following the default system prompt.
  • Robust edge-case handling: In insufficient-funds scenarios, it gracefully explained failures and suggested valid alternatives without retry loops or unsafe assumptions.
  • Prompt versatility: Maintained correctness across default, concise, detailed, friendly, and skill-augmented configurations.

Consistency (iterations only)

  • No flakiness detected: All agents, including the winner, achieved 100% iteration pass rates. No test fell below full iteration success.

Notable patterns

  • Model vs verbosity trade-off: gpt-4.1 often used fewer tokens in advisory-style prompts, but did not outperform gpt-5-mini on capability or reliability.
  • Prompt-driven verbosity, not correctness: “Detailed” and “friendly” prompts significantly increased token usage but did not degrade tool usage or accuracy.
  • Skill amplification: The financial-advisor skill reliably injected structured, principle-based advice (emergency fund, 50/30/20 rule) without interfering with tool calls.
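As a quick illustration of the 50/30/20 rule the financial-advisor skill cites (a common budgeting heuristic, not part of the test suite): allocate 50% of after-tax income to needs, 30% to wants, and 20% to savings. A minimal sketch:

```python
def split_50_30_20(income: float) -> dict:
    """Split after-tax income per the 50/30/20 budgeting rule."""
    return {
        "needs": round(income * 0.50, 2),
        "wants": round(income * 0.30, 2),
        "savings": round(income * 0.20, 2),
    }

# Example: a $4,000 monthly after-tax income.
print(split_50_30_20(4000.0))  # {'needs': 2000.0, 'wants': 1200.0, 'savings': 800.0}
```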

Alternatives

  • gpt-4.1: Equal pass rate and iteration consistency, with lower token usage on many prompts. A strong alternative if token efficiency becomes the deciding factor.
  • Prompt variants (concise/friendly/detailed): All viable. Choose based on UX needs rather than correctness—none introduced failures.
  • Skill-augmented agents: Both models with the financial-advisor skill performed well; use when higher-quality financial guidance is desired.

🔧 MCP Tool Feedback

MCPServerStdio

Overall, tools were easy to discover and were used correctly and consistently. Error messages were clear enough for agents to recover gracefully.

Tool               Status  Calls  Issues
get_balance                Many   Working well
get_all_balances           Many   Working well
transfer                   Many   Clear error on insufficient funds enabled good recovery
get_transactions           Few    Returned empty data cleanly
deposit            ⚠️       0      Not exercised by tests
withdraw           ⚠️       0      Not exercised by tests

Note: deposit and withdraw were never called. This is a test coverage observation, not a defect.

📝 System Prompt Feedback

default (effective)

  • Behavioral impact: The explicit directive “ALWAYS call the appropriate tool first” strongly anchored the agents to correct tool usage and prevented hallucination.
  • Problem: None observed.
  • Suggested change: None — this prompt is a solid baseline.

concise / detailed / friendly (effective, model-agnostic)

  • Behavioral impact:
      • concise: Encouraged direct answers and minimal verbosity without skipping tools.
      • detailed: Increased deliberation and explanation length but did not cause hesitation or permission-seeking.
      • friendly: Added warmth and occasional extra tool calls (e.g., get_transactions) without harming correctness.
  • Problem: Token bloat in detailed/friendly variants.
  • Suggested change: Add a single constraint line to verbose prompts:

    “Do not call additional tools unless they materially change the recommendation.”
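One way to apply that change is to append the constraint only to the verbose variants when assembling the prompt set. This is a hypothetical sketch: the prompt texts below are placeholders, not the benchmark's actual system prompts.

```python
# The constraint line suggested above.
TOOL_CONSTRAINT = (
    "Do not call additional tools unless they materially change the recommendation."
)

# Placeholder prompt texts (assumptions, not the real benchmark prompts).
PROMPTS = {
    "concise": "Answer banking questions directly and briefly.",
    "detailed": "Explain your reasoning step by step for each banking question.",
    "friendly": "Be warm and conversational while helping with banking questions.",
}

# Only the verbose variants get the extra constraint line.
for name in ("detailed", "friendly"):
    PROMPTS[name] = f"{PROMPTS[name]}\n\n{TOOL_CONSTRAINT}"

print(PROMPTS["detailed"].endswith(TOOL_CONSTRAINT))  # True
```

Keeping the constraint out of the concise variant preserves its already-minimal token profile.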

📚 Skill Feedback

financial-advisor (positive)

  • Usage rate: High — principles were consistently reflected in responses.
  • Token cost: Moderate increase due to structured explanations.
  • Problem: None observed; content aligned well with user questions.
  • Suggested change: Consider a short “Quick take” section at the top of the skill instructions to encourage brief summaries before deep dives.

💡 Optimizations

#  Optimization                                   Priority     Estimated Savings
1  Limit optional tool calls in advisory prompts  recommended  10–15% cost reduction
2  Trim verbose prompt variants                   suggestion   10% fewer tokens
3  Add tests for deposit/withdraw                 info         Coverage improvement

1. Limit optional tool calls in advisory prompts (recommended)

  • Current: Friendly and detailed prompts sometimes trigger extra, non-essential tool calls.
  • Change: Explicitly restrict tool usage to cases where results affect recommendations.
  • Impact: ~10–15% cost reduction from fewer tool calls and shorter responses.

2. Trim verbose prompt variants (suggestion)

  • Current: Detailed/friendly prompts encourage long explanations by default.
  • Change: Add a “summary first, details on request” instruction.
  • Impact: ~10% fewer tokens per advisory response.

3. Add tests for deposit/withdraw (info)

  • Current: These tools are unexercised.
  • Change: Add one happy-path and one error-path test for each.
  • Impact: Improved confidence and coverage; no direct cost savings.
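A sketch of what that deposit/withdraw coverage could look like. The real suite drives these tools through the MCP server; this stand-in models the expected tool behavior directly so the happy-path and error-path shapes are concrete. The `Bank` class and its error message are assumptions, not the benchmark's API.

```python
class InsufficientFunds(Exception):
    """Raised when a withdrawal exceeds the account balance."""


class Bank:
    """In-memory stand-in for the banking backend behind the MCP tools."""

    def __init__(self, balances):
        self.balances = dict(balances)

    def deposit(self, account, amount):
        self.balances[account] += amount
        return self.balances[account]

    def withdraw(self, account, amount):
        if amount > self.balances[account]:
            raise InsufficientFunds(f"balance is {self.balances[account]}")
        self.balances[account] -= amount
        return self.balances[account]


def test_deposit_happy_path():
    bank = Bank({"checking": 100.0})
    assert bank.deposit("checking", 50.0) == 150.0


def test_withdraw_insufficient_funds():
    bank = Bank({"checking": 100.0})
    try:
        bank.withdraw("checking", 500.0)
    except InsufficientFunds as err:
        assert "balance is 100.0" in str(err)
    else:
        raise AssertionError("expected InsufficientFunds")
```

In the actual suite these would be parametrized across agents like the existing tests, with the agent invoking the deposit/withdraw tools rather than calling a backend directly.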

🏆 Agent Leaderboard

Rank  Agent                           Tests  Pass Rate  Tokens  Cost  Duration
🥇     gpt-4.1                         10/10  100%       10,577  N/A   58.8s
🥈     gpt-5-mini                      10/10  100%       15,318  N/A   1277.4s
🥉     gpt-4.1 + concise               2/2    100%       1,661   N/A   11.7s
4     gpt-4.1 + detailed              2/2    100%       2,291   N/A   14.5s
5     gpt-4.1 + financial-advisor     2/2    100%       1,750   N/A   19.4s
6     gpt-4.1 + friendly              2/2    100%       2,083   N/A   10.9s
7     gpt-5-mini + concise            2/2    100%       3,937   N/A   25.6s
8     gpt-5-mini + detailed           2/2    100%       6,812   N/A   46.3s
9     gpt-5-mini + financial-advisor  2/2    100%       5,678   N/A   1240.9s
10    gpt-5-mini + friendly           2/2    100%       5,242   N/A   45.3s

📋 Test Results

36 / 36 tests
🔗 Core banking tests — parametrized across benchmark agents. (3 tests) — 3/3 · 3/3

  • Check account balance.
    Total 4,047 tok · Total N/A · Δ +18% · Δ +44% · Δ +0%
    gpt-4.1: 8.8s · gpt-5-mini: 12.7s
  • Transfer funds between accounts.
    Total 4,762 tok · Total N/A · Δ +28% · Δ +36% · Δ +0%
    gpt-4.1: 8.4s · gpt-5-mini: 11.5s
  • Handle insufficient funds gracefully.
    Total 6,960 tok · Total N/A · Δ +155% · Δ +5135% · Δ +0%
    gpt-4.1: 23.6s · gpt-5-mini: 1232.9s
🔗 Multi-turn session: savings transfer workflow. (2 tests) — 2/2 · 2/2

  • First turn: check account balances.
    Total 4,340 tok · Total N/A · Δ +19% · Δ +16% · Δ +0%
    gpt-4.1: 10.6s · gpt-5-mini: 9.1s
  • Second turn: transfer based on previous context.
    Total 5,786 tok · Total N/A · Δ +15% · Δ +50% · Δ +0%
    gpt-4.1: 7.4s · gpt-5-mini: 11.1s
📋 Compare how different prompt styles affect responses. (1 test) — 0/0 · 0/0

  • Compare advisory styles across prompts.
📋 Test how skills improve advice quality. (1 test) — 0/0 · 0/0

  • Agent with financial advisor skill gives better advice.