Benchmarks: MCP vs SLOP

SLOP and MCP solve different problems with different architectures. MCP is action-first — it exposes a flat registry of tools that an AI agent can call. SLOP is state-first — it exposes a semantic state tree with contextual affordances that change based on the application’s current state.
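
The contrast can be sketched in a few lines of TypeScript. The shapes below are illustrative only, not the actual wire format of either protocol:

```typescript
// Hypothetical shapes for illustration -- not the real wire formats.

// MCP: a flat, state-independent tool registry.
const mcpTools = [
  { name: "list_repos", params: {} },
  { name: "get_issue", params: { id: "string" } },
  { name: "assign_issue", params: { id: "string", assignee: "string" } },
  { name: "close_issue", params: { id: "string" } },
];

// SLOP: a state tree whose nodes carry only the actions valid right now.
const slopState = {
  type: "issue-tracker",
  children: [
    {
      type: "issue",
      id: "issue-3",
      props: { title: "Crash on save", state: "open", assignee: null },
      affordances: ["assign", "close", "comment", "add_label"],
    },
    {
      type: "issue",
      id: "issue-9",
      props: { title: "Old bug", state: "closed", assignee: "bob" },
      affordances: ["reopen", "comment"], // no assign/close on closed issues
    },
  ],
};

// The MCP registry never changes; the SLOP affordance list does.
const closed = slopState.children.find((n) => n.props.state === "closed")!;
console.log(closed.affordances.includes("assign")); // false
```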

This page presents benchmark results from running identical tasks against both protocols using the same backing application (an issue tracker with repos, issues, comments, and labels). The benchmark is open source and reproducible — see benchmarks/mcp-vs-slop in the repository.

These protocols have different design goals:

  • MCP is designed for tool integration — connecting AI agents to external capabilities (databases, APIs, file systems). It excels at making tools available.
  • SLOP is designed for state observation — giving AI agents structured awareness of application state with contextual actions. It excels at providing context for decision-making.

The benchmark measures how each approach performs when an AI agent needs to understand state and act on it — a use case where SLOP’s design has a natural advantage. Tasks that are purely tool-execution (“call this API with these parameters”) would favor MCP’s simpler model.

Application: An issue tracker with 3 repositories, 15 issues (mix of open/closed), 13 comments, and labels. A larger dataset (10 repos, ~100 issues) is used for scale tests.

Protocols compared:

  • MCP — 13 flat tools (list_repos, get_issue, close_issue, etc.) via stdio transport
  • SLOP — Full state tree with all nodes and contextual affordances
  • SLOP (opt) — Optimized state tree with salience scoring, lazy comments, and summaries, plus navigation tools (slop_query, slop_get_state)
  • SLOP (basic) — Full state tree with a minimal system prompt (no SLOP spec concepts explained)

Model: Gemini 2.5 Flash. Additional runs with Gemini 2.5 Pro and Gemini 3 Flash are noted where results differ significantly.

Each scenario has a verification function that checks the application state after the agent finishes. Results are pass/fail with detailed check breakdowns.

| Scenario | MCP | SLOP | SLOP (opt) | SLOP (basic) |
|---|---|---|---|---|
| explore-and-act | PASS | PASS | PASS | PASS |
| triage | PASS | PASS | PASS | PASS |
| bulk-update | PASS | PASS | PASS | PASS |
| scale-triage (100 issues) | FAIL | PASS | PASS | PASS |
| negative (impossible actions) | FAIL | PASS | PASS | PASS |
| contextual (multi-turn) | PASS | PASS | PASS | PASS |
| recovery (fail then act) | PASS | PASS | PASS | PASS |
| state-transitions (close/reopen) | PASS | PASS | PASS | PASS |
| cross-entity (correlate data) | PASS | PASS | PASS | PASS |
| conditional (rule-based) | PASS | PASS | PASS | PASS |
| ambiguity (vague references) | FAIL | PASS | PASS* | PASS |
| complex-workflow (sprint planning) | FAIL | PASS | PASS | PASS |
| Total | 8/12 | 12/12 | 11/12* | 12/12 |

*The SLOP (opt) ambiguity failure was LLM variance (0 tool calls on one run), not a structural issue. Across multiple runs it passes consistently.

scale-triage (FAIL 10/20): With 10 repos, the agent needs list_repos + 10 list_issues calls just for discovery, consuming its action budget before it can act on all bugs. MCP’s discovery overhead scales linearly with the number of entities.

negative (FAIL 4/5): The prompt asks to assign a closed issue. MCP’s flat tool list always shows assign_issue regardless of issue state — the tool doesn’t validate, and the agent doesn’t know the action is inappropriate. SLOP’s contextual affordances don’t expose assign on closed issues, so the agent never attempts it.

complex-workflow (FAIL 8/9): The task requires computing “who has the fewest assignments across ALL repos.” The MCP agent must list issues for every repo, track every assignee, compare counts, and only then act. It assigned the wrong person (charlie instead of alice) because it couldn’t aggregate state across multiple tool-call results.

SLOP provides the full application state upfront. The agent can:

  • Count unassigned bugs across all repos without any tool calls
  • See that a closed issue has reopen but not assign — preventing invalid actions
  • Compare assignee load across the entire tree before making a decision
  • Act on all matching issues in a single LLM turn
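
As a sketch (over a made-up two-repo tree, not the benchmark dataset), the cross-repo aggregation above is a single pass over plain data once the full state is in context:

```typescript
// Assumed tree shape for illustration; the real benchmark state has more fields.
interface Issue {
  id: string;
  state: "open" | "closed";
  labels: string[];
  assignee: string | null;
}
interface Repo {
  name: string;
  issues: Issue[];
}

const repos: Repo[] = [
  { name: "core", issues: [
    { id: "1", state: "open", labels: ["bug"], assignee: null },
    { id: "2", state: "open", labels: ["bug"], assignee: "alice" },
  ]},
  { name: "cli", issues: [
    { id: "3", state: "open", labels: ["bug"], assignee: null },
    { id: "4", state: "closed", labels: [], assignee: "charlie" },
  ]},
];

const all = repos.flatMap((r) => r.issues);

// "Count unassigned bugs across all repos" -- zero tool calls needed.
const unassignedBugs = all.filter(
  (i) => i.state === "open" && i.labels.includes("bug") && i.assignee === null,
);

// "Who has the fewest assignments across ALL repos" -- the aggregation the
// MCP agent got wrong because it spanned many separate tool-call results.
const load = new Map<string, number>();
for (const name of ["alice", "bob", "charlie"]) load.set(name, 0);
for (const i of all) {
  if (i.assignee) load.set(i.assignee, (load.get(i.assignee) ?? 0) + 1);
}
const leastLoaded = [...load.entries()].sort((a, b) => a[1] - b[1])[0][0];

console.log(unassignedBugs.length, leastLoaded);
```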

Triage (assign unassigned bugs across 3 repos):

| Metric | MCP | SLOP (opt) | Delta |
|---|---|---|---|
| Tool calls | 16 | 12 | -25% |
| LLM round trips | 8 | 2 | -75% |
| Wall time | 12,404ms | 4,605ms | -63% |
| Cost | $0.0049 | $0.0049 | 0% |

SLOP batches all 12 actions (6 assign + 6 label) in a single LLM turn. MCP needs 8 turns: discovery calls interleaved with actions.

Scale-triage (100 issues across 10 repos):

| Metric | MCP | SLOP (opt) | Delta |
|---|---|---|---|
| Tool calls | 20 | 32 | +60% |
| LLM round trips | 21 | 2 | -90% |
| Wall time | 25,633ms | 19,737ms | -23% |
| Correctness | 10/20 | 20/20 | +50% |

MCP uses fewer tool calls but needs 21 LLM round trips and still only gets half the bugs. SLOP uses more tool calls (all the assign+label actions) but batches them in 2 turns with 100% correctness.

Complex workflow (sprint planning — aggregate, prioritize, assign, comment, clean up):

| Metric | MCP | SLOP (basic) | Delta |
|---|---|---|---|
| Tool calls | 17 | 7 | -59% |
| LLM round trips | 18 | 5 | -72% |
| Wall time | 20,800ms | 11,884ms | -43% |
| Cost | $0.0161 | $0.0121 | -25% |
| Correctness | FAIL | PASS | |

The most complex scenario is also SLOP’s strongest showing — cheaper, faster, and correct where MCP failed.

SLOP’s state tree consumes more input tokens than MCP’s system prompt. For simple tasks, this makes SLOP more expensive:

| Scenario type | MCP cost | SLOP (opt) cost | Verdict |
|---|---|---|---|
| Simple lookup/action | Lower | Higher | MCP cheaper |
| Multi-step within one repo | Similar | Similar | Comparable |
| Multi-repo reasoning | Lower per call, but more calls | Higher upfront, fewer calls | Depends on complexity |
| Aggregate/cross-entity | Fails or expensive | Front-loaded but correct | SLOP wins on value |

The cost calculation changes when you factor in correctness. A failed agent run that costs $0.01 is more expensive than a successful run that costs $0.02 — you have to re-run or manually intervene.
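
A rough way to model this, assuming a failed run is simply retried until one succeeds (geometric retries), is expected cost per successful task. The rates below are illustrative, not measured:

```typescript
// Expected cost per *successful* task under a retry-until-success policy:
// on average 1 / successRate attempts are needed per success.
function expectedCostPerSuccess(costPerRun: number, successRate: number): number {
  return costPerRun / successRate;
}

const mcp = expectedCostPerSuccess(0.01, 0.5);  // cheap per run, fails half the time
const slop = expectedCostPerSuccess(0.02, 1.0); // pricier per run, reliable

// Both come to about $0.02 per successful run: the "cheap" option
// stops being cheaper once correctness is priced in.
console.log(mcp, slop);
```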

One of SLOP’s most impactful features is that affordances change based on state. This was tested directly:

Negative scenario: “Close an already-closed issue, assign a closed issue, delete a repo.”

  • MCP: assign_issue and close_issue are always in the tool list. The agent called assign_issue on a closed issue — the tool succeeded, corrupting state. No protocol-level guard.
  • SLOP: Closed issues expose reopen and comment but not assign, close, add_label, or remove_label. The agent saw no matching action and correctly refused: “I cannot assign issue-9 because the available actions are comment and reopen.”

This is correctness by design — the protocol prevents structurally invalid actions without relying on the LLM’s judgment.
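
A minimal sketch of how such a guard can work, assuming a simple open/closed issue model (the real SLOP server derives affordances per node type):

```typescript
type IssueState = "open" | "closed";

// Affordances are computed from state, not declared once up front.
function affordancesFor(state: IssueState): string[] {
  switch (state) {
    case "open":
      return ["assign", "close", "comment", "add_label", "remove_label"];
    case "closed":
      // Structurally invalid actions are simply never offered.
      return ["reopen", "comment"];
  }
}

function perform(state: IssueState, action: string): void {
  if (!affordancesFor(state).includes(action)) {
    // The guard lives in the protocol layer, not in the LLM's judgment.
    throw new Error(`"${action}" is not available on a ${state} issue`);
  }
  // ...apply the action...
}

perform("open", "assign");      // ok
// perform("closed", "assign"); // throws before any state is touched
```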

We compared SLOP with two system prompts:

  • SLOP (full prompt): Explains SLOP concepts — node types, affordances, meta fields, optimized views (windowed collections, lazy children, stub nodes)
  • SLOP (basic prompt): Just “Here is the current state. Use the tools to complete the task.”

| Metric | SLOP (full prompt) | SLOP (basic prompt) |
|---|---|---|
| Correctness | 12/12 | 12/12 |
| Avg cost | Higher (larger prompt) | Lower |
| Avg time | Similar | Often faster |

The SLOP system prompt with spec concepts didn’t improve correctness on any scenario. The tree format with formatTree() — showing node types, properties, affordances, summaries, and windowing indicators — is self-explanatory enough for the model.

The spec prompt becomes more valuable with optimized trees, where the agent needs to understand when to use slop_query to expand truncated data. For the full tree, the basic prompt is sufficient and cheaper.

We ran the benchmark across three Gemini models to test whether model capability changes the protocol advantage:

| Finding | Detail |
|---|---|
| Smarter models help MCP | Gemini 2.5 Pro dropped MCP’s triage round trips from 11 to 8 by batching tool calls better |
| Smarter models help SLOP more | SLOP naive went from 10/11 to 11/11 with Gemini 2.5 Pro — better at processing the full tree |
| Very smart models can hurt SLOP (opt) | Gemini 3 Flash over-queried with slop_query, burning tokens unnecessarily |
| MCP’s structural failures persist | Even the best model can’t fix scale-triage (discovery budget) or negative (flat tool list) |

The key insight: model intelligence narrows the performance gap but doesn’t eliminate SLOP’s structural advantages in correctness and contextual safety.

The spec defines optimization patterns that reduce tree size for large applications:

| Optimization | Effect | When to use |
|---|---|---|
| Salience scoring | Low-priority nodes compacted first by maxNodes | Large collections with mixed relevance |
| Lazy children | Children declared but not inlined (summary only) | Comments, attachments, nested detail |
| Summaries | Natural-language description of truncated content | All optimized nodes |
| slop_query | Agent can expand any path on demand | When agent needs detail beyond the default view |

In our benchmark, the optimized tree reduced the small dataset from 22KB to 18KB and the large dataset from 154KB to 81KB while maintaining full correctness.
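
A sketch of how salience-based truncation might be implemented. The field names (salience, summary) and the stub shape are assumptions for illustration and may differ from the spec:

```typescript
interface Node {
  id: string;
  salience: number; // higher = more relevant to the current task
  summary: string;
}

// Keep the highest-salience nodes; compact the rest into a summary stub
// that tells the agent how to expand the omitted data.
function truncate(nodes: Node[], maxNodes: number) {
  const sorted = [...nodes].sort((a, b) => b.salience - a.salience);
  const kept = sorted.slice(0, maxNodes);
  const dropped = sorted.slice(maxNodes);
  return {
    kept,
    stub: dropped.length
      ? { summary: `${dropped.length} more nodes omitted; use slop_query to expand` }
      : null,
  };
}

const issues: Node[] = [
  { id: "a", salience: 0.9, summary: "open bug, unassigned" },
  { id: "b", salience: 0.2, summary: "closed, stale" },
  { id: "c", salience: 0.7, summary: "open feature request" },
];
const view = truncate(issues, 2);
console.log(view.kept.map((n) => n.id)); // ["a", "c"]
```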

See Scaling for the full optimization guide, including a discussion of affordance visibility on stub nodes.

```sh
cd benchmarks/mcp-vs-slop
bun install

# Scripted mode (no LLM, measures protocol overhead)
bun run run.ts --mode scripted

# Agent mode (requires Gemini API key)
GEMINI_API_KEY=xxx bun run run.ts --mode agent --model gemini-2.5-flash

# Specific scenarios
bun run run.ts --mode agent --scenario triage,negative,complex-workflow

# Specific protocols
bun run run.ts --mode agent --protocol slop,slop-optimized

# Verbose logging (shows every tool call and LLM turn)
bun run run.ts --mode agent --scenario complex-workflow --verbose

# Higher iterations for statistical confidence
bun run run.ts --mode agent --iterations 10
```

  1. SLOP’s state-first approach eliminates discovery overhead. MCP agents spend significant time and tokens listing, querying, and assembling state before they can act. SLOP front-loads this context.

  2. Contextual affordances prevent invalid actions. This is not just an optimization — it’s a safety feature. MCP cannot prevent an agent from calling assign_issue on a closed issue. SLOP can.

  3. The cost tradeoff is real but nuanced. SLOP uses more input tokens due to the state tree. For simple tasks, this is overhead. For complex tasks requiring reasoning across entities, SLOP’s upfront cost is offset by fewer LLM round trips and higher correctness.

  4. The protocols serve different purposes. MCP is excellent for exposing discrete tools (database queries, API calls, file operations). SLOP is excellent for applications where the AI needs to understand and reason about state before acting. Many real-world systems would benefit from both: MCP for external integrations, SLOP for application state awareness.

  5. Protocol-level optimization works. Salience scoring, lazy children, and summaries reduce tree size without losing correctness — but the agent needs navigation tools (slop_query) to access truncated data when needed.