# Benchmarks
## Token Savings by Scenario
These benchmarks measure total tokens consumed across a full conversation, including discovery, invocation, and response overhead.
| Scenario | MCP native | mcp2cli | NEKTE | vs MCP |
|---|---|---|---|---|
| 5 tools x 5 turns | 3,025 | 655 | 345 | -89% |
| 15 tools x 10 turns | 18,150 | 1,390 | 730 | -96% |
| 30 tools x 15 turns | 54,450 | 2,205 | 1,155 | -98% |
| 50 tools x 20 turns | 121,000 | 3,100 | 1,620 | -99% |
| 100 tools x 25 turns | 302,500 | 4,475 | 2,325 | -99% |
| 200 tools x 30 turns | 726,000 | 6,650 | 3,430 | -99.5% |
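The MCP-native column follows directly from the per-tool schema overhead described below (~121 tokens per tool, resent on every turn). A minimal sketch of that cost model, using the figures from this document (the function name is illustrative):

```python
# Cost model behind the "MCP native" column: every turn pays the
# full schema overhead for every registered tool.
TOKENS_PER_TOOL_SCHEMA = 121  # average schema size cited in this document

def mcp_native_tokens(tools: int, turns: int) -> int:
    """Schema overhead for a whole conversation under MCP."""
    return tools * TOKENS_PER_TOOL_SCHEMA * turns

# Reproduces the table rows above:
print(mcp_native_tokens(5, 5))     # 3,025
print(mcp_native_tokens(30, 15))   # 54,450
print(mcp_native_tokens(200, 30))  # 726,000
```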
## Why the Difference?
MCP serializes all tool schemas into every conversation turn. With 30 tools at ~121 tokens each, that is 3,630 tokens per turn just for definitions — regardless of whether any tool is used.
mcp2cli converts MCP servers to a CLI with on-demand discovery, achieving 78-99% savings depending on scale. However, it is a hack layered on top of MCP, not a formal protocol.
NEKTE achieves savings through three mechanisms:
- Progressive discovery (L0/L1/L2): Only fetch what you need. L0 catalog costs ~8 tokens per capability instead of ~121.
- Zero-schema invocation: After the first discovery, the version hash lets you invoke without re-sending schemas. Cost: 0 extra tokens.
- Semantic result compression: Results at the `minimal` detail level use ~4 tokens instead of ~200 for a full response.
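To make the second mechanism concrete, here is a sketch of what a zero-schema invocation could look like on the wire. The field names and shapes below are illustrative assumptions, not the normative NEKTE format: the point is only that after L0 discovery returns a version hash, later calls reference the cached hash instead of re-sending any schema.

```python
import json

# First turn: L0 discovery yields a name + category + version hash
# per capability (illustrative shape, not the normative NEKTE catalog).
catalog_entry = {"name": "weather.lookup", "cat": "data", "ver": "a3f9c1"}

# Later turns: invoke by name + cached hash. No schema travels with
# the call, so the only cost is the argument payload itself.
invoke_request = {
    "op": "invoke",
    "cap": catalog_entry["name"],
    "ver": catalog_entry["ver"],   # cached version hash from discovery
    "args": {"city": "Oslo"},
}

wire = json.dumps(invoke_request, separators=(",", ":"))
assert "schema" not in wire  # zero-schema: nothing but the call itself
print(wire)
```

If the server's capability version changes, the stale hash would fail the call and force a re-discovery, which is what keeps the cached-hash shortcut safe.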
## Per-Operation Token Costs
| Operation | MCP | NEKTE | Savings |
|---|---|---|---|
| Discovery (per tool) | ~121 tokens/turn | ~8 tokens (once, L0) | -93% |
| Invocation overhead | ~121 tokens + payload | 0 tokens (cached hash) | -100% |
| Response (minimal) | Full response always | ~4 tokens | Variable |
| Response (compact) | Full response always | ~12 tokens | Variable |
## Discovery Level Costs
| Level | What You Get | Cost per Capability |
|---|---|---|
| L0 — Catalog | Name + category + version hash | ~8 tokens |
| L1 — Summary | Description + input/output types | ~40 tokens |
| L2 — Full Schema | Typed JSON Schema + examples | ~120 tokens |
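The levels compose: an agent can scan a large catalog at L0 and pay L1/L2 cost only for the few capabilities it actually needs. A small sketch using the per-capability costs from the table above (the function and scenario are illustrative):

```python
# Approximate per-capability discovery costs from the table above.
COST = {"L0": 8, "L1": 40, "L2": 120}

def discovery_cost(l0_scanned: int, l1_fetched: int = 0, l2_fetched: int = 0) -> int:
    """Token cost of browsing a catalog at L0, then drilling into a few caps."""
    return l0_scanned * COST["L0"] + l1_fetched * COST["L1"] + l2_fetched * COST["L2"]

# Scan 50 capabilities at L0, pull full schemas for just 3 of them:
print(discovery_cost(50, l2_fetched=3))  # 760 tokens, paid once
# versus MCP, which serializes all 50 schemas on every turn:
print(50 * 121)                          # 6,050 tokens per turn
```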
## Enterprise Cost Analysis
Scenario: 50 tools, 20 turns per conversation, 1,000 conversations per day.
| Protocol | Tokens/Day | Monthly Cost* | Monthly Savings |
|---|---|---|---|
| MCP native | 121,000,000 | $10,890 | — |
| mcp2cli | 3,100,000 | $279 | $10,611 |
| NEKTE | 1,620,000 | $146 | $10,744 |
\*Based on GPT-4-class pricing of $3/1M input tokens and $9/1M output tokens. Protocol overhead is input-side, so the figures above use the input rate.
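The monthly figures follow from daily token volume at the input rate over a 30-day month. A quick sketch (helper name is illustrative):

```python
INPUT_PRICE_PER_M = 3.0  # $3 per 1M input tokens (GPT-4-class, per the note above)

def monthly_cost(tokens_per_day: int, days: int = 30) -> float:
    """Monthly spend on protocol-overhead tokens, billed at the input rate."""
    return tokens_per_day * days * INPUT_PRICE_PER_M / 1_000_000

print(monthly_cost(121_000_000))  # 10890.0  (MCP native)
print(monthly_cost(1_620_000))    # ~145.8, i.e. ~$146 (NEKTE)
```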
## Scaling Analysis
The gap widens with scale:
| Scale | MCP Monthly | NEKTE Monthly | Savings |
|---|---|---|---|
| Startup (10 tools, 100 conv/day) | $109 | $3 | $106 |
| Growth (50 tools, 1K conv/day) | $10,890 | $146 | $10,744 |
| Enterprise (200 tools, 10K conv/day) | $653,400 | $3,087 | $650,313 |
At enterprise scale with 200 tools and 10,000 conversations per day, the savings exceed $650,000 per month.
## Context Window Impact
Token savings are not just about cost. They directly affect model performance by freeing up the context window for actual reasoning.
| Tools | MCP Context Consumed | NEKTE Context Consumed |
|---|---|---|
| 5 | 6% of 128K window | 0.3% |
| 30 | 28% of 128K window | 0.9% |
| 100 | 72% of 128K window | 1.8% |
| 200 | >100% (overflows) | 2.7% |
## Wire Format: JSON vs MessagePack
NEKTE supports optional MessagePack encoding for further wire-level savings:
| Payload | JSON (bytes) | MessagePack (bytes) | Savings |
|---|---|---|---|
| L0 discovery (3 caps) | 245 | 168 | -31% |
| Invoke request | 189 | 132 | -30% |
| Invoke response (compact) | 156 | 108 | -31% |
| Delegate stream event | 134 | 92 | -31% |
MessagePack provides a consistent ~30% reduction in wire size. Combined with gRPC + Protobuf, total wire overhead drops further.
## Running the Benchmarks
```sh
# Clone and build
git clone https://github.com/nekte-protocol/nekte.git
cd nekte && pnpm install && pnpm build

# Run token comparison benchmarks
pnpm benchmark

# Run JSON vs MessagePack size comparison
nekte bench http://localhost:4001
```

The benchmark suite generates synthetic workloads at various tool and turn counts, measuring total tokens consumed across the full conversation lifecycle.