Cache Architecture

NEKTE’s cache architecture draws from 40 years of CPU cache research, applying only the techniques that make sense at the scale of agent-to-agent coordination (50-10,000 capabilities, under 2MB memory).

Architecture Overview

CapabilityCache (Application Layer)
|
+-- EvictionPolicy (Domain, pure logic)
|   +-- SievePolicy (scan-resistant)
|   +-- GDSFPolicy (token-cost-weighted)
|   +-- SieveGDSFPolicy (composed: best of both)
|
+-- RevalidationStrategy (Port)
|   +-- BackgroundRevalidator (Adapter)
|
+-- InMemoryCacheStore (Adapter, implements CacheStore port)
|
+-- RequestCoalescer (Infrastructure)
|
+-- NegativeCache (Application)

SIEVE Eviction Policy

SIEVE (NSDI 2024) is a modern replacement for LRU that provides scan resistance with minimal complexity — roughly 30 lines of code.

Why Not LRU?

LRU is not scan-resistant. When a catalog() call brings in 200 L0 entries at once, those entries displace the entire hot set. The frequently-accessed L2 schemas that cost 120 tokens each to re-fetch get evicted in favor of catalog entries that may never be accessed again.

How SIEVE Works

SIEVE uses a FIFO queue with a “hand” pointer and a single visited bit per entry:

Insert: add to HEAD of FIFO, visited = false
Access: set visited = true (no reordering)
Evict:
    while hand.visited:
        hand.visited = false    // give second chance
        hand = hand.next
    victim = hand
    hand = hand.next            // advance past the victim before removing it
    evict victim                // not re-accessed, remove it

Result: 21% better hit rate than FIFO, comparable to ARC, but with far less implementation complexity. A catalog() scan of 200 entries does not contaminate the cache because they enter with visited=false and get evicted immediately if never re-accessed.
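The eviction loop above can be sketched in TypeScript. This is a minimal illustration of the SIEVE mechanics, not NEKTE's actual implementation; the class and method names (`Sieve`, `access`, `has`) are hypothetical, and an array stands in for the linked FIFO for brevity.

```typescript
interface SieveNode<K> {
  key: K;
  visited: boolean;
}

class Sieve<K> {
  // Index 0 is the tail (oldest), the end is the head (newest).
  private queue: SieveNode<K>[] = [];
  private byKey = new Map<K, SieveNode<K>>();
  private hand = 0; // eviction hand, starts at the tail

  constructor(private capacity: number) {}

  access(key: K): void {
    const node = this.byKey.get(key);
    if (node) {
      node.visited = true; // hit: mark visited, no reordering
      return;
    }
    if (this.queue.length >= this.capacity) this.evict();
    const fresh = { key, visited: false }; // insert at head, unvisited
    this.queue.push(fresh);
    this.byKey.set(key, fresh);
  }

  private evict(): void {
    // Give visited entries a second chance, then remove the first unvisited one.
    while (this.queue[this.hand].visited) {
      this.queue[this.hand].visited = false;
      this.hand = (this.hand + 1) % this.queue.length;
    }
    const [victim] = this.queue.splice(this.hand, 1);
    this.byKey.delete(victim.key);
    if (this.hand >= this.queue.length) this.hand = 0;
  }

  has(key: K): boolean {
    return this.byKey.has(key);
  }
}
```

Note how a one-shot scan entry (never re-accessed, so `visited` stays false) is the first thing the hand removes, while a re-accessed entry survives one pass of the hand.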


GDSF: Token-Cost-Weighted Eviction

GreedyDual-Size-Frequency (GDSF) is NEKTE’s differentiator: no other agent protocol (MCP, A2A) has token-cost-aware eviction.

The Problem

Without cost awareness, the cache treats L0 entries (8 tokens to re-fetch), L1 entries (40 tokens), and L2 entries (120 tokens) with equal priority. Evicting an L2 schema that cost 120 tokens to discover is 15x more expensive than evicting an L0 catalog entry.

How GDSF Works

Each entry gets a priority score:

priority(entry) = access_count * token_cost_to_refetch

Eviction removes the entry with the lowest priority.

Example

Entry   | Level      | Accesses | Priority
--------|------------|----------|---------------
Entry A | L0 catalog | 1        | 1 x 8 = 8
Entry B | L2 schema  | 3        | 3 x 120 = 360
Entry C | L1 summary | 10       | 10 x 40 = 400

Eviction order: A (8) then B (360) then C (400).

Without GDSF, a plain LRU cache might evict B (the expensive L2 schema) simply because it was accessed less recently than A, even though re-fetching B costs 15x more tokens.
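The scoring above can be sketched directly from the formula. The entry shape and function names here are illustrative assumptions, not NEKTE's API; the numbers reproduce the table.

```typescript
interface GdsfEntry {
  key: string;
  accessCount: number;
  refetchCostTokens: number; // tokens needed to re-discover this entry
}

// priority(entry) = access_count * token_cost_to_refetch
const priority = (e: GdsfEntry): number => e.accessCount * e.refetchCostTokens;

// Evict the entry whose loss is cheapest: lowest priority score.
function pickVictim(entries: GdsfEntry[]): GdsfEntry {
  return entries.reduce((lo, e) => (priority(e) < priority(lo) ? e : lo));
}

const entries: GdsfEntry[] = [
  { key: "A", accessCount: 1, refetchCostTokens: 8 },    // L0 catalog
  { key: "B", accessCount: 3, refetchCostTokens: 120 },  // L2 schema
  { key: "C", accessCount: 10, refetchCostTokens: 40 },  // L1 summary
];
```

`pickVictim` selects entry A (priority 8), matching the eviction order in the example.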

Composed: SIEVE + GDSF

The default eviction policy composes both: SIEVE provides scan resistance, GDSF provides cost-awareness. SIEVE selects eviction candidates, then GDSF picks the cheapest one to remove.
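One plausible way to express the composition (an illustrative assumption about how the two policies interact, not NEKTE's actual `SieveGDSFPolicy`): SIEVE nominates the entries whose visited bit is clear, and GDSF breaks the tie by evicting the lowest-priority nominee.

```typescript
interface ComposedEntry {
  key: string;
  visited: boolean;          // SIEVE's second-chance bit
  accessCount: number;       // GDSF frequency
  refetchCostTokens: number; // GDSF cost
}

function selectVictim(entries: ComposedEntry[]): ComposedEntry {
  // SIEVE: only unvisited entries are eviction candidates (scan resistance).
  const candidates = entries.filter((e) => !e.visited);
  // If everything was recently visited, fall back to the full set.
  const pool = candidates.length > 0 ? candidates : entries;
  // GDSF: among candidates, evict the cheapest-to-refetch, least-used entry.
  return pool.reduce((lo, e) =>
    e.accessCount * e.refetchCostTokens < lo.accessCount * lo.refetchCostTokens
      ? e
      : lo
  );
}
```

A recently accessed entry is protected by its visited bit even if GDSF would otherwise rank it cheapest, which is exactly the scan-resistance property SIEVE contributes.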


Stale-While-Revalidate

The Problem

With a fixed TTL of 5 minutes, every TTL expiration causes a blocking nekte.discover call. The agent waits for a network round-trip before it can use the capability. This creates periodic latency spikes.

The Solution

Serve the stale entry immediately and refresh in the background:

get(key):
    entry = store.get(key)
    if fresh (within TTL):
        return entry                    // normal hit
    if stale (past TTL, within grace):
        triggerBackgroundRefresh(key)   // fire-and-forget
        return entry                    // serve stale, 0 latency
    if expired (past grace period):
        delete entry
        return undefined                // force re-discover

Reality check: 99% of capability schemas do NOT change in 5 minutes. Serving a stale entry is correct almost every time, and the background refresh ensures the cache converges to the latest version within seconds.
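A minimal sketch of the three-state lookup, assuming a caller-supplied fire-and-forget `refresh` callback and an injectable clock for testability (both are assumptions for illustration, not NEKTE's interface):

```typescript
interface SwrEntry<V> {
  value: V;
  storedAt: number; // ms timestamp
}

class SwrCache<V> {
  private store = new Map<string, SwrEntry<V>>();

  constructor(
    private ttlMs: number,
    private graceMs: number,
    private refresh: (key: string) => void // fire-and-forget background refresh
  ) {}

  set(key: string, value: V, now = Date.now()): void {
    this.store.set(key, { value, storedAt: now });
  }

  get(key: string, now = Date.now()): V | undefined {
    const entry = this.store.get(key);
    if (!entry) return undefined;
    const age = now - entry.storedAt;
    if (age <= this.ttlMs) return entry.value;   // fresh: normal hit
    if (age <= this.ttlMs + this.graceMs) {
      this.refresh(key);                         // stale: refresh in background
      return entry.value;                        // serve stale, zero added latency
    }
    this.store.delete(key);                      // expired: force re-discover
    return undefined;
  }
}
```

The caller never blocks inside the grace window: the stale value is returned synchronously while the refresh runs concurrently.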

TTL Jitter

To prevent cache stampedes (all entries expiring simultaneously after a catalog() call), NEKTE adds +/-10% jitter to every TTL:

ttlMs = baseTtl * (0.9 + Math.random() * 0.2)

This spreads out expiration times so that refreshes happen gradually rather than all at once.
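The formula can be checked empirically; this small sketch (function name is illustrative) samples jittered TTLs for a 5-minute base and confirms they stay within the +/-10% band:

```typescript
function jitteredTtl(baseTtlMs: number): number {
  // Uniformly distributed in [0.9 * base, 1.1 * base)
  return baseTtlMs * (0.9 + Math.random() * 0.2);
}

// 1000 samples against a 5-minute (300,000 ms) base TTL
const samples = Array.from({ length: 1000 }, () => jitteredTtl(300_000));
```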


Negative Caching

The Problem

When an agent does NOT have a capability, discover() returns an empty list. Without negative caching, every subsequent attempt to invoke that nonexistent capability triggers another discovery round-trip.

The Solution

Cache negative results with a short TTL (60 seconds):

discover({ filter: { id: "nonexistent" } })
    -> result.caps = []
    -> cache.setNegative("agentX:nonexistent", ttl=60s)

Next attempt (within 60s):
    -> cache.isNegative("agentX:nonexistent") === true
    -> skip discover, return empty directly

Negatives are automatically cleared when the capability appears (e.g., on a VERSION_MISMATCH error response that includes the schema).
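A minimal sketch of such a negative cache, assuming the `setNegative`/`isNegative`/`clear` method names from the flow above and an injectable clock for testability:

```typescript
class NegativeCache {
  private entries = new Map<string, number>(); // key -> expiry timestamp (ms)

  constructor(private ttlMs: number) {}

  setNegative(key: string, now = Date.now()): void {
    this.entries.set(key, now + this.ttlMs);
  }

  isNegative(key: string, now = Date.now()): boolean {
    const expiry = this.entries.get(key);
    if (expiry === undefined) return false;
    if (now >= expiry) {
      this.entries.delete(key); // lazy expiry on read
      return false;
    }
    return true;
  }

  // Called when the capability appears, e.g. on a VERSION_MISMATCH
  // response that includes the schema.
  clear(key: string): void {
    this.entries.delete(key);
  }
}
```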


Request Coalescing

The Problem

If 5 clients share a cache and the entry for “sentiment” expires, all 5 send concurrent discover() requests. Four of those are redundant.

The Solution

The RequestCoalescer deduplicates in-flight requests:

client1.discover("sentiment") -> cache miss -> START refresh
client2.discover("sentiment") -> cache miss -> WAIT for client1
client3.discover("sentiment") -> cache miss -> WAIT for client1
refresh completes -> resolve for all three

Implementation: a Map<string, Promise<T>> of in-flight requests. If a key already has a promise, return it instead of starting a new request.
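That Map-of-promises pattern fits in a few lines; this sketch uses hypothetical names (`RequestCoalescer.run`) rather than NEKTE's actual signatures:

```typescript
class RequestCoalescer<T> {
  private inFlight = new Map<string, Promise<T>>();

  run(key: string, fetch: () => Promise<T>): Promise<T> {
    const existing = this.inFlight.get(key);
    if (existing) return existing; // join the in-flight request
    const p = fetch().finally(() => this.inFlight.delete(key));
    this.inFlight.set(key, p);
    return p;
  }
}
```

The `finally` cleanup matters: once the request settles (success or failure), the key is removed so the next miss starts a fresh fetch instead of reusing a stale or rejected promise.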


What We Chose Not to Implement

Several CPU cache techniques were evaluated and deliberately excluded:

Technique                        | Why Not
---------------------------------|------------------------------------------------------------
ARC (Adaptive Replacement Cache) | Maintains 4 lists. SIEVE achieves 90% of the benefit with 30 LOC
Resolution TLB                   | Map.get() is already O(1) at ~50ns. A cache in front of a cache adds no value
Bloom Filters                    | With under 10K capabilities, Map<string, Set<string>> is O(1), exact, and uses trivial memory
Markov Prefetching               | Cold-start problem is devastating. LLM agent behavior is non-stationary. Misprediction penalty is high

Metrics

Metric             | Before                 | After
-------------------|------------------------|----------------------------------
Eviction scan      | O(n)                   | O(1)
Cache hit rate     | ~60% (FIFO)            | 85%+ (SIEVE+GDSF)
P99 latency spikes | Blocking on TTL expiry | 0 (stale-while-revalidate)
Stampede on expiry | N concurrent discovers | 1 (coalesced)
Wasted discovers   | Unbounded              | 0 after first miss (negative cache)