What is MemScore?

MemScore is a composite metric that captures three dimensions of memory provider performance in a single line:
accuracy% / latencyMs / contextTok
For example:
85% / 120ms / 1500tok
This tells you the provider achieved 85% accuracy with an average search latency of 120ms, while sending an average of 1,500 tokens of context to the answering model per question.
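The display string maps directly onto the three components. A minimal sketch of assembling it (`formatMemScore` and the interface name are illustrative, not part of MemoryBench):

```typescript
interface MemScoreComponents {
  quality: number;       // accuracy percentage
  latencyMs: number;     // average search latency in ms
  contextTokens: number; // average context tokens per question
}

// Hypothetical helper: renders the one-line display string
// from the structured components.
function formatMemScore(c: MemScoreComponents): string {
  return `${Math.round(c.quality)}% / ${Math.round(c.latencyMs)}ms / ${Math.round(c.contextTokens)}tok`;
}

console.log(formatMemScore({ quality: 85, latencyMs: 120, contextTokens: 1500 }));
// → 85% / 120ms / 1500tok
```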

Components

| Component | What it measures | Source |
|---|---|---|
| Quality | Answer accuracy as a percentage | `(correct / total) * 100` from judge evaluations |
| Latency | Average search response time in milliseconds | Mean of all search-phase durations |
| Tokens | Average context tokens sent to the answering model | Client-side token count of retrieved context per question |
MemScore is not a single number — it’s a triple. This is intentional. Collapsing quality, latency, and cost into one score hides important tradeoffs. A provider with 90% accuracy at 5,000 tokens is very different from one with 90% accuracy at 500 tokens.
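The aggregation implied by the component table can be sketched as follows. This is a hypothetical reconstruction, assuming per-question results with a judge verdict, a search-phase duration, and a context token count (the type and function names are illustrative):

```typescript
interface QuestionResult {
  correct: boolean;        // judge verdict for this question
  searchLatencyMs: number; // duration of the search phase
  contextTokens: number;   // tokens in the retrieved context
}

// Hypothetical aggregation mirroring the component table:
// quality is (correct / total) * 100; latency and tokens are plain means.
function computeMemScore(results: QuestionResult[]) {
  const n = results.length;
  const mean = (xs: number[]) => xs.reduce((a, b) => a + b, 0) / n;
  return {
    quality: (results.filter(r => r.correct).length / n) * 100,
    latencyMs: mean(results.map(r => r.searchLatencyMs)),
    contextTokens: mean(results.map(r => r.contextTokens)),
  };
}
```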

How token counting works

MemoryBench counts tokens client-side using provider-specific tokenizers:
| Model provider | Tokenizer | Method |
|---|---|---|
| OpenAI | js-tiktoken | Exact count using `o200k_base` or `cl100k_base` encoding |
| Anthropic | @anthropic-ai/tokenizer | Exact count using Anthropic's tokenizer |
| Google | Approximation | `Math.ceil(text.length / 4)` |
Three token values are tracked per question:
  • promptTokens — Total tokens in the full prompt (instructions + context + question)
  • basePromptTokens — Tokens in the prompt without any retrieved context
  • contextTokens — Tokens in just the retrieved context string
MemScore uses contextTokens because it isolates what the memory provider actually contributed, rather than counting fixed prompt overhead.
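The relationship between the three values can be sketched using the documented character-based approximation (the Google fallback); the exact paths would swap in js-tiktoken or @anthropic-ai/tokenizer for the counting function. `countQuestionTokens` is a hypothetical name for illustration:

```typescript
// Character-based approximation, as documented for the Google path.
const approxTokens = (text: string): number => Math.ceil(text.length / 4);

// Hypothetical helper showing how the three per-question values relate.
function countQuestionTokens(basePrompt: string, context: string) {
  return {
    promptTokens: approxTokens(basePrompt + context), // full prompt
    basePromptTokens: approxTokens(basePrompt),       // prompt without retrieved context
    contextTokens: approxTokens(context),             // what MemScore reports
  };
}
```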

Where MemScore appears

CLI output

After a benchmark run completes, MemScore is printed in the summary:
```text
SUMMARY:
  Total Questions: 50
  Correct: 43
  Accuracy: 86.00%

  Quality:  86%
  Latency:  145ms (avg)
  Tokens:   1,823 (avg context sent to answering model)

  MemScore: 86% / 145ms / 1823tok
```

Web UI

The MemScore card appears at the top of the run overview page. Per-question token counts are shown next to each model answer in both the question list and detail views.

Report JSON

The report.json file includes both a display string and structured components:
```json
{
  "memscore": "86% / 145ms / 1823tok",
  "memscoreComponents": {
    "quality": 86,
    "latencyMs": 145,
    "contextTokens": 1823
  },
  "tokens": {
    "totalTokens": 142500,
    "basePromptTokens": 21000,
    "contextTokens": 91150,
    "avgTokensPerQuestion": 2850,
    "avgBasePromptTokens": 420,
    "avgContextTokens": 1823
  }
}
```
Use memscoreComponents for programmatic comparisons — it avoids parsing the display string.
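For example, a programmatic comparison over `memscoreComponents` might look like this. The tie-break priority (accuracy first, then token efficiency, then latency) is an assumption for illustration, and `pickBetter` is a hypothetical helper:

```typescript
type Components = { quality: number; latencyMs: number; contextTokens: number };

// Hypothetical tie-break rule: prefer higher accuracy, then fewer
// context tokens, then lower latency. The priority order is an assumption.
function pickBetter(a: Components, b: Components): Components {
  if (a.quality !== b.quality) return a.quality > b.quality ? a : b;
  if (a.contextTokens !== b.contextTokens) return a.contextTokens < b.contextTokens ? a : b;
  return a.latencyMs <= b.latencyMs ? a : b;
}
```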

Comparing providers

MemScore is most useful when comparing providers on the same benchmark:
```bash
bun run src/index.ts compare -p supermemory,mem0,zep -b locomo -j gpt-4o
```
Each provider’s report will include its own MemScore, making it easy to see tradeoffs at a glance:
| Provider | MemScore |
|---|---|
| Provider A | 88% / 145ms / 1200tok |
| Provider B | 82% / 80ms / 2400tok |
| Provider C | 85% / 110ms / 1800tok |
In this example, Provider A has the highest accuracy but the slowest search. Provider B is the fastest but sends the most context without achieving the best accuracy — suggesting its retrieval may be less precise. Provider C lands in the middle on all three axes. There’s no single “winner” — the right choice depends on whether you prioritize quality, speed, or token efficiency.
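The "no single winner" observation can be made precise: with the numbers above, no provider beats another on all three axes at once. A minimal Pareto-front sketch (the helper names are illustrative, not part of MemoryBench):

```typescript
type Entry = { name: string; quality: number; latencyMs: number; contextTokens: number };

// A provider is dominated only if another matches or beats it on every
// axis (higher quality, lower latency, fewer tokens are all better)
// and strictly beats it on at least one.
const dominates = (a: Entry, b: Entry): boolean =>
  a.quality >= b.quality &&
  a.latencyMs <= b.latencyMs &&
  a.contextTokens <= b.contextTokens &&
  (a.quality > b.quality || a.latencyMs < b.latencyMs || a.contextTokens < b.contextTokens);

const paretoFront = (entries: Entry[]): Entry[] =>
  entries.filter(e => !entries.some(other => dominates(other, e)));
```

Running this on the three example providers keeps all of them, which is exactly why the choice comes down to your own priorities.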

Backward compatibility

Runs from before MemScore was added will still work. If token data is not present in the checkpoint, the memscore, memscoreComponents, and tokens fields will be undefined in the report. The CLI and web UI gracefully skip the MemScore display when data is unavailable.
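The described skip-when-missing behavior amounts to a simple guard on the optional field. A hypothetical sketch (`summaryLines` is an illustrative name, not MemoryBench's actual renderer):

```typescript
type Report = { memscore?: string }; // undefined for pre-MemScore runs

// Hypothetical rendering guard mirroring the described behavior:
// append the MemScore line only when the field is present.
function summaryLines(report: Report): string[] {
  const lines = ["SUMMARY:"];
  if (report.memscore !== undefined) {
    lines.push(`  MemScore: ${report.memscore}`);
  }
  return lines;
}
```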