Benchmarks

| Benchmark | Description | Source | Categories |
| --- | --- | --- | --- |
| LoCoMo | Long-context memory testing fact recall across extended conversations | snap-research/locomo | single-hop, multi-hop, temporal, world-knowledge, adversarial |
| LongMemEval | Long-term memory evaluation across multiple sessions with knowledge updates | xiaowu0162/longmemeval | single-session-user, single-session-assistant, multi-session, temporal-reasoning, knowledge-update |
| ConvoMem | Conversational memory focused on personalization and preference learning | Salesforce/ConvoMem | user_evidence, assistant_facts_evidence, preference_evidence, changing_evidence, abstention_evidence |

We’re actively adding support for more benchmarks. Contribute your own or create a feature request.

Providers

We’re actively adding support for more providers. Contribute your own or create a feature request.