RETRIEVAL-AUGMENTED GENERATION

Search that actually understands

Hybrid search with sub-300ms latency. Multi-modal support for PDFs, images, videos, and web pages. Context-aware reranking that surfaces what matters.

<300ms p95 latency
State-of-the-art on LongMemEval (85.2%)
Multi-modal: PDFs, images, videos, audio

THE PROBLEM

Traditional RAG is broken

See how Supermemory's approach differs in the RAG overview.

Lost in chunks

Standard RAG splits documents into chunks that lose context. Isolated fragments produce hallucinated, incomplete answers.

Similarity is not relevance

Vector similarity alone misses nuance. A document about 'Python snakes' can match a query about 'Python programming', and your users get irrelevant results.
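Vector search ranks documents by how close their embeddings are, usually measured with cosine similarity. The toy sketch below uses hand-made 3-dimensional vectors. These vectors are assumptions for illustration, not output from any embedding model. The sketch shows that the score rewards shared features and does not judge whether a passage answers the query.

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1] for nonzero vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Undefined for a zero vector; treat it as "no similarity".
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hand-made toy vectors (NOT real embeddings). Dimensions loosely stand for:
// [mentions "Python", programming topic, reptile topic].
const query = [1.0, 0.8, 0.0];    // "Python programming"
const snakeDoc = [1.0, 0.0, 0.9]; // "Python snakes"
const codeDoc = [0.6, 1.0, 0.0];  // "Writing functions in Python"

console.log(cosineSimilarity(query, snakeDoc).toFixed(3)); // "0.580"
console.log(cosineSimilarity(query, codeDoc).toFixed(3));  // "0.937"
```

The off-topic snake document still scores 0.58, purely because it shares the "Python" dimension. Nothing in that number says whether the document answers the question. For that reason, ranking by similarity alone can surface it whenever better matches are scarce.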

Slow at scale

As your knowledge base grows, retrieval slows to a crawl. Users wait seconds for answers that should take milliseconds.

HOW IT WORKS

RAG, reimagined from scratch

Documents (input) → Smart Chunking (semantic split) → Memory Generation (atomic facts) → Hybrid Search (vector + keyword) → Reranked Results (output)
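The diagram can be read as four composable stages. The sketch below is a minimal TypeScript illustration under stated assumptions. The interfaces, the stage names and the deliberately trivial stand-in implementations are all hypothetical. They do not describe how Supermemory implements the pipeline.

```typescript
// Hypothetical shapes for the four stages in the diagram (not the Supermemory API).
interface Chunk { id: string; text: string }
interface Memory { chunkId: string; fact: string }
interface Hit { memory: Memory; score: number }

interface RagStages {
  chunk(doc: string, docId: string): Chunk[];       // Smart Chunking
  memorize(chunk: Chunk): Memory[];                 // Memory Generation
  search(query: string, memories: Memory[]): Hit[]; // Hybrid Search
  rerank(query: string, hits: Hit[]): Hit[];        // Reranking
}

// Wire the stages together: documents in, reranked results out.
function runPipeline(stages: RagStages, docs: string[], query: string, topK: number): Hit[] {
  const memories = docs
    .flatMap((doc, i) => stages.chunk(doc, `d${i}`))
    .flatMap((chunk) => stages.memorize(chunk));
  const candidates = stages.search(query, memories);
  return stages.rerank(query, candidates).slice(0, topK);
}

// Lowercased word set, used by the toy search stage below.
function wordSet(s: string): Set<string> {
  return new Set<string>(s.toLowerCase().match(/[a-z0-9]+/g) ?? []);
}

// Deliberately trivial stand-ins so the sketch runs end to end.
const naiveStages: RagStages = {
  // One chunk per blank-line-separated paragraph.
  chunk: (doc, docId) =>
    doc.split(/\n\s*\n/).map((text, i) => ({ id: `${docId}-c${i}`, text: text.trim() })),
  // One "memory" per sentence (no reference resolution).
  memorize: (c) =>
    c.text
      .split(/(?<=[.!?])\s+/)
      .filter((s) => s.length > 0)
      .map((fact) => ({ chunkId: c.id, fact })),
  // Keyword-only scoring: number of query words present in the memory.
  search: (q, memories) => {
    const qWords = wordSet(q);
    return memories
      .map((m) => ({ memory: m, score: Array.from(wordSet(m.fact)).filter((w) => qWords.has(w)).length }))
      .filter((h) => h.score > 0);
  },
  // "Reranking" here is just a sort by the first-stage score.
  rerank: (_q, hits) => [...hits].sort((a, b) => b.score - a.score),
};

const docs = ["Supermemory ingests PDFs.\n\nIt also ingests audio files.", "Keyword search matches exact terms."];
const top = runPipeline(naiveStages, docs, "ingests audio", 2);
console.log(top.map((h) => h.memory.fact));
// → ["It also ingests audio files.", "Supermemory ingests PDFs."]
```

The interface keeps each stage swappable, so a real chunker, extractor, retriever or reranker could replace the stand-ins without changing `runPipeline`.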
01

Smart Chunking

Semantic decomposition that preserves meaning across boundaries. No more orphaned paragraphs.
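Splitting on fixed character counts can cut a sentence in half. The minimal sketch below shows boundary-aware chunking instead. It assumes sentences end with ".", "!" or "?" and packs whole sentences into size-limited chunks. It is a simpler stand-in for the semantic decomposition described above, because it uses no embeddings or model to decide where meaning changes. The function names are hypothetical.

```typescript
// Rough sentence splitter: break after ".", "!" or "?" followed by whitespace.
function splitSentences(text: string): string[] {
  return text
    .split(/(?<=[.!?])\s+/)
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

// Greedily pack whole sentences into chunks of at most maxChars characters.
// Chunks only ever end at sentence boundaries; a single sentence longer than
// maxChars becomes its own (oversized) chunk rather than being cut.
function chunkBySentence(text: string, maxChars: number): string[] {
  const chunks: string[] = [];
  let current = "";
  for (const sentence of splitSentences(text)) {
    const candidate = current === "" ? sentence : `${current} ${sentence}`;
    if (current !== "" && candidate.length > maxChars) {
      chunks.push(current); // close the chunk at a sentence boundary
      current = sentence;
    } else {
      current = candidate;
    }
  }
  if (current !== "") chunks.push(current);
  return chunks;
}

console.log(chunkBySentence("Alpha beta. Gamma delta. Epsilon.", 24));
// → ["Alpha beta. Gamma delta.", "Epsilon."]
```

A semantic chunker would typically replace the fixed size budget with a signal such as a drop in embedding similarity between adjacent sentences. The guarantee of never splitting mid-sentence would stay the same.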

02

Memory Generation

Each chunk is turned into atomic memories: single facts with their references resolved (for example, a pronoun replaced by the entity it refers to). High signal, zero noise.
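An atomic memory can be modeled as one standalone fact that keeps a link back to the chunk it came from. The sketch below is hypothetical. `AtomicMemory` and `extractMemories` are illustrative names, and the single-regex pronoun replacement only mimics the shape of reference resolution. A real system would need a coreference model or an LLM.

```typescript
// Hypothetical record: one self-contained fact plus a pointer to its source chunk,
// so a retrieved fact can be traced back to (and expanded with) its context.
interface AtomicMemory {
  fact: string;
  sourceChunkId: string;
}

// Toy extractor: one memory per sentence, replacing a sentence-initial "It"
// with the chunk's known subject. This regex is a stand-in for real reference
// resolution, which would need a coreference model or an LLM.
function extractMemories(chunkId: string, text: string, subject: string): AtomicMemory[] {
  return text
    .split(/(?<=[.!?])\s+/)
    .map((s) => s.trim())
    .filter((s) => s.length > 0)
    .map((sentence) => ({
      fact: sentence.replace(/^It\b/, subject),
      sourceChunkId: chunkId,
    }));
}

const extracted = extractMemories(
  "chunk-7",
  "The report covers Q3 revenue. It was published in October.",
  "The report",
);
console.log(extracted.map((m) => m.fact));
// → ["The report covers Q3 revenue.", "The report was published in October."]
```

With the reference resolved, the second fact still makes sense when it is retrieved on its own. The `sourceChunkId` link lets a caller fetch the surrounding chunk if more context is needed.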

03

Hybrid Search

Vector similarity + keyword matching + graph traversal. Three retrieval strategies, one API call.
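The page does not say how the results from the three retrievers are merged. One widely used way to combine ranked lists without calibrating their scores is Reciprocal Rank Fusion (RRF). The sketch below assumes each retriever returns an ordered list of document IDs. It illustrates RRF in general and is not a description of Supermemory's internal fusion.

```typescript
// Reciprocal Rank Fusion: each document's fused score is the sum, over all
// ranked lists it appears in, of 1 / (k + rank), with 1-based ranks.
// k = 60 is the value commonly used in the RRF literature.
function reciprocalRankFusion(
  rankings: string[][],
  k = 60,
): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  return Array.from(scores, ([id, score]) => ({ id, score })).sort((a, b) => b.score - a.score);
}

// Toy ranked ID lists from three hypothetical retrievers.
const vectorHits = ["a", "b", "c"];
const keywordHits = ["b", "d", "a"];
const graphHits = ["b", "c"];

console.log(reciprocalRankFusion([vectorHits, keywordHits, graphHits]).map((r) => r.id));
// → ["b", "a", "c", "d"]
```

RRF uses only ranks, so a cosine score, a keyword score and a graph-distance score never need to share a scale. "b" comes first because all three lists rank it near the top.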

04

Context-Aware Reranking

Results ranked by actual relevance to your query, not just embedding distance.
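Reranking is commonly done in two stages. A fast retriever proposes candidates, then a slower scorer reads each (query, passage) pair and reorders them. The sketch below is a minimal version of that pattern. `rerank`, `Candidate` and the term-coverage scorer are hypothetical stand-ins for a learned reranker such as a cross-encoder, and the candidate data is made up.

```typescript
interface Candidate {
  id: string;
  text: string;
  retrievalScore: number; // score from the first-stage (e.g. vector) search
}

// A scorer that judges a (query, passage) pair jointly. In practice this is
// often a cross-encoder model; here it is any function (hypothetical).
type RelevanceScorer = (query: string, passage: string) => number;

// Rescore first-stage candidates for relevance and keep the topK best.
function rerank(query: string, candidates: Candidate[], scorer: RelevanceScorer, topK: number) {
  return candidates
    .map((c) => ({ ...c, relevance: scorer(query, c.text) }))
    .sort((a, b) => b.relevance - a.relevance)
    .slice(0, topK);
}

// Toy scorer: fraction of query terms that appear in the passage.
const termCoverage: RelevanceScorer = (query, passage) => {
  const terms: string[] = query.toLowerCase().match(/[a-z0-9]+/g) ?? [];
  const words = new Set<string>(passage.toLowerCase().match(/[a-z0-9]+/g) ?? []);
  return terms.length === 0 ? 0 : terms.filter((t) => words.has(t)).length / terms.length;
};

// Made-up first-stage results: the snake passage has the highest similarity score.
const firstStage: Candidate[] = [
  { id: "snakes", text: "Python snakes are constrictors", retrievalScore: 0.91 },
  { id: "tutorial", text: "A Python programming tutorial for beginners", retrievalScore: 0.88 },
  { id: "survey", text: "Programming languages compared", retrievalScore: 0.5 },
];

console.log(rerank("python programming tutorial", firstStage, termCoverage, 2).map((c) => c.id));
// → ["tutorial", "snakes"]
```

The first stage preferred the snake passage, but the reranker, which looks at the query and passage together, moves the tutorial to the top. Keeping the scorer pluggable lets a model-based reranker replace the toy function.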

MULTI-MODAL

Understand any format

Text / Markdown

Plain text and Markdown documents

PDFs

Structured and scanned PDF documents

Web Pages / URLs

Live web crawling and content extraction

Images

OCR and visual understanding

Videos

Transcription and frame analysis

Audio

Speech-to-text with context extraction

Google Docs

Native Google Workspace integration

CSV / JSON

Structured data ingestion and indexing
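Supporting many formats usually means routing each input to a format-specific extraction step before chunking. The sketch below assumes inputs arrive with a MIME type. The processing labels are hypothetical names for the steps listed above, not parameters of the Supermemory API.

```typescript
// Hypothetical processing labels mirroring the formats listed above;
// these are illustrative names, not values accepted by any API.
type Processing =
  | "text"
  | "pdf-extract"
  | "web-extract"
  | "google-workspace"
  | "structured"
  | "ocr-and-vision"
  | "transcribe-and-frames"
  | "speech-to-text";

// Route an input to a processing step by its MIME type.
function routeByMimeType(mimeType: string): Processing {
  // Drop parameters such as "; charset=utf-8" and normalize case.
  const type = mimeType.split(";")[0].trim().toLowerCase();
  if (type === "text/plain" || type === "text/markdown") return "text";
  if (type === "application/pdf") return "pdf-extract";
  if (type === "text/html") return "web-extract";
  if (type === "application/vnd.google-apps.document") return "google-workspace";
  if (type === "text/csv" || type === "application/json") return "structured";
  if (type.startsWith("image/")) return "ocr-and-vision";
  if (type.startsWith("video/")) return "transcribe-and-frames";
  if (type.startsWith("audio/")) return "speech-to-text";
  throw new Error(`Unsupported content type: ${mimeType}`);
}

console.log(routeByMimeType("text/markdown; charset=utf-8")); // "text"
console.log(routeByMimeType("image/png"));                    // "ocr-and-vision"
```

Rejecting unknown types explicitly keeps unsupported inputs from silently entering the index as unparsed bytes.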

PERFORMANCE

Benchmarked. Battle-tested.

Metric                Supermemory   Standard RAG   Vector DB Only
Retrieval Quality     85.2%         ~60%           ~50%
Latency (p95)         <300ms        1-3s           500ms
Knowledge Updates     Real-time     Re-index       Re-embed
Temporal Reasoning    Native        None           None
Multi-modal Support   Full          Text only      Text only
Latency: <300ms p95, at any scale
Throughput: 100B+ tokens/month
LongMemEval: 85.2% (state-of-the-art)
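The latency figures above are quoted as p95, meaning 95% of requests complete at or under that time. The minimal nearest-rank percentile calculation below shows how such a figure is derived from measurements. The sample latencies are made up, and `percentile` is an illustrative helper.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of all
// samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  if (!(p > 0 && p <= 100)) throw new Error("p must be in (0, 100]");
  const sorted = [...samples].sort((a, b) => a - b);
  // Multiply before dividing so integer inputs avoid floating-point error.
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based
  return sorted[rank - 1];
}

// Made-up request latencies in milliseconds.
const latenciesMs = [120, 80, 95, 300, 110, 90, 105, 100, 85, 250];
console.log(percentile(latenciesMs, 50)); // 100
console.log(percentile(latenciesMs, 95)); // 300
```

With only ten samples, the p95 is simply the slowest request, so a p95 claim is only meaningful over many requests. Other percentile definitions, such as linear interpolation between ranks, give slightly different values. Compare p95 figures only when they were measured the same way.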

DEVELOPER EXPERIENCE

Three lines to better retrieval

See the SDK quick start guide.
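```typescript
// Run as an ES module so top-level await is allowed
import Supermemory from 'supermemory'

// Reads the API key from the SUPERMEMORY_API_KEY environment variable by default
const client = new Supermemory()

// Add a document (raw content or a URL) under a container tag
await client.add({
  content: "Your document content or URL",
  containerTags: ["project_docs"]
})

// Search with hybrid retrieval, scoped to the same container tag
const results = await client.search.documents({
  q: "How does authentication work?",
  containerTags: ["project_docs"],
  searchMode: "hybrid"
})
```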

GET STARTED

Better retrieval starts here

Available on all plans. Free tier includes 100K tokens.