RETRIEVAL-AUGMENTED GENERATION

Search that actually understands

Hybrid search with sub-300ms latency. Multi-modal support for PDFs, images, videos, and web pages. Context-aware reranking that surfaces what matters.

Start Building → Read the Research →

RAG documentation →


<300ms latency

#1 on benchmarks

Multi-modal: PDFs, images, videos, audio

〉THE PROBLEM [1/6]

Traditional RAG is broken

See how Supermemory's approach differs — read the RAG overview →

Lost in chunks

Standard RAG splits documents into chunks that lose context. Isolated fragments produce hallucinated, incomplete answers.
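To make the failure mode concrete, here is a minimal sketch of naive fixed-size chunking. The function name, sample text, and chunk size are illustrative assumptions, not taken from any particular RAG stack.

```python
def fixed_size_chunks(text: str, size: int) -> list[str]:
    """Split text into consecutive windows of `size` characters, ignoring sentence boundaries."""
    return [text[i:i + size] for i in range(0, len(text), size)]


# Hypothetical two-sentence document; "It" in the second sentence refers to "Orion".
doc = "Acme released Orion in 2021. It supports offline sync."
chunks = fixed_size_chunks(doc, 20)

for chunk in chunks:
    print(repr(chunk))
# The middle chunk contains "It supports" but no longer contains "Orion",
# so a retriever that returns only that chunk cannot tell what "It" is.
```

Any fixed window has this problem somewhere in a long document; a smaller window just moves the cut.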

Similarity is not relevance

Vector similarity alone misses nuance. A document about Python snakes can match a query about Python programming, and your users get garbage.

Slow at scale

As your knowledge base grows, retrieval slows to a crawl. Users wait seconds for answers that should take milliseconds.

〉HOW IT WORKS [2/6]

RAG, reimagined from scratch

RAG configuration guide →

Smart Chunking

Semantic decomposition that preserves meaning across boundaries. No more orphaned paragraphs.
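As a rough illustration of boundary-preserving chunking, the sketch below packs whole sentences into chunks and carries the last sentence over into the next chunk. This is a simplified stand-in under stated assumptions (regex sentence splitting, a character budget, a one-sentence overlap), not a description of Supermemory's actual chunker.

```python
import re


def sentence_chunks(text: str, max_chars: int, overlap: int = 1) -> list[str]:
    """Pack whole sentences into chunks of roughly max_chars characters.

    Each new chunk repeats the last `overlap` sentences of the previous one,
    so a pronoun at the start of a chunk keeps its antecedent nearby.
    A single long sentence, or the carried-over overlap, can exceed max_chars.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current: list[str] = []
    for sentence in sentences:
        candidate = " ".join(current + [sentence])
        if current and len(candidate) > max_chars:
            chunks.append(" ".join(current))
            current = current[-overlap:] if overlap > 0 else []
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks


# Hypothetical sample text.
text = "Acme released Orion in 2021. It supports offline sync. Pricing starts at ten dollars."
for chunk in sentence_chunks(text, max_chars=60):
    print(chunk)
```

No chunk starts or ends mid-sentence, and the second chunk repeats the sentence that links it to the first.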

Memory Generation

Each chunk generates atomic memories — single facts with resolved references. High signal, zero noise.

Hybrid Search

Vector similarity + keyword matching + graph traversal. Three retrieval strategies, one API call.
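This page does not say how the three strategies are merged. One widely used way to combine several ranked lists is reciprocal rank fusion (RRF), sketched below with hypothetical document IDs and the commonly used constant k = 60. Treat it as an illustration of the general idea, not as Supermemory's implementation.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists of document IDs; each list adds 1 / (k + rank) to a document's score."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical results from three retrievers for the same query.
vector_hits = ["doc_auth", "doc_oauth", "doc_snakes"]
keyword_hits = ["doc_oauth", "doc_auth", "doc_tokens"]
graph_hits = ["doc_oauth", "doc_tokens"]

fused = reciprocal_rank_fusion([vector_hits, keyword_hits, graph_hits])
print([doc_id for doc_id, _ in fused])
```

Documents that several retrievers agree on rise to the top, while a document that only one retriever ranked low sinks.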

Context-Aware Reranking

Results ranked by actual relevance to your query, not just embedding distance.
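Reranking is typically a second pass over first-stage candidates. The sketch below blends each candidate's retrieval score with a toy query-term coverage score. The coverage scorer, the weight, and the sample scores are assumptions for illustration; a production reranker would normally use a learned relevance model instead.

```python
def rerank(query: str, candidates: list[tuple[str, float]], weight: float = 0.5) -> list[str]:
    """Reorder (text, retrieval_score) candidates by a blend of retrieval score and query-term coverage."""
    query_terms = set(query.lower().split())

    def coverage(text: str) -> float:
        # Fraction of query terms that appear in the candidate text.
        words = set(text.lower().split())
        return len(query_terms & words) / len(query_terms) if query_terms else 0.0

    scored = [(weight * score + (1 - weight) * coverage(text), text) for text, score in candidates]
    return [text for _, text in sorted(scored, key=lambda pair: -pair[0])]


# Hypothetical first-stage results: the snake document has the higher embedding score.
candidates = [
    ("caring for a pet python snake", 0.82),
    ("python programming tutorial for beginners", 0.79),
]
reranked = rerank("python programming tutorial", candidates)
print(reranked[0])
```

The second pass promotes the programming tutorial even though its first-stage score was lower.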

〉MULTI-MODAL [3/6]

Understand any format

Supported formats documentation →

Text / Markdown

Plain text and Markdown documents

PDFs

Structured and scanned PDF documents

Web Pages / URLs

Live web crawling and content extraction

Images

OCR and visual understanding

Videos

Transcription and frame analysis

Audio

Speech-to-text with context extraction

Google Docs

Native Google Workspace integration

CSV / JSON

Structured data ingestion and indexing

〉PERFORMANCE [4/6]

Benchmarked. Battle-tested.

See benchmarks →

Metric               Supermemory  Standard RAG  Vector DB Only
Retrieval Quality    85.2%        ~60%          ~50%
Latency (p95)        <300ms       1-3s          500ms
Knowledge Updates    Real-time    Re-index      Re-embed
Temporal Reasoning   Native       None          None
Multi-modal Support  Full         Text only     Text only

Latency: <300ms p95, at any scale

Throughput: 100B+ tokens/month

LongMemEval: 85.2%, state-of-the-art
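For readers checking a p95 figure against their own traffic, here is a minimal sketch of a nearest-rank percentile over latency samples. The sample values are made up, and the nearest-rank definition is one of several in common use; the page does not say which one its numbers rely on.

```python
import math


def percentile_nearest_rank(samples: list[float], pct: float) -> float:
    """Return the smallest sample such that at least pct percent of samples are <= it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based nearest rank
    return ordered[rank - 1]


# Hypothetical request latencies in milliseconds.
latencies_ms = [120, 95, 180, 240, 310, 150, 130, 110, 205, 290,
                140, 160, 175, 125, 135, 145, 155, 165, 185, 195]
p95 = percentile_nearest_rank(latencies_ms, 95)
print(f"p95 = {p95} ms")
```

A p95 under 300ms means 95% of requests finish within that bound, while the slowest 5% may take longer.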

〉DEVELOPER EXPERIENCE [5/6]

Three lines to better retrieval

SDK quick start guide →

TypeScript

import Supermemory from 'supermemory'

const client = new Supermemory()

// Add documents
await client.add({
  content: "Your document content or URL",
  containerTags: ["project_docs"]
})

// Search with hybrid retrieval
const results = await client.search.documents({
  q: "How does authentication work?",
  containerTags: ["project_docs"],
  searchMode: "hybrid"
})

Python

from supermemory import Supermemory

client = Supermemory()

# Add documents
client.add(
  content="Your document content or URL",
  container_tags=["project_docs"]
)

# Search with hybrid retrieval
results = client.search.documents(
  q="How does authentication work?",
  container_tags=["project_docs"],
  search_mode="hybrid"
)

cURL

# Add a document
curl https://api.supermemory.ai/v3/add \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"content": "Your document content or URL",
    "containerTags": ["project_docs"]}'

# Search with hybrid retrieval
curl https://api.supermemory.ai/v3/search \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"q": "How does authentication work?",
    "containerTags": ["project_docs"],
    "searchMode": "hybrid"}'

〉GET STARTED [6/6]

Better retrieval starts here

Available on all plans. Free tier includes 100K tokens.

Start Building → Talk to Sales →

Quick Start → API Reference → SDK Documentation → Research Paper →