Live Demo: Try the Memory Router at supermemory.chat to see it in action.
Using the Vercel AI SDK? Check out our AI SDK integration for the cleanest implementation with `@supermemory/tools/ai-sdk` - it's our recommended approach for new projects.

What is the Memory Router?
The Memory Router gives your LLM applications:

- Unlimited Context: No more token limits - conversations can extend indefinitely
- Automatic Memory Management: Intelligently chunks, stores, and retrieves relevant context
- Zero Code Changes: Works with your existing OpenAI-compatible clients
- Cost Optimization: Save up to 70% on token costs through intelligent context management
How It Works

1. Proxy Request: Your application sends requests to Supermemory instead of directly to your LLM provider (see the sketch below).
2. Context Management: Supermemory automatically:
   - Removes unnecessary context from long conversations
   - Searches relevant memories from previous interactions
   - Appends the most relevant context to your prompt
3. Forward to LLM: The optimized request is forwarded to your chosen LLM provider.
4. Async Memory Creation: New memories are created asynchronously without blocking the response.
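As a minimal sketch of this flow, assuming the OpenAI Node SDK: the router base URL and the Supermemory auth header name below are illustrative placeholders, not confirmed values - take the real ones from your Supermemory dashboard.

```typescript
import OpenAI from "openai";

// Point the existing OpenAI client at the Memory Router instead of the
// provider. The base URL and header name are assumed placeholders.
const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY, // provider key, forwarded to OpenAI
  baseURL: "https://api.supermemory.ai/v3/api.openai.com/v1", // hypothetical router URL
  defaultHeaders: {
    "x-supermemory-api-key": process.env.SUPERMEMORY_API_KEY!, // hypothetical header name
  },
});

// Everything else stays identical to a direct OpenAI call.
const completion = await client.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Pick up where we left off yesterday." }],
});

console.log(completion.choices[0].message.content);
```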
Key Benefits
For Developers
- Drop-in Integration: Just change your base URL - no other code changes needed
- Provider Agnostic: Works with OpenAI, Anthropic, Google, Groq, and more
- Shared Memory Pool: Memories created via API are available to the Router and vice versa
- Automatic Fallback: If Supermemory has issues, requests pass through directly
For Applications
- Better Long Conversations: Maintains context even after thousands of messages
- Consistent Responses: Memories ensure consistent information across sessions
- Smart Retrieval: Only relevant context is included, improving response quality
- Cost Savings: Automatic chunking reduces token usage significantly
When to Use the Memory Router
The Memory Router is ideal for:

- Chat Applications: Customer support, AI assistants, chatbots
- Long Conversations: Sessions that exceed model context windows
- Multi-Session Memory: Users who return and continue conversations
- Quick Prototypes: Get memory capabilities without building infrastructure
Supported Providers
The Memory Router works with any OpenAI-compatible endpoint:

| Provider | Base URL | Status |
|---|---|---|
| OpenAI | api.openai.com/v1 | ✅ Fully Supported |
| Anthropic | api.anthropic.com/v1 | ✅ Fully Supported |
| Google Gemini | generativelanguage.googleapis.com/v1beta/openai | ✅ Fully Supported |
| Groq | api.groq.com/openai/v1 | ✅ Fully Supported |
| DeepInfra | api.deepinfra.com/v1/openai | ✅ Fully Supported |
| OpenRouter | openrouter.ai/api/v1 | ✅ Fully Supported |
| Custom | Any OpenAI-compatible endpoint | ✅ Supported |

Not Yet Supported:
- OpenAI Assistants API (`/v1/assistants`)
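Because only the base URL changes, switching providers is effectively a one-line edit. The sketch below targets Groq's OpenAI-compatible endpoint; how the router encodes the upstream URL (shown here as a path segment) is an assumption, so treat the URL scheme as a placeholder.

```typescript
import OpenAI from "openai";

// Same client code as before - only the upstream provider changes.
// The path-segment routing scheme below is an assumed placeholder.
const groq = new OpenAI({
  apiKey: process.env.GROQ_API_KEY, // provider key, forwarded to Groq
  baseURL: "https://api.supermemory.ai/v3/api.groq.com/openai/v1",
  defaultHeaders: {
    "x-supermemory-api-key": process.env.SUPERMEMORY_API_KEY!, // assumed header name
  },
});

const reply = await groq.chat.completions.create({
  model: "llama-3.3-70b-versatile",
  messages: [{ role: "user", content: "Same app, different provider." }],
});
```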
Authentication

The Memory Router requires two API keys:

- Supermemory API Key: for memory management
- Provider API Key: for your chosen LLM provider

Both keys can be supplied in any of three ways:

- Headers (recommended for production)
- URL parameters (useful for testing)
- Request body (for compatibility)
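For production, the header approach looks like the raw-fetch sketch below. The router URL and the Supermemory header name are assumed placeholders; the URL-parameter and request-body variants use names documented in your dashboard, so they're omitted here rather than guessed at.

```typescript
// Header-based auth (recommended): both keys travel as request headers.
const ROUTER_URL = "https://api.supermemory.ai/v3/api.openai.com/v1"; // placeholder

const res = await fetch(`${ROUTER_URL}/chat/completions`, {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.OPENAI_API_KEY}`, // provider key
    "x-supermemory-api-key": process.env.SUPERMEMORY_API_KEY!, // Supermemory key (assumed header name)
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "gpt-4o-mini",
    messages: [{ role: "user", content: "Hello" }],
  }),
});
```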
How Memories Work

When using the Memory Router:

- Automatic Extraction: Important information from conversations is automatically extracted
- Intelligent Chunking: Long messages are split into semantic chunks
- Relationship Building: New memories connect to existing knowledge
- Smart Retrieval: Only the most relevant memories are included in context

Memories are shared between the Memory Router and Memory API when using the same `user_id`, allowing you to use both together.
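One hedged sketch of per-user scoping: the OpenAI-standard `user` field is a natural place to carry the identifier, but whether the router maps it to Supermemory's `user_id` is an assumption - confirm the exact mechanism in your dashboard.

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: "https://api.supermemory.ai/v3/api.openai.com/v1", // placeholder router URL
  defaultHeaders: { "x-supermemory-api-key": process.env.SUPERMEMORY_API_KEY! }, // assumed header
});

// Assumed mapping: the `user` field carries the identifier that scopes the
// memory pool shared between the Memory Router and the Memory API.
const completion = await client.chat.completions.create({
  model: "gpt-4o-mini",
  user: "user-123", // same user_id you use with the Memory API
  messages: [{ role: "user", content: "What's my usual shipping address?" }],
});
```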
Response Headers

The Memory Router adds diagnostic headers to help you understand what's happening:

| Header | Description |
|---|---|
| x-supermemory-conversation-id | Unique conversation identifier |
| x-supermemory-context-modified | Whether context was modified (`true`/`false`) |
| x-supermemory-tokens-processed | Number of tokens processed |
| x-supermemory-chunks-created | New memory chunks created |
| x-supermemory-chunks-retrieved | Memory chunks added to context |
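These headers are easiest to inspect on a raw fetch response; the router URL and auth header below are the same assumed placeholders as above, while the diagnostic header names come from the table.

```typescript
const ROUTER_URL = "https://api.supermemory.ai/v3/api.openai.com/v1"; // placeholder

const res = await fetch(`${ROUTER_URL}/chat/completions`, {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    "x-supermemory-api-key": process.env.SUPERMEMORY_API_KEY!, // assumed header name
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "gpt-4o-mini",
    messages: [{ role: "user", content: "Hello" }],
  }),
});

// Diagnostic headers from the table above.
console.log("conversation:", res.headers.get("x-supermemory-conversation-id"));
console.log("context modified:", res.headers.get("x-supermemory-context-modified"));
console.log("chunks retrieved:", res.headers.get("x-supermemory-chunks-retrieved"));
```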
Error Handling

The Memory Router is designed for reliability:

- Automatic Fallback: If Supermemory encounters an error, your request passes through unmodified
- Error Headers: The `x-supermemory-error` header provides error details
- Zero Downtime: Your application continues working even if memory features are unavailable
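A small sketch of detecting a degraded-but-successful request via that header, usable on any fetch `Response` from a router call:

```typescript
// A 200 response can still carry x-supermemory-error when memory features
// were skipped and the request fell back to a direct provider call.
function checkMemoryHealth(res: Response): void {
  const memoryError = res.headers.get("x-supermemory-error");
  if (memoryError) {
    console.warn("Memory features unavailable; request passed through:", memoryError);
  }
}
```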
Rate Limits & Pricing
Rate Limits
- No Supermemory-specific rate limits
- Subject only to your LLM provider’s limits
Pricing
- Free Tier: 100k tokens stored at no cost
- Standard Plan: $20/month after free tier
- Usage-Based: Each conversation includes 20k free tokens, then $1 per million tokens
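As a quick worked example of the usage-based tier: a conversation that processes 120k tokens uses its 20k free allowance and bills the remaining 100k at $1 per million, i.e. 100,000 ÷ 1,000,000 × $1 = $0.10.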