
Unlimited Context: No more token limits; conversations can extend indefinitely.
Zero Latency: Transparent proxying with negligible overhead.
Cost Efficient: Save up to 70% on token costs for long conversations.
Provider Agnostic: Works with any OpenAI-compatible endpoint.
Getting Started
To use the Infinite Chat endpoint, you need to:
1. Get a supermemory API key
Head to supermemory’s Developer Platform, built to help you monitor and manage every aspect of the API.
Getting an API Key
Create an account
An account will automatically be created on your first login.

Create an API Key
1. Navigate to API Keys and click Create API Key.
2. Choose a name and expiry (optional), then click Create.
3. Copy the new key.

2. Add supermemory in front of any OpenAI-compatible API URL
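A minimal sketch of this step using the official OpenAI Python SDK. The exact proxy URL format and the header used to pass your supermemory API key are assumptions here; confirm both in the Developer Platform.

```python
from openai import OpenAI

# Assumed proxy format: the supermemory endpoint prefixed to your provider's
# base URL. Check the Developer Platform for the exact URL to use.
SUPERMEMORY_PROXY = "https://api.supermemory.ai/v3/https://api.openai.com/v1/"

client = OpenAI(
    base_url=SUPERMEMORY_PROXY,        # supermemory in front of the provider URL
    api_key="YOUR_OPENAI_API_KEY",     # still your provider's key
    default_headers={
        # Assumed header name for the supermemory key; verify in the dashboard.
        "x-supermemory-api-key": "YOUR_SUPERMEMORY_API_KEY",
    },
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```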
How It Works
1. Transparent Proxying: All requests pass through supermemory to your chosen LLM provider with negligible latency overhead.
2. Intelligent Chunking: Long conversations are automatically broken into optimized segments using our proprietary chunking algorithm, which preserves semantic coherence.
3. Smart Retrieval: When a conversation exceeds the token limit (20k+ tokens), supermemory retrieves the most relevant context from previous messages (see the simplified sketch below).
4. Automatic Token Management: The system balances token usage across requests, keeping responses fast while minimizing costs.
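To make steps 2–4 concrete, here is a deliberately simplified Python sketch of the general idea: once a conversation exceeds the token budget, only the earlier messages most relevant to the latest query are re-inserted into the prompt. This is an illustration only; supermemory's actual chunking and retrieval are proprietary and use semantic rather than word-overlap scoring.

```python
TOKEN_LIMIT = 20_000

def approx_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token.
    return len(text) // 4

def build_context(messages: list[dict], query: str) -> list[dict]:
    total = sum(approx_tokens(m["content"]) for m in messages)
    if total <= TOKEN_LIMIT:
        return messages  # under the limit: forward the conversation untouched

    # Over the limit: keep the earlier messages most relevant to the latest
    # query, scored here by naive word overlap (chronological order ignored
    # for brevity).
    query_words = set(query.lower().split())
    scored = sorted(
        messages[:-1],
        key=lambda m: len(query_words & set(m["content"].lower().split())),
        reverse=True,
    )
    context, used = [], 0
    for m in scored:
        cost = approx_tokens(m["content"])
        if used + cost > TOKEN_LIMIT:
            break
        context.append(m)
        used += cost
    return context + [messages[-1]]  # always keep the newest message
```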
Performance Benefits
Reduced Token Usage: Save up to 70% on token costs for long conversations through intelligent context management and caching.
Unlimited Context: No more 8k/32k/128k token limits; conversations can extend indefinitely with supermemory’s advanced retrieval system.
Improved Response Quality: Better context retrieval means more coherent responses even in very long threads, reducing hallucinations and inconsistencies.
Zero Performance Penalty: The proxy adds negligible latency to your requests, ensuring fast response times for your users.
Pricing
Error Handling
supermemory is designed with reliability as the top priority. If any issues occur within the supermemory processing pipeline, the system will automatically fall back to direct forwarding of your request to the LLM provider, ensuring zero downtime for your applications.
supermemory adds the following diagnostic headers to each response:

| Header | Description |
| --- | --- |
| x-supermemory-conversation-id | Unique identifier for the conversation thread |
| x-supermemory-context-modified | Indicates whether supermemory modified the context (“true” or “false”) |
| x-supermemory-tokens-processed | Number of tokens processed in this request |
| x-supermemory-chunks-created | Number of new chunks created from this conversation |
| x-supermemory-chunks-deleted | Number of chunks removed (if any) |
| x-supermemory-docs-deleted | Number of documents removed (if any) |
If supermemory encounters an error, the x-supermemory-error header will be included with details about what went wrong. Your request will still be processed by the underlying LLM provider.
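A sketch of inspecting these diagnostics with the OpenAI Python SDK's raw-response interface, assuming the same proxy setup as in the earlier example. The header names come from the table above; everything else is illustrative.

```python
# Assumes `client` is configured against the supermemory proxy as shown earlier.
raw = client.chat.completions.with_raw_response.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize our discussion so far."}],
)

# Diagnostic headers added by supermemory (names from the table above).
print("context modified:", raw.headers.get("x-supermemory-context-modified"))
print("tokens processed:", raw.headers.get("x-supermemory-tokens-processed"))
if raw.headers.get("x-supermemory-error"):
    # supermemory hit an error, but the provider still handled the request.
    print("supermemory error:", raw.headers["x-supermemory-error"])

completion = raw.parse()  # the usual ChatCompletion object
print(completion.choices[0].message.content)
```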
Rate Limiting
Currently, there are no rate limits specific to supermemory. Your requests are subject only to the rate limits of your underlying LLM provider.
Supported Models
supermemory works with any OpenAI-compatible API, including:

OpenAI: GPT-3.5, GPT-4, GPT-4o
Anthropic: Claude 3 models
Other Providers: Any provider with an OpenAI-compatible endpoint
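Because the proxy only changes the base URL, switching providers is a matter of which endpoint you wrap. A sketch under the same assumptions as the earlier example (proxy URL format and key header); the provider base URLs shown are placeholders, so verify the exact OpenAI-compatible endpoint your provider exposes.

```python
from openai import OpenAI

def make_client(provider_base_url: str, provider_key: str) -> OpenAI:
    """Wrap any OpenAI-compatible endpoint with the supermemory proxy."""
    # Assumed proxy format and key header, as in the earlier example.
    return OpenAI(
        base_url=f"https://api.supermemory.ai/v3/{provider_base_url}",
        api_key=provider_key,
        default_headers={"x-supermemory-api-key": "YOUR_SUPERMEMORY_API_KEY"},
    )

openai_client = make_client("https://api.openai.com/v1/", "OPENAI_API_KEY")
anthropic_client = make_client("https://api.anthropic.com/v1/", "ANTHROPIC_API_KEY")
```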