The self-hosted server aims for zero configuration: the only thing it needs is one model provider key. The first-boot wizard collects that key interactively, or you can set it via an environment variable for non-interactive deployments. Everything else below is opt-in, layered on top as you need it. The installer writes API keys to ~/.supermemory/env, which is loaded on every launch. You can also set variables in your shell or a process manager.
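To reuse the installer's env file in your own shell session or wrapper script, one option is to source it with auto-export turned on. This is a minimal sketch that assumes ~/.supermemory/env holds plain KEY=VALUE lines a POSIX shell can source; that format is an assumption, and load_env_file is a hypothetical helper name, not part of Supermemory.

```shell
# Assumption: the env file contains plain KEY=VALUE lines.
load_env_file() {
  [ -r "$1" ] || return 1
  set -a        # export every variable assigned while sourcing
  . "$1"
  set +a
}

load_env_file "${HOME}/.supermemory/env" \
  || echo "no readable env file; relying on the current environment"
```

Variables loaded this way are exported, so any process started from the same shell inherits them.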

Core

| Variable | Purpose | Default |
| --- | --- | --- |
| PORT (or SUPERMEMORY_PORT) | HTTP listen port | 6767 |
| SUPERMEMORY_DATA_DIR | Where the graph engine’s data, auth secret, and model cache live | ./.supermemory |
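Because the data directory holds the auth secret alongside the graph data, it can be worth creating it with owner-only permissions before the first launch. The sketch below is one way to do that; prepare_data_dir is a hypothetical helper, and the port and path are placeholder values (in practice you would pick a persistent location rather than a temp directory).

```shell
# Hypothetical helper: create a data directory only its owner can access.
prepare_data_dir() {
  mkdir -p "$1" && chmod 700 "$1"
}

# Placeholder values for illustration, not recommendations.
export PORT=8080
export SUPERMEMORY_DATA_DIR="${TMPDIR:-/tmp}/supermemory-demo"
prepare_data_dir "$SUPERMEMORY_DATA_DIR"
```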

LLM providers

In production, Supermemory uses its own proprietary models tuned for long-horizon data understanding. Self-hosted, you bring your own: embeddings are computed locally, and a model of your choice powers the intelligent steps — summaries, contextual chunking, and memory extraction. Configure at least one:
| Variable | Provider |
| --- | --- |
| OPENAI_API_KEY | OpenAI, or any OpenAI-compatible endpoint (see below) |
| ANTHROPIC_API_KEY | Anthropic |
| GEMINI_API_KEY | Google AI Studio (Gemini) |
| GROQ_API_KEY | Groq |
| WORKERS_AI_API_KEY + CLOUDFLARE_ACCOUNT_ID | Cloudflare Workers AI |
| GOOGLE_VERTEX_PROJECT_ID + GOOGLE_VERTEX_LOCATION | GCP Vertex AI |
No key set? The server walks you through it. On first boot, an interactive setup wizard asks which provider you want, securely prompts for the key, and saves it encrypted — including a custom base URL and model name if you pick an OpenAI-compatible endpoint.
With multiple providers configured, the first one in the order above is used.
Image, video, and high-fidelity PDF understanding require a Gemini or Vertex AI key. Text ingestion, memory extraction, and search work with any provider.
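The precedence rule and the media requirement above can be checked before launch with a small script. This is only a sketch of the documented behavior, not the server's actual selection code: pick_provider and has_media_key are hypothetical helpers, and the provider labels they print are made up for illustration.

```shell
# Report which provider the documented order would select.
pick_provider() {
  if [ -n "${OPENAI_API_KEY:-}" ]; then echo openai
  elif [ -n "${ANTHROPIC_API_KEY:-}" ]; then echo anthropic
  elif [ -n "${GEMINI_API_KEY:-}" ]; then echo gemini
  elif [ -n "${GROQ_API_KEY:-}" ]; then echo groq
  elif [ -n "${WORKERS_AI_API_KEY:-}" ] && [ -n "${CLOUDFLARE_ACCOUNT_ID:-}" ]; then echo workers-ai
  elif [ -n "${GOOGLE_VERTEX_PROJECT_ID:-}" ] && [ -n "${GOOGLE_VERTEX_LOCATION:-}" ]; then echo vertex
  else echo none
  fi
}

# Succeed if a Gemini or Vertex AI configuration is present,
# which the docs require for image, video, and high-fidelity PDF input.
has_media_key() {
  [ -n "${GEMINI_API_KEY:-}" ] || {
    [ -n "${GOOGLE_VERTEX_PROJECT_ID:-}" ] && [ -n "${GOOGLE_VERTEX_LOCATION:-}" ]
  }
}

echo "provider: $(pick_provider)"
has_media_key || echo "note: no Gemini or Vertex AI key, media understanding unavailable"
```

Running it in the same environment the server will inherit shows at a glance whether an unexpected key earlier in the order would take priority.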

Fully offline with local models

OPENAI_API_KEY + OPENAI_BASE_URL covers any OpenAI-compatible endpoint: Ollama, LM Studio, vLLM, llama.cpp server, Together, Fireworks, and more.
```shell
# Ollama example: gpt-oss-20b works great
OPENAI_BASE_URL=http://localhost:11434/v1
OPENAI_API_KEY=ollama        # any non-empty string for local runners
OPENAI_MODEL=gpt-oss:20b
```
| Variable | Purpose | Default |
| --- | --- | --- |
| OPENAI_BASE_URL | OpenAI-compatible endpoint URL | OpenAI |
| OPENAI_MODEL | Model ID sent to that endpoint | gpt-5.1 |
| OPENAI_FAST_MODEL | Override for fast/light tasks | OPENAI_MODEL |
| OPENAI_TEXT_MODEL | Override for heavier text tasks | OPENAI_MODEL |
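The two override variables let you split work across models on the same endpoint. As an illustrative sketch, the fragment below keeps a smaller model as the default (so fast/light tasks use it) and routes heavier text tasks to a larger one. The gpt-oss:120b tag is an assumption about what you have pulled into Ollama; whether the split helps depends on your hardware.

```shell
# Ollama with two model sizes (illustrative, not a recommendation)
OPENAI_BASE_URL=http://localhost:11434/v1
OPENAI_API_KEY=ollama              # any non-empty string for local runners
OPENAI_MODEL=gpt-oss:20b           # default, also used for fast/light tasks
OPENAI_TEXT_MODEL=gpt-oss:120b     # heavier text tasks only
```

OPENAI_FAST_MODEL is left unset here, so it falls back to OPENAI_MODEL as described in the table.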

File storage

Nothing to configure. Uploaded files (PDFs, images) are stored on local disk inside $SUPERMEMORY_DATA_DIR and served by the server at /files/:key.
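For scripts that need to fetch a stored file, the URL can be assembled from the server address and the file key. This is a sketch under assumptions: the server is reachable on localhost, SUPERMEMORY_URL and file_url are hypothetical names (the server does not read them), and "example-key" stands in for a real key from your instance. Whether the route requires authentication is not stated here, so the download command is shown only as a comment.

```shell
# Base address of your instance; defaults to localhost on the configured port.
SUPERMEMORY_URL="${SUPERMEMORY_URL:-http://localhost:${PORT:-6767}}"

# Build the URL for a stored file from its key.
file_url() {
  printf '%s/files/%s\n' "$SUPERMEMORY_URL" "$1"
}

file_url "example-key"
# To download: curl -fSL -o document.pdf "$(file_url example-key)"
```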

Embedding performance

Local embeddings are prewarmed at startup with conservative defaults — one worker, minimal CPU footprint. Turn these up if you’re ingesting heavily and prefer throughput over headroom:
| Variable | Purpose | Default |
| --- | --- | --- |
| SUPERMEMORY_LOCAL_EMBEDDING_POOL_SIZE | Number of embedding workers | 1 |
| SUPERMEMORY_LOCAL_EMBEDDING_WASM_THREADS | Compute threads per worker | 1 |
| SUPERMEMORY_LOCAL_EMBEDDING_BATCH_SIZE | Texts per worker dispatch | 8 |
| SUPERMEMORY_LOCAL_EMBEDDING_IDLE_TIMEOUT_MS | Idle time before workers shut down | 120000 |
| SUPERMEMORY_SKIP_EMBEDDING_PREWARM | Skip startup prewarm, load on first use | unset |
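Since total embedding compute is roughly workers times threads per worker, one way to choose values is to budget a share of the machine's cores. The sketch below uses a made-up heuristic (spend about half the cores, two threads per worker where possible); it is an assumption for illustration, not guidance from Supermemory, and suggest_embedding_env is a hypothetical helper.

```shell
# Heuristic (an assumption): keep workers x threads at or below half the cores.
suggest_embedding_env() {
  cores=$1
  budget=$(( cores / 2 ))          # leave about half the cores as headroom
  threads=2
  if [ "$budget" -lt "$threads" ]; then threads=1; fi
  pool=$(( budget / threads ))
  if [ "$pool" -lt 1 ]; then pool=1; fi
  echo "SUPERMEMORY_LOCAL_EMBEDDING_POOL_SIZE=$pool"
  echo "SUPERMEMORY_LOCAL_EMBEDDING_WASM_THREADS=$threads"
}

# Print suggested settings for this machine (falls back to 1 core if unknown).
cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
suggest_embedding_env "$cores"
```

On an 8-core host this prints a pool size of 2 with 2 threads each; treat the result as a starting point and measure ingestion throughput before raising it further.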

Telemetry

The self-hosted binary sends no analytics — there is nothing to opt out of. The only related switch:
| Variable | Purpose | Default |
| --- | --- | --- |
| SUPERMEMORY_DISABLE_TELEMETRY | Set to 1 to also disable internal AI SDK telemetry instrumentation | unset |

Platform-only features

These exist in the codebase but are exclusive to the hosted platform — the self-hosted binary doesn’t include them:
  • Connectors — Google Drive, Notion, Gmail, OneDrive background sync
  • Supermemory MCP — managed MCP server endpoints
  • Optimized memory extraction — the platform’s extraction pipeline is tuned for higher quality at lower cost than bring-your-own-key
  • Managed scale — globally distributed infrastructure, no capacity planning
Any other environment variables you may find referenced in the codebase are platform-only: the self-hosted binary ignores them even when set.

Example: production-ish .env

```shell
# Persistent data location
SUPERMEMORY_DATA_DIR=/var/lib/supermemory

# One LLM provider
OPENAI_API_KEY=sk-...
```
That’s enough for full ingestion, memory extraction, and hybrid search.