Persistent Memory: Long-Term Memory for AI Agents via MCP
MCP server that gives AI agents long-term memory with semantic search (vector embeddings) and keyword search (BM25). Zero infrastructure: just Python and SQLite.
What It Does
Persistent Memory is an MCP (Model Context Protocol) server that gives AI agents long-term memory. Agents can store facts, recall them by meaning (semantic search via vector embeddings), search by keyword (BM25 via SQLite FTS5), and filter by metadata tags, all with zero infrastructure requirements.
Agent: "Remember that the user prefers concise responses"
Server: ✓ Stored with 3 tags
Agent: "What do you know about the user's preferences?"
Server: [semantic match] "User prefers concise responses"
[keyword match] "User likes Python over JavaScript"
Tech Stack
| Component | Technology |
|---|---|
| Framework | fastmcp (MCP server framework) |
| Embeddings | sentence-transformers (all-MiniLM-L6-v2, 384-dim) |
| Storage | SQLite with FTS5 (full-text search) |
| Similarity | NumPy cosine similarity |
Zero external infrastructure: no PostgreSQL, no Pinecone, no Redis, no GPU needed. The entire system runs in-process with a single SQLite database file.
Architecture
server.py: FastMCP entry point, registers 7 @mcp.tool() functions
memory_store.py: Business logic, orchestrates store/recall/search/forget
embeddings.py: sentence-transformers singleton, cosine_similarity()
storage.py: SQLite CRUD with FTS5 auto-sync via triggers
MCP Tools
| Tool | Description |
|---|---|
| store | Save a fact with optional tags and source |
| recall | Semantic search: query → embed → cosine similarity → top-K |
| search | Keyword search: BM25 via SQLite FTS5 |
| forget | Delete a specific memory by ID |
| list | List all memories, optionally filtered by tags |
| stats | Return count of memories |
| clear | Delete all memories |
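As an illustration of the tag filter on list, here is a minimal sketch. It assumes tags are stored as a JSON array in a TEXT column (as in the storage schema) and that a memory must carry every requested tag to match. The function name list_memories and the match-all rule are assumptions for illustration, not the package's confirmed behavior.

```python
import json
import sqlite3


def list_memories(conn, tags=None):
    """Return (id, content, tags) tuples, keeping rows that carry every requested tag.

    Hypothetical helper: the match-all rule is an assumption.
    """
    wanted = set(tags or [])
    matches = []
    for mem_id, content, raw_tags in conn.execute(
        "SELECT id, content, tags FROM memories ORDER BY created_at, id"
    ):
        # Tags are a JSON array in a TEXT column; NULL means no tags.
        mem_tags = json.loads(raw_tags) if raw_tags else []
        if wanted.issubset(mem_tags):
            matches.append((mem_id, content, mem_tags))
    return matches
```

Filtering in Python keeps the sketch independent of SQLite's JSON functions. With no tags requested, every memory is returned.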
Storage Design
The storage.py module uses a clever SQLite schema:
CREATE TABLE memories (
    id TEXT PRIMARY KEY,
    content TEXT NOT NULL,
    embedding BLOB,   -- pickle of numpy array
    tags TEXT,        -- JSON array
    source TEXT,
    created_at TEXT,
    updated_at TEXT
);
CREATE VIRTUAL TABLE memories_fts USING fts5(
    content, tags,
    content=memories, content_rowid=rowid,
    tokenize='porter unicode61'
);
-- Triggers auto-sync the FTS index on INSERT/UPDATE/DELETE
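The triggers themselves are not shown above. The sketch below follows the standard pattern for external-content FTS5 tables from the SQLite documentation, on a trimmed-down version of the schema. The trigger names, the open_db and keyword_search helpers, and the in-memory database are assumptions for illustration. The project's actual storage.py may differ.

```python
import sqlite3

SCHEMA = """
CREATE TABLE memories (
    id TEXT PRIMARY KEY,
    content TEXT NOT NULL,
    tags TEXT
);
CREATE VIRTUAL TABLE memories_fts USING fts5(
    content, tags,
    content=memories, content_rowid=rowid,
    tokenize='porter unicode61'
);
-- Mirror every change on memories into the external-content FTS index.
CREATE TRIGGER memories_ai AFTER INSERT ON memories BEGIN
    INSERT INTO memories_fts(rowid, content, tags)
    VALUES (new.rowid, new.content, new.tags);
END;
CREATE TRIGGER memories_ad AFTER DELETE ON memories BEGIN
    INSERT INTO memories_fts(memories_fts, rowid, content, tags)
    VALUES ('delete', old.rowid, old.content, old.tags);
END;
CREATE TRIGGER memories_au AFTER UPDATE ON memories BEGIN
    INSERT INTO memories_fts(memories_fts, rowid, content, tags)
    VALUES ('delete', old.rowid, old.content, old.tags);
    INSERT INTO memories_fts(rowid, content, tags)
    VALUES (new.rowid, new.content, new.tags);
END;
"""


def open_db(path=":memory:"):
    """Open a database and create the tables and triggers (hypothetical helper)."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def keyword_search(conn, query, limit=5):
    """Return (id, content) rows matching the FTS5 query, best match first."""
    return conn.execute(
        """
        SELECT m.id, m.content
        FROM memories_fts
        JOIN memories AS m ON m.rowid = memories_fts.rowid
        WHERE memories_fts MATCH ?
        ORDER BY memories_fts.rank  -- rank defaults to the BM25 score
        LIMIT ?
        """,
        (query, limit),
    ).fetchall()
```

Because of the porter tokenizer, a query for "response" also matches a stored "responses". This requires an SQLite build with FTS5 enabled, which most Python distributions include.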
Key design choices:
- Embeddings stored as pickled numpy arrays in a BLOB column (no vector extension needed)
- FTS5 with the porter unicode61 tokenizer for English stemming + Unicode support
- Triggers keep the full-text index in sync automatically
- WAL mode for concurrent reads
- Cosine similarity computed in Python: loads all vectors, compares, returns top-K
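The BLOB storage and the brute-force scan can be sketched as follows. To stay dependency-free, this sketch pickles plain Python lists and computes cosine similarity with the math module in place of NumPy arrays. Toy 3-dimensional vectors stand in for the 384-dimensional model output. The helper names store_vector and recall_top_k are hypothetical.

```python
import math
import pickle
import sqlite3


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def store_vector(conn, mem_id, content, vector):
    """Insert a memory with its embedding pickled into the BLOB column."""
    conn.execute(
        "INSERT INTO memories (id, content, embedding) VALUES (?, ?, ?)",
        (mem_id, content, pickle.dumps(vector)),
    )


def recall_top_k(conn, query_vector, top_k=3):
    """O(n) scan: score every stored vector, return the top_k as (score, id, content)."""
    scored = []
    rows = conn.execute(
        "SELECT id, content, embedding FROM memories WHERE embedding IS NOT NULL"
    )
    for mem_id, content, blob in rows:
        vector = pickle.loads(blob)  # only safe because this process wrote the blob
        scored.append((cosine_similarity(query_vector, vector), mem_id, content))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]
```

Note that pickle.loads can execute arbitrary code, so this storage format is only appropriate when the database file is trusted.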
Semantic vs Keyword Search
| Feature | Semantic (recall) | Keyword (search) |
|---|---|---|
| How it works | Embed query → cosine similarity with all stored embeddings | BM25 via SQLite FTS5 |
| Good for | Finding by meaning/context | Finding by exact terms |
| Trade-off | O(n) scan over all embeddings | Inverted-index lookup via FTS5; cost scales with matching rows, not total rows |
| Model | all-MiniLM-L6-v2 (384-dim) | Porter stemmer (English) |
Both search methods return results ranked by relevance, with the top-K results returned to the agent.
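To get a feel for the O(n) scan in recall, the raw size of the vectors it must load can be estimated with simple arithmetic. This sketch assumes each of the 384 values is a 4-byte float32 and ignores pickle and SQLite overhead. The function name embedding_bytes is hypothetical.

```python
DIM = 384               # all-MiniLM-L6-v2 embedding size
BYTES_PER_FLOAT32 = 4   # assumption: vectors are stored as float32


def embedding_bytes(n_memories, dim=DIM, bytes_per_value=BYTES_PER_FLOAT32):
    """Raw size in bytes of n_memories vectors, excluding serialization overhead."""
    return n_memories * dim * bytes_per_value


for n in (1_000, 100_000):
    print(f"{n:>7} memories -> {embedding_bytes(n) / 1e6:.1f} MB of raw vectors")
```

Under these assumptions each memory adds 1,536 bytes of vector data, so a full scan stays small for thousands of memories and grows linearly from there.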
Integration with Any MCP Client
Any MCP-compatible client can connect:
- Claude Desktop: Add to mcpServers in config
- Cursor: Register as an MCP server in settings
- Any MCP host: Connect via stdio transport
Quick Start
pip install persistent-memory
# Start the MCP server (stdio transport)
python -m persistent_memory server
# Or configure in your MCP client:
{
  "mcpServers": {
    "persistent-memory": {
      "command": "python",
      "args": ["-m", "persistent_memory", "server"]
    }
  }
}
Resource Usage
| Resource | Estimate |
|---|---|
| RAM | ~500MB (sentence-transformers model loaded once) |
| Disk | ~80MB (model download on first use) |
| CPU | Minimal (inference is ~10ms on modern CPUs) |
| Model | all-MiniLM-L6-v2 (80MB, 384-dim embeddings) |
Why It's Interesting
The biggest limitation of current AI agents is statelessness: every conversation starts from scratch. Persistent Memory addresses this with a zero-infrastructure design. It uses SQLite FTS5 for keyword search (not a separate search engine), in-process sentence embeddings for semantic search (no external API), and MCP for compatibility with any MCP client. The entire system fits in a single Python package with no external services, no GPU, and no API keys. Drop it into any MCP client and your agent suddenly has a persistent brain.