GitHub - KikeVen/zerikai_memory: A standalone local-only Python MCP server that gives any IDE persistent, workspace-isolated memory. Works with any IDE supporting MCP servers

zerikai_memory 🧠

⭐ Bookmark the project: If you use this tool, drop a star to save it to your GitHub profile and track new performance updates.

Never lose your AI context again.
zerikai_memory provides persistent, workspace-isolated memory for any IDE. It is local-first, cost-aware, and instant. It uses deterministic tree-sitter parsing to index code entities such as functions, classes, and docstrings into a local ChromaDB vector store. Served through a local MCP interface, it retrieves relevant context at query time using L2 vector distance and lexical re-ranking, with strict source verification (entity, file, line number, and L2 distance). Designed to pair with low-cost DeepSeek APIs, it injects structured, precise local context instead of dumping large raw files, which maximizes KV cache hits and reduces your active token costs.

💡Status: Active & Self-Contained.
This project is used daily and actively maintained by the author. Pull Requests and Issues are closed to keep maintenance overhead low. It is provided fully functional and ready for production use.

The Problem

Every new chat session starts completely cold. When you switch contexts or open a new window:

Your AI Agent forgets every architectural decision, convention, and stack choice made over hours
You waste critical tokens and 10–15 minutes re-explaining the codebase setup in every single chat
Large raw file dumps inflate your token costs and shrink your available context window instantly
Switching IDEs (e.g., VS Code to Cursor) forces you to restart your conversation history from scratch

How Zerikai Memory Solves It

Zerikai Memory runs as a local STDIO MCP server between your IDE and your LLM. It parses your codebase using tree-sitter, indexes code entities into a local ChromaDB vector store, and injects highly relevant context snippets dynamically through natural language.

Your Codebase  →  tree-sitter (local parse)  →  ChromaDB (.brain/)
                                                      │
Your IDE       →  MCP Server (:stdio)        →        ▼
                                             Ollama / DeepSeek
                    │                      (auto-routed synthesis)
          ┌─────────┴──────────────┐
          │   4-Stage Pipeline     │
          │   L1  Vector Search    │  ChromaDB L2 distance matching
          │   L2  Lexical Re-rank  │  Keyword overlap boost on names
          │   L3  Auto-Routing     │  Ollama (free) vs. DeepSeek Cloud
          │   L4  LLM Synthesis    │  Answer + inline #file:line citations
          └────────────────────────┘
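
The first pipeline stage can be illustrated with a small, dependency-free sketch. It is not the project's implementation: the toy index, the file locations, and the three-dimensional "embeddings" are all hypothetical, and plain Euclidean distance is used for illustration (the metric reported by the real vector store may differ).

```python
import math

def l2_distance(a, b):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical toy index: entity name -> (embedding, source location).
INDEX = {
    "parse_file": ([0.9, 0.1, 0.0], "src/parser.py:12"),
    "load_config": ([0.1, 0.9, 0.0], "src/settings.py:5"),
    "drop_workspace": ([0.0, 0.2, 0.9], "src/cleanup.py:30"),
}

def vector_search(query_vec, threshold=1.0):
    """Return (distance, name, location) hits under the threshold, closest first."""
    hits = []
    for name, (vec, loc) in INDEX.items():
        dist = l2_distance(query_vec, vec)
        if dist <= threshold:
            hits.append((round(dist, 3), name, loc))
    return sorted(hits)

results = vector_search([1.0, 0.0, 0.0])
for dist, name, loc in results:
    # Mirrors the "#file:line (distance)" citation style described above.
    print(f"{name}  #{loc} ({dist})")
```

Only the entity whose vector lies within the threshold survives; the other two fall outside the cutoff and never reach the later stages.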

Architecture

Cost Model — The Context Tax Mitigation

What gets taxed	Without Zerikai	With Zerikai
🔴 Monthly quota	Re-explaining stack, decisions, and conventions every session	Indexed once. Retrieved as compact snippets per query.
🟡 Context window	Raw file dumps shrink the window available for code generation	1,000–1,200 token brief prefix*. Window stays wide open.
⚪ IDE switching	Full re-explanation required in every new tool	Shared zerikai_memory workspace `.brain/` directory.

Tip: The project brief acts as a stable prefix. After your first query, *DeepSeek caches it, making subsequent repeated queries up to 50× cheaper.
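
To make the tip concrete, here is a rough back-of-the-envelope calculation. It assumes the $0.14 per million input tokens figure quoted in the sample .env and treats the "up to 50×" figure as a flat discount on cached prefix tokens; both are assumptions for illustration, and real billing may differ.

```python
# Assumed figures, for illustration only.
PRICE_PER_M_INPUT = 0.14   # dollars per million input tokens (from the sample .env)
CACHE_DISCOUNT = 50        # "up to 50x cheaper" treated as a flat ratio
BRIEF_TOKENS = 1_200       # upper end of the brief prefix size

def prefix_cost(queries, cached):
    """Dollar cost of sending the brief prefix with every query."""
    if queries <= 0:
        return 0.0
    full = BRIEF_TOKENS * PRICE_PER_M_INPUT / 1_000_000
    if not cached:
        return queries * full
    # The first query pays full price; later ones are assumed to hit the cache.
    return full + (queries - 1) * full / CACHE_DISCOUNT

uncached = prefix_cost(100, cached=False)
cached = prefix_cost(100, cached=True)
print(f"uncached: ${uncached:.6f}  cached: ${cached:.6f}")
```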

Quick Start

I have added a YouTube video walkthrough of the installation and setup process in the first step below. If you prefer text instructions, just follow along with the code snippets.

1. Install

Watch the installation video


git clone https://github.com/your-username/zerikai_memory.git
cd zerikai_memory

# Create and activate a virtual environment (Python 3.11+)
python -m venv venv
source venv/bin/activate  # Windows: .\venv\Scripts\activate

pip install -r requirements.txt

2. Configure Environment

Rename the .env.example file in the root directory to .env:

Sample .env

DEEPSEEK_API_KEY=your_deepseek_key_here

# Memory Mode controls which LLM is used for operations:
# - "cloud": Use DeepSeek for all operations (scan, brief, queries) - highest quality, tracked usage
# - "hybrid": Use Ollama for file scanning, DeepSeek for briefs and escalated queries
# - "local": Use Ollama for everything (free, but lower quality briefs)
MEMORY_MODE=cloud

# Enable token tracking and cost reporting (SQLite database at .brain/token_usage.db)
# Set to "false" to disable tracking
ENABLE_TOKEN_TRACKING=true

# Enable deepseek-v4-pro for complex architectural queries (design, architecture, tradeoffs)
# v4-pro is 3x more expensive than v4-flash (currently $0.435/M vs $0.14/M input)
# After May 31 2026, v4-pro will be about 12x more expensive ($1.74/M vs $0.14/M)
# Recommended: keep this "false" unless you need maximum reasoning capability
ENABLE_DEEPSEEK_PRO=false

# Semantic search relevance cutoff for query_memory (L2 distance).
# Lower = stricter. Watch "best dist=X.XX" in server.log to calibrate.
# Typical: <0.8 strong match, 0.8-1.5 related, >1.5 noise.
QUERY_DISTANCE_THRESHOLD=1.0

# File extensions to skip during scanning when tree-sitter produces zero
# entities (no functions, classes, headings, semantic HTML elements, etc.).
# Saves API calls on bare config files, trivial templates, empty CSS, etc.
# Format: ['.py', '.html', '.md', '.css']
# Default: [] (empty — no extensions skipped, all fall through to LLM).
SKIP_BARE_FILES=['.py', '.html', '.md', '.css']

# Enable lexical re-ranking in query_memory.
# When true, results passing the distance threshold are reordered by a
# weighted combination of semantic distance and keyword overlap in entity
# name and docstring text. Nothing is dropped — pure reorder.
# Default: false (existing pure-semantic behaviour preserved).
ENABLE_LEXICAL_RERANK=true

# Weight applied per keyword hit during lexical re-ranking.
# The 1/dist spread across the valid-hit band (0.85–0.98) is ~0.156.
# Keep this value below that spread to avoid keyword hits overriding
# a genuinely closer semantic result.
# Recommended starting point: 0.05 (one hit = +0.05, two hits = +0.10).
LEXICAL_RERANK_WEIGHT=0.05
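
The re-ranking comments above can be illustrated with a short sketch. The scoring formula here, 1/dist plus a fixed weight per keyword hit, is an assumption inferred from those comments rather than the project's actual code, and the hits themselves are hypothetical.

```python
import re

def keyword_hits(query, text):
    """Count distinct query words that appear as whole tokens in text."""
    words = set(re.findall(r"[a-z0-9_]+", query.lower()))
    tokens = set(re.findall(r"[a-z0-9_]+", text.lower()))
    return len(words & tokens)

def rerank(query, hits, weight=0.05):
    """Reorder hits by 1/dist + weight * keyword hits; nothing is dropped."""
    def score(hit):
        dist, name, doc = hit
        return 1.0 / max(dist, 1e-9) + weight * keyword_hits(query, f"{name} {doc}")
    return sorted(hits, key=score, reverse=True)

# Hypothetical hits: (L2 distance, entity name, docstring).
hits = [
    (0.90, "load_settings", "Read values from disk."),
    (0.92, "parse_token", "Parse a bearer token from a header."),
]
ranked = rerank("parse token header", hits)
print([name for _, name, _ in ranked])
```

In this example the semantically closer hit has no keyword overlap, so three keyword hits (+0.15) are enough to lift the second entity above it. With fewer hits the semantic order would be preserved, which is the point of keeping the weight small.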

2a. Verify

python -c "from main import scan_workspace, query_memory; print('OK')"

You should see the startup banner followed by OK.

3. IDE Rule Enforcement

To stop your AI agent from ignoring the memory protocol, copy these directives into your IDE's agent rules profile (e.g., .cursorrules or system prompt guidelines):

IDE Rules in: agent_rules/ide_agent_rules.md

Universal-Brain First: The agent must query universal-brain before attempting raw file searches.
Source Discipline: Every answer must surface actual file.py:line citations with zero fabrication.

4. IDE Registration

Press Ctrl+Shift+P → MCP: Add Local Server
Choose STDIO
Set command: C:\path\to\zerikai_memory\venv\Scripts\python.exe C:\path\to\zerikai_memory\main.py

Add to your claude_desktop_config.json profile:

{
  "mcpServers": {
    "universal-brain": {
      "command": "C:\\path\\to\\zerikai_memory\\venv\\Scripts\\python.exe",
      "args": ["C:\\path\\to\\zerikai_memory\\main.py"]
    }
  }
}
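
If you prefer to generate this entry rather than hand-edit escaped paths, a small helper along these lines could work. It is a sketch only: the function name `mcp_entry` is hypothetical, and it assumes the standard venv layout shown in the install step.

```python
import json
import sys
from pathlib import Path

def mcp_entry(repo_dir, platform=sys.platform):
    """Build the "universal-brain" server entry for a given checkout path."""
    repo = Path(repo_dir)
    # The venv interpreter lives in a different folder on Windows and POSIX.
    if platform.startswith("win"):
        python = repo / "venv" / "Scripts" / "python.exe"
    else:
        python = repo / "venv" / "bin" / "python"
    return {
        "mcpServers": {
            "universal-brain": {
                "command": str(python),
                "args": [str(repo / "main.py")],
            }
        }
    }

config = mcp_entry("/path/to/zerikai_memory", platform="linux")
print(json.dumps(config, indent=2))
```

`json.dumps` takes care of escaping backslashes, which is the usual source of mistakes when writing Windows paths into JSON by hand.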

5. Setup the `.memignore` file

Works like .gitignore: one pattern per line. scan_workspace reads this file and skips matching paths.

Each project should have its own .memignore in its root directory. Forgetting to configure it before the first scan is the most common reason to use drop_memory.py and start fresh:

Examples of what to ignore:

Sample .memignore

# Directories (trailing slash required)
.git/
node_modules/
venv/
__pycache__/
.brain/
dist/
build/

# File/Folder patterns
**/test/
**/tests/
.env
*.log
*.lock
*.pyc
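
To preview which paths a pattern list would skip, a simplified matcher can help. This sketch is not how scan_workspace is implemented; it handles only directory patterns (trailing slash) and file-name globs, and ignores gitignore features such as negation and anchoring.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

def load_patterns(text):
    """Keep non-empty, non-comment lines from a .memignore-style file."""
    lines = (line.strip() for line in text.splitlines())
    return [line for line in lines if line and not line.startswith("#")]

def is_ignored(rel_path, patterns):
    """Return True if a POSIX-style relative file path matches any pattern."""
    parts = PurePosixPath(rel_path).parts
    for pattern in patterns:
        if pattern.endswith("/"):
            # Directory pattern: match any parent directory by name.
            name = pattern.rstrip("/").split("/")[-1]
            if any(fnmatch(part, name) for part in parts[:-1]):
                return True
        elif fnmatch(parts[-1], pattern):
            # File pattern: match against the file name only.
            return True
    return False

SAMPLE = """
# Directories
venv/
**/tests/
# Files
.env
*.log
"""
patterns = load_patterns(SAMPLE)
for path in ["src/app.py", "venv/lib/site.py", "pkg/tests/test_app.py", "server.log"]:
    print(path, is_ignored(path, patterns))
```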

6. Embedding-Docstring Skill

Before running your first index scan, optimize your codebase's docstrings for vector search. Install the embedding-docstring skill globally in your IDE (you can find it in the embedding-docstring skill guide), then ask your AI agent to run it against your codebase so that docstrings are rewritten into a more embedding-friendly format:

"Audit and optimize docstrings across this project using the embedding-docstring skill, respecting .memignore."

Requirement	Why It Matters	Target Impact
Explicit Tech Names	Use `"Uses Redis"` instead of `"key-value store"`	Embeddings match precise tokens, not abstract concepts.
Routing / Branches	Document specific route paths and logical pivot options	Ensures structural code matches are surfaceable.
Guarantees & Effects	Explicitly state code idempotency, atomicity, or mutation side-effects	Prevents agent generation from breaking runtime boundaries.
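
As an illustration of the three requirements in the table, here is what such a docstring might look like. The function, the route name, and the key format are all hypothetical, and a plain dict stands in for Redis so the example runs without a server.

```python
def cache_user_session(session_id: str, payload: dict, store: dict) -> bool:
    """Cache a user session in Redis under the key ``session:<session_id>``.

    Uses Redis SET semantics (modelled here with a plain dict so the example
    runs without a server). Called from the hypothetical POST /login route.

    Guarantees: idempotent, since repeating a call with the same arguments
    leaves the store unchanged. Side effect: mutates ``store`` in place.
    Returns True if the stored value changed, False otherwise.
    """
    key = f"session:{session_id}"
    changed = store.get(key) != payload
    store[key] = dict(payload)
    return changed

store = {}
first = cache_user_session("abc", {"user": "kim"}, store)
second = cache_user_session("abc", {"user": "kim"}, store)
print(first, second, store)
```

The docstring names the technology explicitly, states the route that reaches the function, and spells out idempotency and the mutation side effect, which are the kinds of tokens a vector search can latch onto.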

7. Chat with Memory

Simply instruct your IDE's active AI agent using natural language commands:

Prefix queries with "universal-brain: <command>" so that they route through the MCP server and use your indexed memory:

Scan the workspace for the first time: "Set up memory for this project"
Ask a question: "What are the main architectural components of this project?"

Frequently used follow-ups:

After a code change: "Rescan the workspace and force a refresh of the project brief."
Save part of a chat: "Save the following context to memory: [your custom notes or constraints here]"
Ask how much you have used: "Get me a cost report for my memory usage so far."

See below for a full reference of available commands and their descriptions.

MCP Tools Reference

You never run these commands directly; your active AI agent executes them on your behalf.

Workspace Management

Tool	Description
`init_workspace`	Registers a project folder, assigns a UUID, and creates a pending brief file. Idempotent; safe to run multiple times.
`list_workspaces`	Lists all known workspaces that have a brief or stored memories.
`resolve_workspace`	Resolves a workspace identifier (UUID, short-UUID, or display name) to its filesystem path.
`merge_workspaces`	Consolidates duplicate workspace IDs into one. Irreversible.
`debug_workspace_id`	Diagnostic tool; shows what workspace ID would be generated from a given path.

Memory & Briefs

Tool	Description
`scan_workspace`	Starts a background scan. Returns immediately; use `scan_status` to track progress. Walks the directory, respects `.memignore`, saves all readable text files to persistent memory. Idempotent and self-cleaning. Concurrent (4 workers, batch writes).
`scan_status`	Returns progress of a running or recently completed background scan: files scanned, entities indexed, errors, elapsed time, brief status.
`save_to_memory`	Manually saves an architectural decision, fact, or technical note with an optional category tag.
`list_memory`	Lists stored memories for a workspace, optionally filtered by category.
`query_memory`	Retrieves relevant context via vector search and synthesises an answer via Ollama or DeepSeek (auto-routed). Returns inline `#file:line (distance)` citations as plain text that renders in every IDE and is clickable in VS Code Copilot. Citations are on by default; set `show_sources=False` for clean output.
`get_brief`	Retrieves the current project brief from `.brain/contexts/`.
`update_brief`	Manually updates the markdown content of a project brief.
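
The "start a background scan, then poll for status" pattern described for `scan_workspace` and `scan_status` can be sketched as follows. This is a generic illustration, not the server's code: the in-memory files, the status dictionary, and the def/class counter standing in for tree-sitter are all hypothetical.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "files" standing in for a real directory walk.
FILES = {
    "a.py": "def one():\n    pass\n\ndef two():\n    pass\n",
    "b.py": "class Thing:\n    pass\n",
    "c.md": "# Notes\n",
}

status = {"state": "idle", "files_scanned": 0, "entities": 0}
lock = threading.Lock()

def index_file(item):
    """Count def/class lines as a stand-in for real entity extraction."""
    _, text = item
    found = sum(
        line.lstrip().startswith(("def ", "class ")) for line in text.splitlines()
    )
    with lock:
        status["files_scanned"] += 1
        status["entities"] += found

def run_scan():
    # Four workers, matching the concurrency mentioned in the table above.
    with ThreadPoolExecutor(max_workers=4) as pool:
        list(pool.map(index_file, FILES.items()))
    with lock:
        status["state"] = "done"

def start_scan():
    """Kick off the scan in a background thread and return immediately."""
    with lock:
        status.update(state="running", files_scanned=0, entities=0)
    worker = threading.Thread(target=run_scan, daemon=True)
    worker.start()
    return worker

worker = start_scan()
while True:  # Poll, as an agent would with a status tool.
    with lock:
        snapshot = dict(status)
    if snapshot["state"] == "done":
        break
    time.sleep(0.01)
worker.join()
print(snapshot)
```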

Usage & Diagnostics

Tool	Description
`get_token_usage`	Returns DeepSeek API token usage and cost statistics.
`get_cost_report`	Generates a cost breakdown by operation type.
`get_cache_stats`	Shows cache hit/miss rates by operation type.
`purge_usage_data`	Deletes historical token tracking records.

Project Brief Matrix

When a workspace is scanned, Zerikai compiles a dense 1,000–1,200 token project brief across 9 locked components:

Section	What It Captures
1. Overview	Project domain, primary type, and functional scope.
2. Technical Stack	Backend engines, databases, integrations, and core libraries.
3. Core Architecture	Interactivity between frontend, backend, and processing layers.
4. Primary Conventions	Local code styling, custom error handling, and validation schema rules.
5. Purpose	Business logic problems solved and key underlying objectives.
6. Key Files	Definitive app entry points, central routers, and specific domain tasks.
7. Dev & Testing	Environment installation setups, execution triggers, and testing runs.
8. Data Flow	Complete systemic request lifecycle tracing from gateway to database layer.
9. Future Roadmap	Planned engineering steps and dangling `TODO` items parsed directly from code.

Memory Modes

Adjust your operation profile via the MEMORY_MODE environment toggle to balance privacy, speed, and API costs:

Mode	Scan Engine	Query Engine	Total Cost	Ideal Use Case
🟢 `cloud`	DeepSeek	DeepSeek	Low	Recommended. Maximum context nuance, no local setup.
🟡 `hybrid`	Ollama	Ollama + DeepSeek	Lower	Tight data governance. Free local lookups with cloud escalation.
🔴 `local`	Ollama	Ollama	$0.00	100% air-gapped offline environment tracking.
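
The routing implied by the table and by the MEMORY_MODE comments in the sample .env can be expressed as a small decision function. The function name and the `escalated` flag are hypothetical, and the real auto-router may use additional signals.

```python
def pick_engine(mode, operation, escalated=False):
    """Return "ollama" or "deepseek" for an operation under a memory mode.

    operation is one of "scan", "brief", or "query"; escalated marks a query
    that the router decided needs the cloud model.
    """
    if mode not in {"cloud", "hybrid", "local"}:
        raise ValueError(f"unknown MEMORY_MODE: {mode!r}")
    if operation not in {"scan", "brief", "query"}:
        raise ValueError(f"unknown operation: {operation!r}")
    if mode == "cloud":
        return "deepseek"
    if mode == "local":
        return "ollama"
    # hybrid: local scanning, cloud for briefs and escalated queries.
    if operation == "brief" or (operation == "query" and escalated):
        return "deepseek"
    return "ollama"

for mode in ("cloud", "hybrid", "local"):
    row = {op: pick_engine(mode, op) for op in ("scan", "brief", "query")}
    print(mode, row)
```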

Configuration Reference

Key	Default	Description
`DEEPSEEK_API_KEY`	Required	Active API authorization key from platform.deepseek.com.
`MEMORY_MODE`	`cloud`	Sets target engines: choices include `cloud`, `hybrid`, or `local`.
`ENABLE_TOKEN_TRACKING`	`true`	Calculates continuous usage and outputs summaries to SQLite.
`QUERY_DISTANCE_THRESHOLD`	`1.0`	L2 vector distance cutoff for query results. Lower values are stricter.
`ENABLE_LEXICAL_RERANK`	`false`	Activates secondary hybrid reordering layer via keyword matching.
`SKIP_BARE_FILES`	`[]`	Extension list to bypass when tree-sitter finds zero valid code entities.
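
For readers wiring these keys into their own tooling, the following sketch shows one way to read them with the defaults listed above. It is not the contents of config.py; the helper names are hypothetical, and it assumes SKIP_BARE_FILES is written as a Python-style list literal, as in the sample .env.

```python
import ast
import os

def env_bool(name, default, env=os.environ):
    """Parse "true"/"false" style flags, case-insensitively."""
    return env.get(name, str(default)).strip().lower() == "true"

def load_settings(env=os.environ):
    """Read the documented keys, falling back to the documented defaults."""
    mode = env.get("MEMORY_MODE", "cloud").strip().lower()
    if mode not in {"cloud", "hybrid", "local"}:
        raise ValueError(f"MEMORY_MODE must be cloud, hybrid, or local, got {mode!r}")
    # literal_eval parses the list safely without executing arbitrary code.
    skip = ast.literal_eval(env.get("SKIP_BARE_FILES", "[]"))
    return {
        "mode": mode,
        "token_tracking": env_bool("ENABLE_TOKEN_TRACKING", True, env),
        "distance_threshold": float(env.get("QUERY_DISTANCE_THRESHOLD", "1.0")),
        "lexical_rerank": env_bool("ENABLE_LEXICAL_RERANK", False, env),
        "skip_bare_files": list(skip),
    }

settings = load_settings({"MEMORY_MODE": "hybrid", "SKIP_BARE_FILES": "['.md', '.css']"})
print(settings)
```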

Auxiliary Scripts & Troubleshooting

Workspace Reset

If you run a workspace scan before configuring your .memignore file, run the auxiliary wipe script to delete the stale workspace data:

# Windows
.\venv\Scripts\python.exe drop_memory.py "Workspace Name"

# macOS / Linux
venv/bin/python drop_memory.py "Workspace Name"

Live Log Diagnostics

Monitor server activity, runtime operations, and auto-routing logs inside .brain/server.log:

# Live stream logs (macOS/Linux)
tail -f .brain/server.log

# Live stream logs (Windows PowerShell)
Get-Content .brain\server.log -Wait -Tail 30
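
The sample .env suggests watching "best dist=X.XX" in server.log to calibrate QUERY_DISTANCE_THRESHOLD. A short script like the one below could summarise those values. Only the "best dist=" fragment comes from the documentation; the surrounding log line layout in the sample is invented for illustration, and the bands follow the "typical" ranges quoted in the .env comments.

```python
import re
from statistics import median

BEST_DIST = re.compile(r"best dist=(\d+(?:\.\d+)?)")

def best_distances(lines):
    """Extract every "best dist=X.XX" value from an iterable of log lines."""
    return [float(m.group(1)) for line in lines if (m := BEST_DIST.search(line))]

def band(dist):
    """Classify a distance using the typical ranges from the .env comments."""
    if dist < 0.8:
        return "strong"
    if dist <= 1.5:
        return "related"
    return "noise"

# Hypothetical log lines; in practice, read them from .brain/server.log.
SAMPLE_LOG = [
    "query_memory: 5 candidates, best dist=0.72",
    "query_memory: 3 candidates, best dist=1.61",
    "scan_workspace: 12 files queued",
    "query_memory: 4 candidates, best dist=0.94",
]
dists = best_distances(SAMPLE_LOG)
counts = {}
for d in dists:
    counts[band(d)] = counts.get(band(d), 0) + 1
print(f"n={len(dists)} median={median(dists)} bands={counts}")
```

If most of your useful queries land in the "related" band, a threshold near the top of that band keeps them; if noise dominates, lowering the threshold is the more likely fix.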

Security & Data Privacy

All active vector spaces, tracking registries, and context details reside directly on your local machine.
Add .env and .brain/ explicitly to your global or project .gitignore patterns to prevent API keys and secure indexes from leaking to version control platforms.

To read more about the underlying design principles, architecture decisions, and future roadmap for Zerikai Memory, check out the insight article.

License

🛠️ Support: This project is provided as-is for personal use. To prevent automated spam and AI-generated noise, direct email and issue tracking are disabled. For questions, please use the AI assistant at zerikai.com. I will get the message and get back to you; don't forget to leave a way to contact you.

