MemBlocks

A Python library for modular, section-aware memory management in LLM applications

Overview

MemBlocks is a Python library that solves the context problem in LLM applications. Instead of dumping entire conversation histories or using flat RAG, MemBlocks organises memory into independent, attachable blocks — each scoped to a distinct knowledge domain (e.g., work, personal, academic) — with section-specific storage, retrieval, and update strategies.

This repository contains:

memblocks_lib/ — The core Python library
backend/ — FastAPI REST API demo wrapping the library
frontend/ — React web UI demo consuming the backend
mcp_server/ — MCP server integration for AI assistants (Claude Desktop, etc.)
evaluation/ — LoCoMo benchmark evaluation harness

The Problem

LLMs lose context over time. Current solutions either:

Send full chat histories — expensive, hits token limits fast
Use flat RAG — treats all memory the same, creates retrieval noise, no domain separation

The Solution

MemBlocks introduces memory blocks — independently attachable memory spaces, each partitioned into three structured sections:

Section	Purpose	Retrieval
Core Memory	Persistent high-priority facts (persona, user attributes)	Always injected in full — no search needed
Semantic Memory	Extracted facts and events with conflict-aware deduplication	Hybrid dense + sparse (SPLADE) search with RRF reranking
Recursive Summary	Progressively compressed recent conversational state	Always injected in full — no search needed

Memory updates are driven by background agentic workflows triggered when the sliding message window reaches its configured limit (default: 10 messages). Three workflows run in parallel: Core Memory extraction, Semantic Memory extraction with conflict resolution (ADD / UPDATE / DELETE / NOOP), and Recursive Summary regeneration.

Quick Start

Prerequisites

Python 3.11+
Docker & Docker Compose
UV package manager

Setup

git clone https://github.com/Ashish-Pandey62/MemBlocks.git
cd MemBlocks

# Copy and fill in your API keys
cp .env.example .env

# Install all packages
uv sync --all-packages

# Start infrastructure (Qdrant on :6333, Ollama on :11434)
docker-compose up -d

Basic Usage

import asyncio
from memblocks import MemBlocksClient, MemBlocksConfig

async def main():
    config = MemBlocksConfig()           # reads from .env
    client = MemBlocksClient(config)

    # --- One-time setup ---
    user    = await client.get_or_create_user("alice")
    block   = await client.create_block(user_id="alice", name="Work Memory")
    session = await client.create_session(user_id="alice", block_id=block.id)

    # --- Per-turn loop ---
    user_msg = "What did we decide about the API design?"

    context  = await block.retrieve(user_msg)          # hybrid retrieval across all sections
    messages = await session.get_memory_window()       # last N conversation turns
    summary  = await session.get_recursive_summary()   # compressed recent context

    # Assemble prompt and call your own LLM
    system   = "You are a helpful assistant.\n\n" + context.to_prompt_string()
    response = my_llm.chat(system, messages + [{"role": "user", "content": user_msg}])

    # Persist the turn — triggers background memory workflows automatically
    await session.add(user_msg=user_msg, ai_response=response)

    await client.close()

asyncio.run(main())

Repository Structure

memblocks_lib/               # Core Python library
├── src/memblocks/
│   ├── client.py            # MemBlocksClient — main API entry point
│   ├── config.py            # MemBlocksConfig (Pydantic settings)
│   ├── services/            # block, session, core_memory, semantic_memory, memory_pipeline
│   ├── storage/             # MongoDBAdapter, QdrantAdapter, EmbeddingProvider
│   ├── llm/                 # Provider integrations: Groq, OpenRouter, Gemini, Ollama
│   ├── models/              # Pydantic schemas (block, memory, retrieval, llm_outputs)
│   └── prompts/             # Agentic workflow prompts (PS1 extraction, PS2 resolution, summary)

backend/                     # FastAPI REST API demo
├── src/api/
│   ├── main.py              # FastAPI app entrypoint
│   ├── routers/             # blocks, chat, memory, users, auth, transparency
│   └── models/requests.py   # Request schemas

mcp_server/                  # MCP server for AI assistant integration
├── src/server.py
└── src/cli.py

frontend/                    # React web UI demo
├── src/components/          # ChatInterface, BlockManager
└── src/api/                 # API client

evaluation/                  # LoCoMo benchmark harness
├── eval.py / locomo_eval.py
├── runners/
├── metrics/
└── datasets/

tests/                       # Integration tests
docker-compose.yml           # Qdrant + Ollama services
pyproject.toml               # UV workspace config

API Reference

Initialization

from memblocks import MemBlocksClient, MemBlocksConfig

config = MemBlocksConfig()        # reads .env; all fields have sensible defaults
client = MemBlocksClient(config)

User & Block Management

user  = await client.get_or_create_user(user_id="alice")
block = await client.create_block(user_id="alice", name="Work", description="Work context")
blocks = await client.get_user_blocks(user_id="alice")
await client.delete_block(block_id=block.id, user_id="alice")

Session & Conversation

session = await client.create_session(user_id="alice", block_id=block.id)

context  = await block.retrieve(user_msg)         # full hybrid retrieval
messages = await session.get_memory_window()      # sliding window of recent turns
summary  = await session.get_recursive_summary()  # compressed context

await session.add(user_msg=user_msg, ai_response=response)  # persist + trigger pipeline
await session.flush()                              # force pipeline run immediately

Isolated Retrieval

core_ctx     = await block.core_retrieve()          # Core Memory only, no vector search
semantic_ctx = await block.semantic_retrieve(query) # Semantic Memory only

Transparency & Observability

client.subscribe("on_memory_extracted", callback)
client.subscribe("on_conflict_resolved", callback)
client.subscribe("on_core_memory_updated", callback)

client.get_retrieval_log()       # per-query retrieval details
client.get_processing_history()  # pipeline run history
client.get_llm_usage()           # token usage + latency per task type

Configuration

Key environment variables (see .env.example for the full list):

# LLM provider (groq | openrouter | gemini | ollama)
LLM_PROVIDER_NAME=groq
GROQ_API_KEY=your_groq_key
OPENROUTER_API_KEY=your_openrouter_key
GEMINI_API_KEY=your_gemini_key

# Vector database
QDRANT_HOST=localhost
QDRANT_PORT=6333

# Embeddings (local)
OLLAMA_BASE_URL=http://localhost:11434

# Reranking
COHERE_API_KEY=your_cohere_key

# Memory pipeline behaviour
MEMORY_WINDOW=10      # messages before pipeline triggers
KEEP_LAST_N=4         # messages to retain after pipeline flush

# MongoDB
MONGO_URI=mongodb://localhost:27017

Per-task LLM configuration (optional):

from memblocks.llm.task_settings import LLMSettings, LLMTaskSettings

config = MemBlocksConfig(
    llm_settings=LLMSettings(
        default=LLMTaskSettings(provider="openrouter", model="meta-llama/llama-4-maverick"),
        retrieval=LLMTaskSettings(provider="groq", model="llama-3.1-8b-instant"),
    )
)

Running the Demo Applications

Backend API:

uv run python -m uvicorn backend.src.api.main:app --reload

Frontend:

cd frontend && npm install && npm run dev

MCP Server (configure in your AI assistant):

{
  "mcp": {
    "memblocks": {
      "type": "local",
      "command": ["uv", "run", "python", "-m", "mcp_server.server"],
      "environment": {"MEMBLOCKS_USER_ID": "your_user_id"}
    }
  }
}

Evaluation:

uv run python evaluation/locomo_eval.py

Evaluation

Evaluated on the LoCoMo long-conversation benchmark:

Metric	MemBlocks	Mem0	Full-context
Memory tokens	1027	1764	26031
Search latency p50	1.503s	0.148s	—
Total latency p50	2.701s	0.708s	9.870s

MemBlocks results are from our own evaluation on a 10-conversation subset of LoCoMo using the free-tier gpt-oss-20b model. Mem0 and other baseline figures are taken from the Mem0 paper (evaluated on the full 50-conversation dataset using gpt-4o-mini). Latency figures are not directly comparable due to model differences. Token consumption figures are model-independent.

Documentation

Document	Description
Setup Guide	Installation, prerequisites, configuration, LLM providers, troubleshooting
API Reference	Complete methods and interfaces with usage examples
Technical Overview	Architecture, memory pipeline internals, data flow, DB schemas
Contributing	Dev environment setup, running tests, code style, PR process

Running Tests

# Unit tests (no infrastructure needed)
uv run --package memblocks pytest tests/ -v

# Integration tests (requires live MongoDB + Qdrant)
uv run --package memblocks pytest -m integration -v

Authors

Built by Ashish Pandey, Ankit Mehta, and Aayush Ojha.

Project Repository

https://github.com/Ashish-Pandey62/MemBlocks

Name		Name	Last commit message	Last commit date
Latest commit History 217 Commits
.github/workflows		.github/workflows
backend		backend
docs		docs
evaluation		evaluation
frontend		frontend
mcp_server		mcp_server
memblocks_lib		memblocks_lib
tests		tests
.dockerignore		.dockerignore
.env.example		.env.example
.gitignore		.gitignore
.python-version		.python-version
CONTRIBUTING.md		CONTRIBUTING.md
Dockerfile.ollama		Dockerfile.ollama
LICENSE		LICENSE
PYPI_READINESS.md		PYPI_READINESS.md
README.md		README.md
docker-compose.yml		docker-compose.yml
opencode.json		opencode.json
pyproject.toml		pyproject.toml
skills-lock.json		skills-lock.json
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

MemBlocks

Overview

The Problem

The Solution

Quick Start

Prerequisites

Setup

Basic Usage

Repository Structure

API Reference

Initialization

User & Block Management

Session & Conversation

Isolated Retrieval

Transparency & Observability

Configuration

Running the Demo Applications

Evaluation

Documentation

Running Tests

Authors

Project Repository

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

MemBlocks

Overview

The Problem

The Solution

Quick Start

Prerequisites

Setup

Basic Usage

Repository Structure

API Reference

Initialization

User & Block Management

Session & Conversation

Isolated Retrieval

Transparency & Observability

Configuration

Running the Demo Applications

Evaluation

Documentation

Running Tests

Authors

Project Repository

About

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages