Engram: Stateful, dual-engine memory for AI Agents.
Standard RAG is a crutch. Dumping tokens into a vector database leads to context-window waste and hallucinations. Engram combines dense vector search (pgvector) with deterministic relationship graphs (Neo4j) to slash LLM costs and give your agents a durable, evolving memory trace.
Engram is the classical term for a memory "trace": the enduring physical and/or chemical changes in neural circuitry produced by learning, which can later be reactivated to support recall. The concept originates with Richard Semon and is widely used in modern neuroscience to describe the substrate of stored experience. [3][4][5][6][7]
It reflects durable storage, precise retrieval, and structured consolidation across sessions—exactly what long‑term agent memory systems aim to provide. [4][5]
- Advanced Memory Management: Implements Engram architecture with ADD/UPDATE/DELETE/NOOP operations
- ACAN Retrieval System: Attention-based Context-Aware Network for intelligent memory retrieval
- Graph Memory (Engram Graph): Neo4j-based entity and relationship storage
- Production-Ready: FastAPI, Docker, monitoring, and comprehensive testing
- Async Processing: Celery-based background task processing
- Vector Search: PostgreSQL with pgvector for efficient similarity search
- Authentication: JWT-based user authentication and authorization
- Monitoring: Prometheus metrics and Grafana dashboards
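The ADD/UPDATE/DELETE/NOOP operations listed above can be pictured as events applied to a memory store. The following is a minimal sketch only: `MemoryOp`, `MemoryEvent`, and `apply_event` are hypothetical names, the store is a plain in-memory dict, and Engram's actual decision logic and storage layer are not shown in this README.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class MemoryOp(Enum):
    """The four operation names from the feature list (hypothetical enum)."""
    ADD = "ADD"
    UPDATE = "UPDATE"
    DELETE = "DELETE"
    NOOP = "NOOP"


@dataclass
class MemoryEvent:
    op: MemoryOp
    memory_id: str
    text: Optional[str] = None  # needed for ADD and UPDATE only


def apply_event(store: Dict[str, str], event: MemoryEvent) -> Dict[str, str]:
    """Return a copy of the store with the event applied (input is not mutated)."""
    updated = dict(store)
    if event.op in (MemoryOp.ADD, MemoryOp.UPDATE):
        if event.text is None:
            raise ValueError(f"{event.op.value} requires text")
        if event.op is MemoryOp.UPDATE and event.memory_id not in updated:
            raise KeyError(event.memory_id)
        updated[event.memory_id] = event.text
    elif event.op is MemoryOp.DELETE:
        updated.pop(event.memory_id, None)
    # NOOP: the new information adds nothing, so the store stays as it is.
    return updated


store: Dict[str, str] = {}
store = apply_event(store, MemoryEvent(MemoryOp.ADD, "m1", "User is vegetarian"))
store = apply_event(
    store, MemoryEvent(MemoryOp.UPDATE, "m1", "User is vegetarian and avoids dairy")
)
store = apply_event(store, MemoryEvent(MemoryOp.NOOP, "m1"))
```

Returning a copy rather than mutating in place keeps each step easy to test; a real implementation would presumably write to PostgreSQL and Neo4j instead.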
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ API Gateway │────│ FastAPI Core │────│ Memory Manager │
│ (Nginx) │ │ Service │ │ Service │
└─────────────────┘ └─────────────────┘ └─────────────────┘
│ │
│ │
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Auth Service │────│ Celery Workers │────│Vector Database │
│ (JWT + Redis) │ │ (Memory Tasks) │ │ (PostgreSQL) │
└─────────────────┘ └─────────────────┘ └─────────────────┘
│ │
│ │
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Graph Database │────│ Embedding API │────│ Monitoring │
│ (Neo4j) │ │ Service │ │(Prometheus/Graf)│
└─────────────────┘ └─────────────────┘ └─────────────────┘
- Python 3.11+ - Recent stable release with strong async support
- FastAPI - High-performance, auto-documented API framework
- Pydantic v2 - Data validation and serialization
- AsyncIO - For handling concurrent operations
- PostgreSQL 16+ with pgvector 0.7.0 - Vector storage and similarity search
- Neo4j 5.x - Graph database for Engram Graph relationships
- Redis 7.x - Caching, session storage, and Celery message broker
- Ollama (default) - Local LLM inference for privacy-first deployments
- OpenAI API - Cloud LLM option (GPT-4o-mini)
- sentence-transformers - Local embedding generation
- tiktoken - Token counting for cost management
- Docker & Docker Compose - Containerization
- Nginx - Reverse proxy and load balancing
- Celery - Distributed task queue
- Prometheus + Grafana - Metrics collection and visualization
- Docker and Docker Compose
- OpenAI API key
- At least 8GB RAM and 4 CPU cores
git clone <repository-url>
cd engram
cp env.example .env
Edit .env with your configuration:
# OpenAI Configuration
OPENAI_API_KEY=your_openai_api_key_here
# Security
SECRET_KEY=your_secret_key_here
# Database passwords (change in production!)
POSTGRES_PASSWORD=secure_password
NEO4J_PASSWORD=secure_password
REDIS_PASSWORD=secure_password
# Start all services
docker-compose -f infrastructure/docker/docker-compose.yml up -d
# Check service status
docker-compose -f infrastructure/docker/docker-compose.yml ps
# Check API health
curl http://localhost:8000/health/detailed
# Check Flower (Celery monitoring)
open http://localhost:5555
# Check Grafana (metrics)
open http://localhost:3000
# Login: admin/admin
Once running, visit:
- API Docs: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
# Register user
curl -X POST "http://localhost:8000/auth/register" \
-H "Content-Type: application/json" \
-d '{"username": "testuser", "email": "test@example.com", "password": "password123", "full_name": "Test User"}'
# Login
curl -X POST "http://localhost:8000/auth/login" \
-H "Content-Type: application/x-www-form-urlencoded" \
-d "username=testuser&password=password123"
# Process conversation turn
curl -X POST "http://localhost:8000/memory/process-turn" \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{"user_message": "I am vegetarian and avoid dairy", "user_id": "user_id", "conversation_id": "conv_id"}'
# Query memories
curl -X POST "http://localhost:8000/memory/query" \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{"query": "What are my dietary preferences?", "user_id": "user_id", "top_k": 5}'
# Install development dependencies
pip install -r requirements.txt
# Run tests
pytest tests/ -v
# Run with coverage
pytest tests/ --cov=. --cov-report=html
# Install locust
pip install locust
# Run load tests
locust -f tests/load/locustfile.py --host=http://localhost:8000
- Prometheus: http://localhost:9090
- Grafana: http://localhost:3000 (admin/admin)
- Request latency and throughput
- Memory operation success rates
- Database connection health
- Celery task queue status
- Vector search performance
# Basic health check
curl http://localhost:8000/health/
# Detailed health check
curl http://localhost:8000/health/detailed

| Variable | Description | Default |
|---|---|---|
| `OPENAI_API_KEY` | OpenAI API key | Required |
| `SECRET_KEY` | JWT secret key | Required |
| `DATABASE_URL` | PostgreSQL connection string | Auto-generated |
| `REDIS_URL` | Redis connection string | Auto-generated |
| `NEO4J_URI` | Neo4j connection string | Auto-generated |
| `SIMILARITY_THRESHOLD` | Memory similarity threshold | 0.75 |
| `MAX_MEMORIES_PER_USER` | Max memories per user | 10000 |
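As an illustration of how the variables above might be read at startup, here is a stdlib-only sketch. The `Settings` class and `load_settings` function are hypothetical names and not Engram's own configuration module (which is not shown here); only the variable names and the defaults `0.75` and `10000` come from the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for a subset of the documented variables."""
    openai_api_key: str
    secret_key: str
    similarity_threshold: float = 0.75
    max_memories_per_user: int = 10000


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from an environment-like mapping, failing fast on gaps."""
    required = ("OPENAI_API_KEY", "SECRET_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required variables: " + ", ".join(missing))
    return Settings(
        openai_api_key=env["OPENAI_API_KEY"],
        secret_key=env["SECRET_KEY"],
        # Fall back to the documented defaults when a variable is unset.
        similarity_threshold=float(env.get("SIMILARITY_THRESHOLD", "0.75")),
        max_memories_per_user=int(env.get("MAX_MEMORIES_PER_USER", "10000")),
    )


# Example with placeholder values; pass nothing to read the real environment.
example = load_settings(
    {
        "OPENAI_API_KEY": "your_openai_api_key_here",
        "SECRET_KEY": "your_secret_key_here",
        "SIMILARITY_THRESHOLD": "0.8",
    }
)
```

Accepting any mapping instead of reading `os.environ` directly makes the loader easy to exercise in tests without touching the process environment.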
- Adjust `shared_buffers` and `work_mem` in PostgreSQL
- Tune vector index parameters (`lists` in ivfflat)
- Configure connection pooling
- Adjust worker concurrency based on CPU cores
- Configure task routing and priorities
- Set appropriate timeouts
- Tune similarity thresholds for your use case
- Adjust ACAN attention dimensions
- Configure memory consolidation frequency
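For the ivfflat `lists` parameter mentioned in the PostgreSQL tuning notes above, the pgvector documentation suggests starting points of rows / 1000 for tables up to 1M rows and sqrt(rows) beyond that, with `probes` around sqrt(lists) at query time. The helper names below are hypothetical, and the results are starting values to measure against your own data rather than tuned settings.

```python
import math


def suggested_ivfflat_lists(row_count: int) -> int:
    """Starting value for ivfflat `lists`, following pgvector's rule of thumb."""
    if row_count < 0:
        raise ValueError("row_count must be non-negative")
    if row_count <= 1_000_000:
        return max(1, row_count // 1000)
    return max(1, round(math.sqrt(row_count)))


def suggested_probes(lists: int) -> int:
    """Starting value for `ivfflat.probes` at query time: about sqrt(lists)."""
    if lists < 1:
        raise ValueError("lists must be at least 1")
    return max(1, round(math.sqrt(lists)))


lists_for_half_million = suggested_ivfflat_lists(500_000)
probes_for_half_million = suggested_probes(lists_for_half_million)
```

More lists make each probe cheaper but lower recall for a fixed `probes`, so it is worth re-checking both values as the table grows.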
- Security Hardening:
  - Change all default passwords
  - Use strong JWT secrets
  - Enable HTTPS with proper certificates
  - Configure firewall rules
- Scaling:
  - Use multiple API replicas
  - Scale Celery workers based on load
  - Configure database read replicas
  - Use Redis Cluster for high availability
- Monitoring:
  - Set up alerting rules in Prometheus
  - Configure log aggregation
  - Monitor resource usage
  - Set up backup procedures
# Apply Kubernetes manifests
kubectl apply -f k8s/
# Check deployment status
kubectl get pods -n engram-production
- Database Connection Errors:
  - Check if PostgreSQL is running
  - Verify connection strings
  - Check network connectivity
- Memory Operations Failing:
  - Verify OpenAI API key
  - Check embedding service status
  - Review similarity thresholds
- Celery Tasks Not Processing:
  - Check Redis connection
  - Verify worker status in Flower
  - Review task logs
# View API logs
docker-compose logs -f api
# View worker logs
docker-compose logs -f worker
# View database logs
docker-compose logs -f postgres
Based on research benchmarks:
- Response Latency: p95 < 1.5 seconds
- Token Efficiency: 90%+ reduction vs full-context
- Memory Footprint: < 10K tokens per conversation
- Accuracy: >65% LLM-as-a-Judge score
- Throughput: 1000+ requests/minute per instance
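To check your own measurements against targets like the p95 latency and token-reduction figures above, something along these lines may help. It is a sketch under stated assumptions: the function names are hypothetical, the percentile uses the nearest-rank method (the benchmark suite may compute it differently), and the sample numbers are made up for illustration, not benchmark results.

```python
from typing import Sequence


def p95(samples: Sequence[float]) -> float:
    """95th percentile by the nearest-rank method."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # ceil(0.95 * n) computed in integers to avoid float rounding surprises.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def token_reduction(full_context_tokens: int, memory_tokens: int) -> float:
    """Fraction of tokens saved relative to sending the full context."""
    if full_context_tokens <= 0:
        raise ValueError("full_context_tokens must be positive")
    return 1.0 - memory_tokens / full_context_tokens


# Illustrative, made-up measurements (seconds and token counts).
latencies = [0.4, 0.6, 0.5, 0.9, 1.2, 0.7, 0.8, 0.3, 1.4, 0.6]
meets_latency_target = p95(latencies) < 1.5
meets_token_target = token_reduction(100_000, 8_000) >= 0.90
```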
Run benchmarks yourself:
cd engram-backend
python -m benchmarks.run_benchmarks
See benchmarks/README.md for detailed benchmark results.
See the examples/ folder for:
- API usage guide with curl commands
- Postman collection for testing
- Python client example
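As a rough idea of what a Python client for the endpoints in this README could look like, the sketch below builds the login and memory-query requests with the standard library only. It is not the client shipped in examples/; the function names are hypothetical, and the paths, headers, and payload fields are copied from the curl commands in this README.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://localhost:8000"


def login_request(username: str, password: str, base_url: str = BASE_URL) -> Request:
    """Form-encoded POST to /auth/login, mirroring the curl login example."""
    body = urlencode({"username": username, "password": password}).encode("utf-8")
    return Request(
        f"{base_url}/auth/login",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def query_request(
    token: str, query: str, user_id: str, top_k: int = 5, base_url: str = BASE_URL
) -> Request:
    """JSON POST to /memory/query with a bearer token."""
    payload = {"query": query, "user_id": user_id, "top_k": top_k}
    return Request(
        f"{base_url}/memory/query",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = query_request("YOUR_TOKEN", "What are my dietary preferences?", "user_id")
```

With the stack running, a request can be sent with `urllib.request.urlopen(request)`. The response schema is not documented in this README, so the sketch stops short of parsing it; the `/docs` endpoint lists the actual fields.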
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
- Engram research and implementation
- OpenAI for GPT models
- The open-source community for excellent tools and libraries
- Documentation: Check the `/docs` endpoint when running
- Issues: Create GitHub issues for bugs and feature requests
- Discussions: Use GitHub discussions for questions
Engram — enduring offline physical/chemical changes underlying a memory; "engram cells" are the neuron ensembles that encode and can be reactivated to retrieve the memory. [4][5]
Built with ❤️ for the AI community