AexRAG is a production-ready, self-hosted Agentic RAG (Retrieval-Augmented Generation) platform built in Rust. It combines powerful LLMs with vector search and memory systems to create intelligent agents that can reason and retrieve knowledge.
- Agentic AI: Full ReAct (Reasoning + Acting) loop implementation
- RAG (Retrieval-Augmented Generation): Vector search with Qdrant for semantic retrieval
- Advanced Memory System:
  - Working memory (recent conversation context)
  - Episodic memory (compressed conversation summaries)
  - Semantic memory (pgvector-based semantic search)
- Multiple Response Formats: Text, Markdown, or structured JSON with schema validation
- Multi-Provider Support: OpenAI, Anthropic, Ollama (local models)
- Dashboard: Single-page embedded HTML dashboard (no separate frontend build)
- Document Management: Upload and manage PDFs, text files, and Markdown documents
- Secure: API key authentication, encrypted provider keys
- Easy Deployment: Single Docker image + docker-compose
- High Performance: Built in Rust with async/await throughout
┌─────────────────────────────────────────────────────────────┐
│                         AexRAG Core                         │
│                                                             │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐       │
│  │  Agent Loop  │  │  Retrieval   │  │    Memory    │       │
│  │   (ReAct)    │──│   (Qdrant)   │──│  (pgvector)  │       │
│  └──────────────┘  └──────────────┘  └──────────────┘       │
│                                                             │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐       │
│  │    Tools     │  │ LLM Provider │  │  Formatter   │       │
│  │   Registry   │  │   (Multi)    │  │ (text/json)  │       │
│  └──────────────┘  └──────────────┘  └──────────────┘       │
└─────────────────────────────────────────────────────────────┘
         │                  │                  │
         ▼                  ▼                  ▼
   ┌───────────┐      ┌───────────┐      ┌───────────┐
   │ Postgres  │      │  Qdrant   │      │  OpenAI   │
   │ (pgvector)│      │  Vector   │      │ Anthropic │
   │           │      │    DB     │      │  Ollama   │
   └───────────┘      └───────────┘      └───────────┘
- Rust 1.83+ - Install from https://rustup.rs
- Docker & Docker Compose - For PostgreSQL and Qdrant
cd nexus
# Start dependencies
make setup-local
# Run AexRAG
make dev

AexRAG will:
- Auto-run database migrations
- Generate an API key (printed to console - save this!)
- Start on http://localhost:3000
Open http://localhost:3000 and enter your API key.
See SETUP_INSTRUCTIONS.md for Docker deployment options.
Before creating agents, you need to configure at least one LLM provider:
curl -X POST http://localhost:3000/api/providers \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"name": "openai",
"display_name": "OpenAI GPT-4",
"api_key": "sk-...",
"base_url": "https://api.openai.com",
"default_model": "gpt-4"
}'

Supported providers:
- `openai` - OpenAI API
- `anthropic` - Anthropic Claude API
- `ollama` - Local Ollama instance (set `base_url` to `http://localhost:11434`)
curl -X POST http://localhost:3000/api/knowledge-bases \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"name": "Product Documentation",
"description": "All product docs and guides",
"embedding_provider_id": "PROVIDER_UUID",
"embedding_model": "text-embedding-3-small",
"chunk_size": 512,
"chunk_overlap": 64
}'

curl -X POST http://localhost:3000/api/knowledge-bases/KB_UUID/documents \
-H "Authorization: Bearer YOUR_API_KEY" \
-F "file=@/path/to/document.pdf"

Supported formats: PDF, TXT, MD (more coming soon)
Document processing happens in the background. Check status via the API.
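
The `chunk_size` and `chunk_overlap` settings above control how a document is split before embedding. The following is a minimal sketch of overlapping chunking, not AexRAG's actual implementation: the function name `chunk_words` is hypothetical, and it assumes sizes are counted in whitespace-separated words, whereas the real unit (tokens or characters) may differ.

```rust
/// Splits `text` into chunks of at most `chunk_size` words, where each chunk
/// repeats the last `chunk_overlap` words of the previous one.
/// Assumption: words stand in for whatever unit AexRAG really counts.
pub fn chunk_words(text: &str, chunk_size: usize, chunk_overlap: usize) -> Vec<String> {
    assert!(chunk_size > 0, "chunk_size must be positive");
    assert!(
        chunk_overlap < chunk_size,
        "chunk_overlap must be smaller than chunk_size"
    );

    let words: Vec<&str> = text.split_whitespace().collect();
    // Each new chunk starts this many words after the previous one.
    let step = chunk_size - chunk_overlap;
    let mut chunks = Vec::new();
    let mut start = 0;

    while start < words.len() {
        let end = usize::min(start + chunk_size, words.len());
        chunks.push(words[start..end].join(" "));
        if end == words.len() {
            break; // the final chunk reached the end of the text
        }
        start += step;
    }
    chunks
}
```

Under these assumptions, a 1000-word document with `chunk_size` 512 and `chunk_overlap` 64 yields three chunks starting at words 0, 448, and 896. A larger overlap keeps more shared context between neighbouring chunks at the cost of more embeddings to store.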
curl -X POST http://localhost:3000/api/agents \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"name": "Customer Support Agent",
"description": "Helps answer customer questions",
"provider_id": "PROVIDER_UUID",
"model": "gpt-4",
"system_prompt": "You are a helpful customer support agent. Use the knowledge base to answer questions accurately.",
"temperature": 0.7,
"max_tokens": 2048,
"response_format": "markdown",
"max_retrieval_chunks": 5,
"retrieval_threshold": 0.7,
"max_tool_iterations": 5
}'

# Link a knowledge base to an agent
curl -X POST http://localhost:3000/api/agents/AGENT_UUID/knowledge-bases \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"knowledge_base_id": "KB_UUID"
}'

curl -X POST http://localhost:3000/api/query \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"agent_id": "AGENT_UUID",
"query": "What is our return policy?",
"session_id": "user-123-session"
}'

Response:
{
"response": "Our return policy allows...",
"sources": [
{"filename": "returns.pdf", "score": 0.92}
],
"latency_ms": 1234,
"tokens": {
"input_tokens": 450,
"output_tokens": 120
}
}

Sessions maintain conversation context. Use the same `session_id` for multi-turn conversations:
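# First message
curl -X POST http://localhost:3000/api/query \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{"agent_id": "...", "query": "Hello", "session_id": "session-1"}'
# Follow-up (remembers context)
curl -X POST http://localhost:3000/api/query \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{"agent_id": "...", "query": "What did I just ask?", "session_id": "session-1"}'

View session history: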
curl http://localhost:3000/api/sessions/session-1/messages \
-H "Authorization: Bearer YOUR_API_KEY"

Create an agent that returns structured data:
{
"name": "Structured Data Agent",
"response_format": "json",
"json_schema": {
"type": "object",
"properties": {
"answer": {"type": "string"},
"confidence": {"type": "number"},
"sources": {"type": "array", "items": {"type": "string"}}
},
"required": ["answer", "confidence"]
}
}

After 20 conversation turns, AexRAG automatically:
- Summarizes the oldest 10 messages using the LLM
- Embeds the summary
- Stores it as a semantic memory block
- Deletes the old messages
These summaries are retrieved on future queries for long-term context.
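
The compaction steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's code: `MemoryBlock`, `compact_session`, and the two constants are hypothetical names, each stored message is treated as one turn, and the `summarize` and `embed` closures stand in for the LLM and embedding provider calls.

```rust
/// Hypothetical in-memory stand-in for a stored semantic memory block.
#[derive(Debug, Clone, PartialEq)]
pub struct MemoryBlock {
    pub summary: String,
    pub embedding: Vec<f32>,
}

/// Compaction starts once a session holds this many messages (from the text above).
pub const COMPACT_AFTER: usize = 20;
/// Number of oldest messages folded into a single summary.
pub const COMPACT_BATCH: usize = 10;

/// Compacts `messages` in place once the threshold is reached.
/// `summarize` and `embed` are placeholders for the real provider calls.
pub fn compact_session<S, E>(
    messages: &mut Vec<String>,
    summarize: S,
    embed: E,
) -> Option<MemoryBlock>
where
    S: Fn(&[String]) -> String,
    E: Fn(&str) -> Vec<f32>,
{
    if messages.len() < COMPACT_AFTER {
        return None; // not enough history yet, nothing to compact
    }
    // 1. Summarize the oldest messages.
    let summary = summarize(&messages[..COMPACT_BATCH]);
    // 2. Embed the summary so it can be found by semantic search later.
    let embedding = embed(&summary);
    // 3. Delete the messages the summary now replaces.
    messages.drain(..COMPACT_BATCH);
    // 4. Hand the block back to the caller for storage.
    Some(MemoryBlock { summary, embedding })
}
```

Passing the provider calls in as closures keeps the sketch free of network code and makes the threshold logic easy to test in isolation.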
| Method | Endpoint | Description |
|---|---|---|
| GET | `/api/health` | Health check |
| GET | `/api/stats` | Platform statistics |
| POST | `/api/query` | Query an agent |
| GET | `/api/query-logs` | View query logs |
| GET/POST/PUT/DELETE | `/api/agents` | Manage agents |
| GET/POST/DELETE | `/api/knowledge-bases` | Manage knowledge bases |
| POST | `/api/knowledge-bases/:id/documents` | Upload documents |
| GET/POST | `/api/providers` | Manage LLM providers |
| GET | `/api/tools` | List available tools |
| PUT | `/api/agents/:id/tools/:name` | Configure agent tools |
| GET/DELETE | `/api/sessions` | Manage sessions |
| POST | `/api/keys` | Create new API key |
All protected endpoints require an `Authorization: Bearer YOUR_API_KEY` header.
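
For readers writing a client or proxy in front of the API, here is a small sketch of how a Bearer header value can be parsed. The function `bearer_token` is a hypothetical helper, not part of AexRAG, and the sketch says nothing about how the server itself validates keys.

```rust
/// Extracts the API key from an `Authorization` header value.
/// Returns `None` unless the value has the form `Bearer <key>`.
/// The scheme name is matched case-insensitively, as HTTP allows.
pub fn bearer_token(header_value: &str) -> Option<&str> {
    // Split into the scheme and everything after the first space.
    let (scheme, token) = header_value.trim().split_once(' ')?;
    let token = token.trim();
    if scheme.eq_ignore_ascii_case("Bearer") && !token.is_empty() {
        Some(token)
    } else {
        None
    }
}
```

A caller would typically reject the request with a 401 status when this returns `None`.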
# Start PostgreSQL with pgvector
docker run -d -p 5432:5432 \
-e POSTGRES_USER=nexus \
-e POSTGRES_PASSWORD=nexus \
-e POSTGRES_DB=nexus \
pgvector/pgvector:pg16
# Start Qdrant
docker run -d -p 6333:6333 -p 6334:6334 qdrant/qdrant
# Create .env file
cp .env.example .env
# Run migrations and start AexRAG
cargo run

# Run tests
cargo test

# Build and run a release binary
cargo build --release
./target/release/nexus

- Change `NEXUS_SECRET` in production; it encrypts stored API keys
- Use HTTPS in production (put AexRAG behind nginx or Traefik)
- Rotate API keys regularly
- Configure tool restrictions (e.g., allowed domains for `http_call`)
- Review query logs for suspicious activity
- Back up your database regularly
-- Add indexes for common queries
CREATE INDEX idx_query_logs_session_created ON query_logs(session_id, created_at DESC);
CREATE INDEX idx_memory_embedding_ops ON memory_blocks USING ivfflat (embedding vector_cosine_ops);

Configure Qdrant collection parameters for better performance:
// In create_collection, adjust:
quantization_config: Some(QuantizationConfig::Scalar(...))
hnsw_config: Some(HnswConfig { m: 16, ef_construct: 100, ... })

Adjust the PostgreSQL connection pool in `db.rs`:
PgPoolOptions::new()
.max_connections(100) // Increase for high load
.acquire_timeout(Duration::from_secs(5))

Ensure PostgreSQL and Qdrant are running:
docker-compose ps

Reset the database (this removes the Docker volumes and deletes all data):
docker-compose down -v
docker-compose up -d

Check logs:
docker-compose logs -f nexus

Ensure the embedding provider has a valid API key.

- Check `avg_latency_ms` in the stats dashboard
- Review query logs for slow queries
- Consider using a faster model
- Reduce `max_retrieval_chunks`
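
To see why lowering `max_retrieval_chunks` shortens prompts, here is a sketch of how that setting and `retrieval_threshold` might bound the retrieved context. It assumes the threshold is a minimum similarity score and that higher scores are better; `ScoredChunk` and `select_chunks` are hypothetical names, not AexRAG's API.

```rust
/// Hypothetical search hit: a chunk's source file and its similarity score.
#[derive(Debug, Clone, PartialEq)]
pub struct ScoredChunk {
    pub filename: String,
    pub score: f32,
}

/// Keeps the best-scoring chunks that meet `threshold`, at most `max_chunks`.
/// Assumption: a higher score means a closer match.
pub fn select_chunks(
    mut candidates: Vec<ScoredChunk>,
    threshold: f32,
    max_chunks: usize,
) -> Vec<ScoredChunk> {
    // Drop weak matches first.
    candidates.retain(|c| c.score >= threshold);
    // Sort the remainder from best to worst.
    candidates.sort_by(|a, b| b.score.total_cmp(&a.score));
    // Cap how many chunks are added to the prompt.
    candidates.truncate(max_chunks);
    candidates
}
```

Every chunk that survives this step adds input tokens to the LLM call, so a smaller cap or a stricter threshold usually reduces latency, at the risk of dropping useful context.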
Contributions welcome! Please:
- Fork the repository
- Create a feature branch
- Write tests for new functionality
- Submit a pull request
MIT License - see LICENSE file for details
Made with ⚡ by the AexRAG team