AizuddinBadry/AexRAG

AexRAG - Self-Hosted Agentic RAG Platform

AexRAG is a production-ready, self-hosted Agentic RAG (Retrieval-Augmented Generation) platform built in Rust. It combines LLMs with Qdrant vector search and a pgvector-backed memory system so that agents can reason over retrieved knowledge.

✨ Features

  • 🤖 Agentic AI: Full ReAct (Reasoning + Acting) loop implementation
  • 📚 RAG (Retrieval-Augmented Generation): Vector search with Qdrant for semantic retrieval
  • 🧠 Advanced Memory System:
    • Working memory (recent conversation context)
    • Episodic memory (compressed conversation summaries)
    • Semantic memory (pgvector-based semantic search)
  • 🎯 Multiple Response Formats: Text, Markdown, or structured JSON with schema validation
  • 🔌 Multi-Provider Support: OpenAI, Anthropic, Ollama (local models)
  • 📊 Beautiful Dashboard: Single-page embedded HTML dashboard (no separate frontend build)
  • 📄 Document Management: Upload and manage PDFs, text files, and markdown documents
  • 🔐 Secure: API key authentication, encrypted provider keys
  • 🐳 Easy Deployment: Single Docker image + docker-compose
  • ⚡ High Performance: Built in Rust with async/await throughout

🏗️ Architecture

┌──────────────────────────────────────────────────────────┐
│                       AexRAG Core                        │
│                                                          │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐    │
│  │  Agent Loop  │  │   Retrieval  │  │    Memory    │    │
│  │   (ReAct)    │→ │   (Qdrant)   │→ │  (pgvector)  │    │
│  └──────────────┘  └──────────────┘  └──────────────┘    │
│         ↓                                                │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐    │
│  │     Tools    │  │  LLM Provider│  │   Formatter  │    │
│  │   Registry   │  │   (Multi)    │  │ (text/json)  │    │
│  └──────────────┘  └──────────────┘  └──────────────┘    │
└──────────────────────────────────────────────────────────┘
           │                    │                    │
           ▼                    ▼                    ▼
     ┌───────────┐        ┌───────────┐        ┌───────────┐
     │ Postgres  │        │  Qdrant   │        │  OpenAI   │
     │ (pgvector)│        │  Vector   │        │ Anthropic │
     │           │        │    DB     │        │  Ollama   │
     └───────────┘        └───────────┘        └───────────┘

🚀 Quick Start

Prerequisites

  • Rust 1.83+ - Install from https://rustup.rs
  • Docker & Docker Compose - For PostgreSQL and Qdrant

Setup (< 5 minutes)

cd nexus

# Start dependencies
make setup-local

# Run AexRAG
make dev

AexRAG will:

  1. Auto-run database migrations
  2. Generate an API key (printed to console - save this!)
  3. Start on http://localhost:3000

Access the Dashboard

Open http://localhost:3000 and enter your API key.

Alternative: Full Docker Build

See SETUP_INSTRUCTIONS.md for Docker deployment options.

📖 Usage Guide

Creating an LLM Provider

Before creating agents, you need to configure at least one LLM provider:

curl -X POST http://localhost:3000/api/providers \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "openai",
    "display_name": "OpenAI GPT-4",
    "api_key": "sk-...",
    "base_url": "https://api.openai.com",
    "default_model": "gpt-4"
  }'

Supported Providers:

  • openai - OpenAI API
  • anthropic - Anthropic Claude API
  • ollama - Local Ollama instance (set base_url to http://localhost:11434)

Creating a Knowledge Base

curl -X POST http://localhost:3000/api/knowledge-bases \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Product Documentation",
    "description": "All product docs and guides",
    "embedding_provider_id": "PROVIDER_UUID",
    "embedding_model": "text-embedding-3-small",
    "chunk_size": 512,
    "chunk_overlap": 64
  }'
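
To make the interaction between chunk_size and chunk_overlap concrete, here is a minimal sliding-window sketch. It is an illustration only: the function name `chunk_with_overlap` is hypothetical, and it assumes both settings are counted in the same unit (tokens here), which this README does not specify.

```rust
/// Splits `tokens` into windows of `chunk_size` items; each window shares
/// `chunk_overlap` items with the previous one. Empty input yields no chunks.
pub fn chunk_with_overlap<T: Clone>(
    tokens: &[T],
    chunk_size: usize,
    chunk_overlap: usize,
) -> Vec<Vec<T>> {
    assert!(chunk_size > 0, "chunk_size must be positive");
    assert!(chunk_overlap < chunk_size, "chunk_overlap must be smaller than chunk_size");
    let stride = chunk_size - chunk_overlap; // how far each window advances
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < tokens.len() {
        let end = (start + chunk_size).min(tokens.len());
        chunks.push(tokens[start..end].to_vec());
        if end == tokens.len() {
            break; // the last window reached the end of the input
        }
        start += stride;
    }
    chunks
}
```

With chunk_size 512 and chunk_overlap 64, each window advances by 448 items, so a 1000-item document yields three chunks under these assumptions.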

Uploading Documents

curl -X POST http://localhost:3000/api/knowledge-bases/KB_UUID/documents \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@/path/to/document.pdf"

Supported formats: PDF, TXT, MD (more coming soon)

Document processing happens in the background. Check status via the API.

Creating an Agent

curl -X POST http://localhost:3000/api/agents \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Customer Support Agent",
    "description": "Helps answer customer questions",
    "provider_id": "PROVIDER_UUID",
    "model": "gpt-4",
    "system_prompt": "You are a helpful customer support agent. Use the knowledge base to answer questions accurately.",
    "temperature": 0.7,
    "max_tokens": 2048,
    "response_format": "markdown",
    "max_retrieval_chunks": 5,
    "retrieval_threshold": 0.7,
    "max_tool_iterations": 5
  }'
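
The retrieval_threshold and max_retrieval_chunks fields suggest a filter-then-cap step on search results. The sketch below shows one plausible reading, assuming a chunk is kept when its score is at least the threshold; `ScoredChunk` and `select_chunks` are hypothetical names, and AexRAG's real selection logic may differ.

```rust
/// A retrieved chunk with its similarity score (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
pub struct ScoredChunk {
    pub filename: String,
    pub score: f32,
}

/// Keeps chunks scoring at least `threshold`, best first, capped at `max_chunks`.
pub fn select_chunks(
    mut hits: Vec<ScoredChunk>,
    threshold: f32,
    max_chunks: usize,
) -> Vec<ScoredChunk> {
    // Drop everything below the threshold (NaN scores fail the comparison too).
    hits.retain(|h| h.score >= threshold);
    // total_cmp is a total order on f32, so no unwrap is needed when sorting.
    hits.sort_by(|a, b| b.score.total_cmp(&a.score));
    hits.truncate(max_chunks);
    hits
}
```

Under this reading, raising retrieval_threshold trades recall for precision, while lowering max_retrieval_chunks shortens the prompt sent to the model.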

Linking Knowledge Bases to Agents

# Link a knowledge base to an agent
curl -X POST http://localhost:3000/api/agents/AGENT_UUID/knowledge-bases \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "knowledge_base_id": "KB_UUID"
  }'

Querying an Agent

curl -X POST http://localhost:3000/api/query \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "agent_id": "AGENT_UUID",
    "query": "What is our return policy?",
    "session_id": "user-123-session"
  }'

Response:

{
  "response": "Our return policy allows...",
  "sources": [
    {"filename": "returns.pdf", "score": 0.92}
  ],
  "latency_ms": 1234,
  "tokens": {
    "input_tokens": 450,
    "output_tokens": 120
  }
}

Session Management

Sessions maintain conversation context. Use the same session_id for multi-turn conversations:

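# First message
curl -X POST http://localhost:3000/api/query \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"agent_id": "...", "query": "Hello", "session_id": "session-1"}'

# Follow-up (remembers context)
curl -X POST http://localhost:3000/api/query \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"agent_id": "...", "query": "What did I just ask?", "session_id": "session-1"}'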

View session history:

curl http://localhost:3000/api/sessions/session-1/messages \
  -H "Authorization: Bearer YOUR_API_KEY"

🧪 Advanced Features

JSON Response Format with Schema Validation

Create an agent that returns structured data:

{
  "name": "Structured Data Agent",
  "response_format": "json",
  "json_schema": {
    "type": "object",
    "properties": {
      "answer": {"type": "string"},
      "confidence": {"type": "number"},
      "sources": {"type": "array", "items": {"type": "string"}}
    },
    "required": ["answer", "confidence"]
  }
}

Memory Compression

After 20 conversation turns, AexRAG automatically:

  1. Summarizes the oldest 10 messages using the LLM
  2. Embeds the summary
  3. Stores it as a semantic memory block
  4. Deletes the old messages

These summaries are retrieved on future queries for long-term context.
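
The four steps above can be sketched with an in-memory stand-in. This is an illustration under stated assumptions, not AexRAG's code: `SessionMemory`, `MAX_TURNS`, `COMPRESS_COUNT`, and the `summarize` and `embed` callbacks (placeholders for the LLM and embedding calls) are hypothetical, and the trigger is assumed to be "more than 20 stored messages".

```rust
/// Assumed constants, taken from the numbers in the text above.
const MAX_TURNS: usize = 20;
const COMPRESS_COUNT: usize = 10;

/// In-memory stand-in for one session's memory (hypothetical, not AexRAG's schema).
#[derive(Default)]
pub struct SessionMemory {
    /// Working memory, oldest message first.
    pub messages: Vec<String>,
    /// Compressed blocks as (summary text, embedding) pairs.
    pub summaries: Vec<(String, Vec<f32>)>,
}

impl SessionMemory {
    /// Appends a message, then compresses the oldest ones once the limit is exceeded.
    pub fn push(
        &mut self,
        message: String,
        summarize: impl Fn(&[String]) -> String,
        embed: impl Fn(&str) -> Vec<f32>,
    ) {
        self.messages.push(message);
        if self.messages.len() > MAX_TURNS {
            // Steps 1 and 4: remove the oldest messages from working memory and summarize them.
            let oldest: Vec<String> = self.messages.drain(..COMPRESS_COUNT).collect();
            let summary = summarize(oldest.as_slice());
            // Steps 2 and 3: embed the summary and store both as a memory block.
            let embedding = embed(summary.as_str());
            self.summaries.push((summary, embedding));
        }
    }
}
```

Passing the summarizer and embedder as callbacks keeps the sketch independent of any particular provider; in a real deployment those calls would go to the configured LLM and embedding model.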

📊 API Reference

Endpoints

| Method | Endpoint | Description |
|---|---|---|
| GET | /api/health | Health check |
| GET | /api/stats | Platform statistics |
| POST | /api/query | Query an agent |
| GET | /api/query-logs | View query logs |
| GET/POST/PUT/DELETE | /api/agents | Manage agents |
| GET/POST/DELETE | /api/knowledge-bases | Manage knowledge bases |
| POST | /api/knowledge-bases/:id/documents | Upload documents |
| GET/POST | /api/providers | Manage LLM providers |
| GET | /api/tools | List available tools |
| PUT | /api/agents/:id/tools/:name | Configure agent tools |
| GET/DELETE | /api/sessions | Manage sessions |
| POST | /api/keys | Create new API key |

All protected endpoints require an Authorization: Bearer YOUR_API_KEY header.
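
For readers writing a client or a proxy in front of AexRAG, the following sketch shows how a Bearer header value is typically parsed. `bearer_token` is a hypothetical helper, not a function from the AexRAG codebase, and how the server itself validates keys is not described in this README.

```rust
/// Extracts the token from an `Authorization` header value of the form `Bearer <token>`.
/// Returns `None` when the scheme is not `Bearer` or the token is missing.
pub fn bearer_token(header_value: &str) -> Option<&str> {
    let (scheme, token) = header_value.trim().split_once(' ')?;
    let token = token.trim();
    // Authentication scheme names are case-insensitive per RFC 9110.
    if scheme.eq_ignore_ascii_case("bearer") && !token.is_empty() {
        Some(token)
    } else {
        None
    }
}
```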

🛠️ Development

Local Development (without Docker Compose)

# Start PostgreSQL with pgvector
docker run -d -p 5432:5432 \
  -e POSTGRES_USER=nexus \
  -e POSTGRES_PASSWORD=nexus \
  -e POSTGRES_DB=nexus \
  pgvector/pgvector:pg16

# Start Qdrant
docker run -d -p 6333:6333 -p 6334:6334 qdrant/qdrant

# Create .env file
cp .env.example .env

# Run migrations and start AexRAG
cargo run

Running Tests

cargo test

Building for Production

cargo build --release
./target/release/nexus

🔒 Security Considerations

  1. Change NEXUS_SECRET in production - it is the secret used to encrypt stored provider API keys
  2. Use HTTPS in production (put behind nginx/traefik)
  3. Rotate API keys regularly
  4. Configure tool restrictions (e.g., allowed domains for http_call)
  5. Review query logs for suspicious activity
  6. Backup your database regularly

📈 Performance Tuning

Database

-- Add indexes for common queries
CREATE INDEX idx_query_logs_session_created ON query_logs(session_id, created_at DESC);
CREATE INDEX idx_memory_embedding_ops ON memory_blocks USING ivfflat (embedding vector_cosine_ops);
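
The vector_cosine_ops operator class indexes pgvector's cosine distance, which is one minus the cosine similarity of two vectors. The sketch below computes that quantity directly to show what the index ranks by; `cosine_distance` is a hypothetical helper written for illustration and is not part of AexRAG or pgvector.

```rust
/// Cosine distance between two equal-length vectors: 1 - (a . b) / (|a| * |b|).
/// Returns `None` for mismatched lengths or when either vector has zero norm.
pub fn cosine_distance(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(1.0 - dot / (norm_a * norm_b))
}
```

Identical directions give a distance near 0, orthogonal vectors give 1, and opposite directions give 2, so smaller values mean closer matches.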

Qdrant

Configure collection parameters for better performance:

// In create_collection, adjust:
quantization_config: Some(QuantizationConfig::Scalar(...))
hnsw_config: Some(HnswConfig { m: 16, ef_construct: 100, ... })

Connection Pools

Adjust in db.rs:

PgPoolOptions::new()
    .max_connections(100)  // Increase for high load
    .acquire_timeout(Duration::from_secs(5))

🐛 Troubleshooting

"Connection refused" on startup

Ensure PostgreSQL and Qdrant are running:

docker-compose ps

Migrations fail

Reset the database (⚠️ destroys data):

docker-compose down -v
docker-compose up -d

Document indexing stuck on "indexing"

Check logs:

docker-compose logs -f nexus

Ensure the embedding provider has a valid API key.

High latency

  • Check avg_latency_ms in stats dashboard
  • Review query logs for slow queries
  • Consider using a faster model
  • Reduce max_retrieval_chunks

🤝 Contributing

Contributions welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Write tests for new functionality
  4. Submit a pull request

📄 License

MIT License - see LICENSE file for details

🙏 Acknowledgments

Built with:


Made with ⚡ by the AexRAG team
