An open-source evaluation and optimization system for LLM-powered features, with a focus on retrieval-augmented generation (RAG).
- Track and test various metrics for LLM-powered features
- Support for multiple vector stores (FAISS and pgvector)
- Comprehensive evaluation system
- Real-time monitoring and metrics collection
- Streamlit dashboard
- Docker support for easy deployment

Prerequisites:
- Python 3.9+
- Docker and Docker Compose
- OpenAI API key

To run EvalKit with Docker Compose:
- Clone the repository:
    git clone https://github.com/yourusername/evalkit.git
    cd evalkit
- Create a `.env` file:
    cp .env.example .env
    # Edit .env with your settings
- Start the services:
    docker-compose up -d
- Access the services:
  - API: http://localhost:8000
  - Dashboard: http://localhost:8501
  - Grafana: http://localhost:3000 (login: admin/admin)
  - Prometheus: http://localhost:9090
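
Once the containers are up, it can help to confirm that each endpoint answers before opening the dashboard. The following is a minimal sketch using only the Python standard library. The `check_service` helper and the `SERVICES` mapping are illustrative names, not part of EvalKit, and the URLs are simply the defaults listed above.

```python
import urllib.error
import urllib.request

# Default endpoints from the service list; adjust if your setup maps other ports.
SERVICES = {
    "API": "http://localhost:8000",
    "Dashboard": "http://localhost:8501",
    "Grafana": "http://localhost:3000",
    "Prometheus": "http://localhost:9090",
}


def check_service(url: str, timeout: float = 2.0) -> bool:
    """Return True if something answers HTTP at `url` (hypothetical helper)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server responded, just not with a 2xx status (e.g. 404 on "/").
        return True
    except OSError:
        # Connection refused, DNS failure, or timeout (URLError is an OSError).
        return False


if __name__ == "__main__":
    for name, url in SERVICES.items():
        status = "reachable" if check_service(url) else "unreachable"
        print(f"{name:<10} {url:<24} {status}")
```

A service reported as unreachable shortly after startup may still be initializing, so rerunning the check after a few seconds is reasonable before digging into the container logs.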

EvalKit supports two vector store implementations:
- FAISS
  - In-memory vector store
  - Fast similarity search
  - Good for development and testing
  - Configure with `VECTOR_STORE_TYPE=faiss`
- pgvector
  - PostgreSQL-based vector store
  - Persistent storage
  - Production-ready
  - Configure with `VECTOR_STORE_TYPE=pgvector`
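
To illustrate how a setting like `VECTOR_STORE_TYPE` might be read and validated at startup, here is a small sketch. It is not EvalKit's actual configuration code: the function name `resolve_vector_store_type`, the fallback to `faiss` when the variable is unset, and the case-insensitive matching are all assumptions made for the example.

```python
import os
from typing import Mapping, Optional

# The two values documented above.
SUPPORTED_VECTOR_STORES = ("faiss", "pgvector")


def resolve_vector_store_type(
    env: Optional[Mapping[str, str]] = None,
    default: str = "faiss",  # assumed default; check the project's .env.example
) -> str:
    """Read VECTOR_STORE_TYPE from `env` and validate it (hypothetical helper)."""
    env = os.environ if env is None else env
    value = env.get("VECTOR_STORE_TYPE", default).strip().lower()
    if value not in SUPPORTED_VECTOR_STORES:
        raise ValueError(
            f"Unsupported VECTOR_STORE_TYPE {value!r}; "
            f"expected one of {', '.join(SUPPORTED_VECTOR_STORES)}"
        )
    return value
```

Failing fast on an unknown value surfaces a typo in `.env` at startup rather than later, when the first similarity search runs.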

To set up a local development environment:
- Create a virtual environment:
    python -m venv venv
    source venv/bin/activate  # or `venv\Scripts\activate` on Windows
- Install dependencies:
    pip install -e .
- Initialize the database:
    alembic upgrade head
- Start the development server:
    uvicorn evalkit.api.main:app --reload

With the API running, record an interaction and then submit an evaluation for it:

    import requests

    # Record an interaction: the user query, the model response, and metadata.
    response = requests.post(
        "http://localhost:8000/interactions",
        json={
            "query": "What is the capital of France?",
            "response": "The capital of France is Paris.",
            "metadata": {
                "model": "gpt-4",
                "temperature": 0.7,
            },
        },
    )

    # Submit evaluation scores for a previously recorded interaction.
    response = requests.post(
        "http://localhost:8000/evaluations",
        json={
            "interaction_id": 1,
            "criteria": {
                "relevance": 0.9,
                "coherence": 0.8,
                "completeness": 0.7,
            },
        },
    )

EvalKit includes monitoring through Prometheus and Grafana:
- Track interaction counts
- Monitor response times
- Analyze evaluation scores
- Set up alerts for performance issues
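
The metrics behind these views can also be read directly from Prometheus over its HTTP API (`GET /api/v1/query`). The sketch below assumes Prometheus is reachable on the default port listed earlier. The helper names are illustrative, and because this section does not list EvalKit's metric names, the usage note relies on Prometheus's built-in `up` series rather than guessing one.

```python
import json
import urllib.parse
import urllib.request

PROMETHEUS_URL = "http://localhost:9090"  # default port from the service list


def build_query_url(expr: str, base_url: str = PROMETHEUS_URL) -> str:
    """Build the URL for a Prometheus instant query (GET /api/v1/query)."""
    query = urllib.parse.urlencode({"query": expr})
    return f"{base_url.rstrip('/')}/api/v1/query?{query}"


def instant_query(expr: str, base_url: str = PROMETHEUS_URL, timeout: float = 5.0) -> list:
    """Run an instant query and return the list of result samples."""
    with urllib.request.urlopen(build_query_url(expr, base_url), timeout=timeout) as resp:
        payload = json.load(resp)
    if payload.get("status") != "success":
        raise RuntimeError(f"Prometheus query failed: {payload}")
    # Each sample is a dict with a "metric" label set and a "value" pair
    # of [timestamp, value-as-string].
    return payload["data"]["result"]
```

With the stack running, `instant_query("up")` returns one sample per scrape target, with the value `"1"` for targets Prometheus can reach. Substituting an EvalKit metric name from the Prometheus UI gives the same access to interaction counts or response times.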

To contribute:
- Fork the repository
- Create a feature branch
- Commit your changes
- Push to the branch
- Create a Pull Request
MIT License - see LICENSE file for details