EvalKit

An open-source evaluation and optimization system for LLM-powered features, with a focus on retrieval-augmented generation (RAG).

Features

Track and test metrics for LLM-powered features
Support for two vector stores (FAISS and pgvector)
Evaluation of recorded interactions against criteria such as relevance, coherence, and completeness
Real-time monitoring and metrics collection with Prometheus and Grafana
Streamlit dashboard
Docker support for easy deployment

Prerequisites

Python 3.9+
Docker and Docker Compose
OpenAI API key

Quick Start

Clone the repository:

git clone https://github.com/yourusername/evalkit.git
cd evalkit

Create a .env file:

cp .env.example .env
# Edit .env with your settings

Start the services:

docker-compose up -d

Access the services:

API: http://localhost:8000
Dashboard: http://localhost:8501
Grafana: http://localhost:3000 (default login: admin/admin)
Prometheus: http://localhost:9090
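
To confirm that all four services came up, a small reachability check can help. The following is a minimal sketch using only the Python standard library. The helper names `is_reachable` and `check_services` are hypothetical and not part of EvalKit, and the check only tests that something answers on each URL, not that the service is healthy.

```python
import urllib.error
import urllib.request

# URLs taken from the list above.
SERVICES = {
    "API": "http://localhost:8000",
    "Dashboard": "http://localhost:8501",
    "Grafana": "http://localhost:3000",
    "Prometheus": "http://localhost:9090",
}


def is_reachable(url, timeout=2.0):
    """Return True if an HTTP server answers at url (hypothetical helper)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just with an error status such as 404.
        return True
    except OSError:
        # Connection refused, DNS failure, timeout, and similar errors.
        return False


def check_services(services=SERVICES, timeout=2.0):
    """Map each service name to whether its URL is reachable."""
    return {name: is_reachable(url, timeout) for name, url in services.items()}
```

Calling `check_services()` after `docker-compose up -d` returns a dictionary of service names to booleans. Containers may need a few seconds before they start accepting connections.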

Vector Store Options

EvalKit supports two vector store implementations:

FAISS

In-memory vector store
Fast similarity search
Good for development and testing
Configure with VECTOR_STORE_TYPE=faiss

pgvector

PostgreSQL-based vector store
Persistent storage
Production-ready
Configure with VECTOR_STORE_TYPE=pgvector
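
As an illustration of how a `VECTOR_STORE_TYPE` setting like this can be read and validated, here is a minimal sketch. The function name `resolve_vector_store_type` and the `faiss` default are assumptions made for the example. EvalKit's own configuration loading may work differently.

```python
import os

# The two values documented above.
SUPPORTED_VECTOR_STORES = ("faiss", "pgvector")


def resolve_vector_store_type(env=None, default="faiss"):
    """Read VECTOR_STORE_TYPE and validate it (hypothetical helper).

    The "faiss" default is an assumption for this sketch, not a
    documented EvalKit default.
    """
    env = os.environ if env is None else env
    value = env.get("VECTOR_STORE_TYPE", default).strip().lower()
    if value not in SUPPORTED_VECTOR_STORES:
        raise ValueError(
            f"Unsupported VECTOR_STORE_TYPE {value!r}; "
            f"expected one of {SUPPORTED_VECTOR_STORES}"
        )
    return value
```

Validating the value at startup surfaces a typo in `.env` immediately, rather than later as a missing backend.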

Development

Create a virtual environment:

python -m venv venv
source venv/bin/activate  # or `venv\Scripts\activate` on Windows

Install dependencies:

pip install -e .

Initialize the database:

alembic upgrade head

Start the development server:

uvicorn evalkit.api.main:app --reload

API Usage

Record an Interaction

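import requests

# Record one query/response pair together with the model settings used.
response = requests.post(
    "http://localhost:8000/interactions",
    json={
        "query": "What is the capital of France?",
        "response": "The capital of France is Paris.",
        "metadata": {
            "model": "gpt-4",
            "temperature": 0.7
        }
    },
    timeout=10,
)
response.raise_for_status()  # raise on a 4xx/5xx response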

Run an Evaluation

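import requests

# Evaluate a previously recorded interaction, referenced by its ID.
response = requests.post(
    "http://localhost:8000/evaluations",
    json={
        "interaction_id": 1,
        "criteria": {
            "relevance": 0.9,
            "coherence": 0.8,
            "completeness": 0.7
        }
    },
    timeout=10,
)
response.raise_for_status()  # raise on a 4xx/5xx response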
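
Before sending an evaluation request, it can be useful to validate the payload on the client side. The sketch below builds the same JSON body as the request above. The helper name `build_evaluation_payload` is hypothetical, and the rules it enforces (a positive integer ID, criteria values between 0 and 1) are assumptions inferred from the example values, not documented API constraints.

```python
def build_evaluation_payload(interaction_id, criteria):
    """Build the JSON body for POST /evaluations (hypothetical helper).

    Assumptions: interaction IDs are positive integers and each
    criterion value lies in [0, 1].
    """
    if not isinstance(interaction_id, int) or interaction_id < 1:
        raise ValueError("interaction_id must be a positive integer")
    if not criteria:
        raise ValueError("at least one criterion is required")

    cleaned = {}
    for name, value in criteria.items():
        value = float(value)
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
        cleaned[name] = value

    return {"interaction_id": interaction_id, "criteria": cleaned}
```

The returned dictionary can be passed directly as the `json=` argument of `requests.post`.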

Monitoring

EvalKit exposes metrics through Prometheus and Grafana, which you can use to:

Track interaction counts
Monitor response times
Analyze evaluation scores
Set up alerts for performance issues

Contributing

Fork the repository
Create a feature branch
Commit your changes
Push to the branch
Create a Pull Request

License

MIT License. See the LICENSE file for details.
