RagTrack

RagTrack is a lightweight Git-like version control layer for RAG knowledge bases.

It helps you:

track document changes across ingestions
avoid stale chunks in retrieval pipelines
search only the latest knowledge
inspect what changed between document versions
keep source traces for retrieved chunks

RagTrack is CLI-first and designed for small VPS deployments. By default it uses local files, SQLite, SQLAlchemy, keyword search, and a small server-rendered FastAPI UI. Optional semantic search uses FastEmbed, an ONNX Runtime-based embedding provider that avoids PyTorch and CUDA dependencies. RagTrack has no authentication and does not depend on Redis, Celery, MinIO, Kubernetes, external cloud services, or a JavaScript build pipeline.

What Problem It Solves

RAG apps often treat ingestion as a one-way import. Over time, source documents change, chunks become stale, and it becomes hard to answer simple questions:

Which files changed?
Which chunks belong to the latest document version?
What did the last ingestion add or remove?
Can I search current knowledge without keeping old chunks active?

RagTrack adds a small version-control layer around local document ingestion so RAG knowledge bases are easier to inspect, refresh, and trust.

Features

recursive ingestion for .pdf, .docx, .txt, .md, and .markdown
SHA256 file hashing for change detection
document versions stored in SQLite
chunk hashes and chunk metadata stored in SQLite
keyword search with no embedding package required
optional FastEmbed semantic embeddings using BAAI/bge-small-en-v1.5
optional SentenceTransformers provider for users who explicitly want it
semantic search limited to latest document versions when embedding support is installed
diff summaries between latest and previous versions
status summaries for project health
local FastAPI/Jinja web interface for upload and inspection
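
The hash-based change detection listed above can be illustrated with a minimal sketch. It assumes a plain mapping from file path to the last known digest; `sha256_file` and `has_changed` are hypothetical helper names for illustration, not RagTrack's actual API.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_file(path: Path, block_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, read in fixed-size blocks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def has_changed(path: Path, known_hashes: dict[str, str]) -> bool:
    """Hypothetical helper: True when the file is new or its digest differs."""
    return known_hashes.get(str(path)) != sha256_file(path)


with tempfile.TemporaryDirectory() as tmp:
    doc = Path(tmp) / "house_rules.txt"
    doc.write_text("Check-in is at 3 pm.", encoding="utf-8")

    # Record the digest seen at the first ingestion.
    known = {str(doc): sha256_file(doc)}
    unchanged = has_changed(doc, known)

    # Editing the file changes its digest, so it would need a new version.
    doc.write_text("Check-in is at 4 pm.", encoding="utf-8")
    changed = has_changed(doc, known)
```

Reading in blocks keeps memory use flat for large PDFs, which matters on a small VPS.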

Installation

Create and activate a virtual environment (PowerShell shown; on Linux or macOS, run source .venv/bin/activate instead):

python -m venv .venv
.\.venv\Scripts\Activate.ps1

Install RagTrack:

pip install -e .

Install FastEmbed semantic search support only when you want embeddings:

pip install -e ".[embed]"

Install the legacy SentenceTransformers provider only if you explicitly want it:

pip install -e ".[st]"

For development:

pip install -e ".[dev]"

For development with semantic search:

pip install -e ".[dev,embed]"

Local Usage

Create a docs directory:

mkdir docs

Ingest documents:

ragtrack ingest .\docs

Search latest chunks with the default keyword mode:

ragtrack search "wifi password"
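
To show what searching only the latest chunks means, here is a minimal sketch using the standard sqlite3 module. The two-table schema, the column names, and `search_latest` are assumptions made for illustration; RagTrack's real SQLAlchemy models may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per document version, one row per chunk.
conn.executescript(
    """
    CREATE TABLE versions (
        id INTEGER PRIMARY KEY,
        document_path TEXT NOT NULL,
        version_number INTEGER NOT NULL
    );
    CREATE TABLE chunks (
        id INTEGER PRIMARY KEY,
        version_id INTEGER NOT NULL REFERENCES versions(id),
        text TEXT NOT NULL
    );
    """
)
conn.executemany(
    "INSERT INTO versions VALUES (?, ?, ?)",
    [(1, "docs/house_rules.pdf", 1), (2, "docs/house_rules.pdf", 2)],
)
conn.executemany(
    "INSERT INTO chunks VALUES (?, ?, ?)",
    [(1, 1, "The wifi password is oldpass."),
     (2, 2, "The wifi password is newpass.")],
)


def search_latest(conn: sqlite3.Connection, keyword: str) -> list[str]:
    """Return chunks containing the keyword, from each document's newest version only."""
    query = """
        SELECT c.text
        FROM chunks AS c
        JOIN versions AS v ON v.id = c.version_id
        WHERE v.version_number = (
            SELECT MAX(v2.version_number)
            FROM versions AS v2
            WHERE v2.document_path = v.document_path
        )
        AND c.text LIKE ?
        ORDER BY c.id
    """
    # Note: % and _ inside the keyword act as LIKE wildcards in this sketch.
    return [row[0] for row in conn.execute(query, (f"%{keyword}%",))]


results = search_latest(conn, "wifi password")
```

The chunk from version 1 stays in the database for diffing, but the subquery keeps it out of search results.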

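Semantic search requires embedding support and SEARCH_MODE set to semantic (bash syntax shown; in PowerShell, use $env:SEARCH_MODE = "semantic"):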

pip install -e ".[embed]"
export SEARCH_MODE=semantic

Show changes between latest and previous versions:

ragtrack diff

Show changes for one document:

ragtrack diff .\docs\house_rules.pdf

Show project status:

ragtrack status

Web Interface

RagTrack includes a lightweight server-rendered web interface built with FastAPI, Jinja templates, Tailwind CDN, and HTMX. It reads from the same SQLite database and search index as the CLI.

Run the local UI:

ragtrack serve

Then open:

http://127.0.0.1:8000

Bind to all network interfaces, for example on a VPS:

ragtrack serve --host 0.0.0.0 --port 8000

Pages included:

Dashboard: project counts, recent activity, tracked documents
Documents: upload supported files, document list, and document detail pages
Versions: version history
Search: latest-version chunk search
Diff: latest vs previous chunk-hash summaries
Audit: lightweight integrity overview
Settings: local runtime paths and model configuration

Screenshots can be added under docs/screenshots/ as the interface stabilizes.

Uploaded files are saved to ./docs and ingested through the same versioning, parsing, chunking, and optional search-index pipeline as the CLI.

CLI Examples

ragtrack ingest ./docs
ragtrack search "check in time"
ragtrack diff
ragtrack diff ./docs/guest_manual.docx
ragtrack status
ragtrack serve

Example diff output:

house_rules.pdf

v1 -> v2

+ Added: 3 chunks
- Removed: 1 chunk
~ Modified: 5 chunks

Docker Usage

Build the image:

docker compose build

Start the UI with docker compose up, or run one-off commands with docker compose run. Your local ./docs directory is mounted at /docs and your local ./.ragtrack directory is mounted for persistence:

docker compose up
docker compose run --rm ragtrack ingest /docs
docker compose run --rm ragtrack search "wifi password"
docker compose run --rm ragtrack diff
docker compose run --rm ragtrack status

The Docker setup serves the UI on http://localhost:8000 and stores SQLite and local RagTrack data in ./.ragtrack on the host.

Configuration

RagTrack reads environment variables with the RAGTRACK_ prefix:

RAGTRACK_PROJECT_ROOT
RAGTRACK_DATA_DIR
RAGTRACK_DATABASE_URL
RAGTRACK_STORAGE_DIR
RAGTRACK_VECTOR_INDEX_PATH
RAGTRACK_EMBEDDING_PROVIDER
RAGTRACK_EMBEDDING_MODEL
RAGTRACK_SEARCH_MODE
RAGTRACK_LOG_LEVEL

Defaults:

database: .ragtrack/ragtrack.db
search mode: keyword
embedding provider: fastembed
embedding model: BAAI/bge-small-en-v1.5
semantic vector index: .ragtrack/faiss.json
vector metadata: .ragtrack/faiss_meta.json

Short environment variable aliases are also supported:

EMBEDDING_PROVIDER
SEARCH_MODE

Supported embedding providers:

fastembed
sentence-transformers

Supported search modes:

keyword
semantic
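
The sketch below illustrates one way the prefixed variables, short aliases, and defaults above could be resolved. The precedence (prefixed variable first, then the alias, then the default) and the `resolve_setting` helper are assumptions for illustration; the README does not state how RagTrack orders them.

```python
import os

# Defaults and allowed values as listed in this section.
DEFAULTS = {"SEARCH_MODE": "keyword", "EMBEDDING_PROVIDER": "fastembed"}
ALLOWED = {
    "SEARCH_MODE": {"keyword", "semantic"},
    "EMBEDDING_PROVIDER": {"fastembed", "sentence-transformers"},
}


def resolve_setting(name: str, environ=None) -> str:
    """Hypothetical resolver: RAGTRACK_<NAME>, then <NAME>, then the default."""
    env = os.environ if environ is None else environ
    # Assumed precedence: the prefixed variable wins over the short alias.
    value = env.get(f"RAGTRACK_{name}") or env.get(name) or DEFAULTS[name]
    if value not in ALLOWED[name]:
        raise ValueError(f"Unsupported {name}: {value!r}")
    return value


default_mode = resolve_setting("SEARCH_MODE", {})
alias_mode = resolve_setting("SEARCH_MODE", {"SEARCH_MODE": "semantic"})
prefixed_mode = resolve_setting(
    "SEARCH_MODE",
    {"SEARCH_MODE": "semantic", "RAGTRACK_SEARCH_MODE": "keyword"},
)
```

Passing a dictionary instead of reading os.environ directly keeps the resolver easy to test.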

Why FastEmbed

FastEmbed is the recommended semantic embedding provider for RagTrack because it is CPU-first, ONNX Runtime-based, and avoids the PyTorch, Transformers, CUDA, and NVIDIA dependency chain that makes installs heavy on small VPS machines. This keeps the default RagTrack experience lightweight while still allowing semantic retrieval when users opt in.

V1 Limitations

diff compares chunk hashes, so it cannot tell an edited chunk from an unrelated add/remove pair; modified chunks are estimated as min(added, removed)
semantic search is optional and requires pip install -e ".[embed]"
SentenceTransformers remains available through pip install -e ".[st]"
when semantic search is enabled, one local vector index is rebuilt after CLI/UI ingestion
only latest document versions are searchable
no background workers
web UI upload is synchronous in V1
no authentication
no remote object storage
no hosted vector database
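
The min(added, removed) estimate from the first limitation can be sketched as follows. This is a minimal illustration under stated assumptions: chunks are hashed with SHA256 over their UTF-8 text, and `chunk_hash` and `diff_summary` are hypothetical names. Whether RagTrack subtracts the estimate from the added and removed counts is not specified here, so the sketch reports all three numbers side by side.

```python
import hashlib


def chunk_hash(text: str) -> str:
    """Hash a chunk's text; any edit to the text produces a different hash."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def diff_summary(previous_chunks: list[str], latest_chunks: list[str]) -> dict[str, int]:
    """Compare two versions by chunk hash and estimate modified chunks."""
    previous = {chunk_hash(chunk) for chunk in previous_chunks}
    latest = {chunk_hash(chunk) for chunk in latest_chunks}
    added = len(latest - previous)      # hashes only in the latest version
    removed = len(previous - latest)    # hashes only in the previous version
    # An edited chunk appears as one removal plus one addition, so at most
    # min(added, removed) of these pairs can be edits rather than real changes.
    return {
        "added": added,
        "removed": removed,
        "modified_estimate": min(added, removed),
    }


v1 = ["Check-in is at 3 pm.", "Quiet hours start at 10 pm.", "No pets."]
v2 = [
    "Check-in is at 4 pm.",
    "Quiet hours start at 10 pm.",
    "Parking is free.",
    "Pool closes at 8 pm.",
]
summary = diff_summary(v1, v2)
```

Here only the check-in chunk was actually edited, yet the estimate is 2, which shows why the figure is an upper bound and not an exact count.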

Roadmap

richer diff output with chunk previews
configurable chunking strategies
parser metadata improvements
incremental vector index updates
optional Qdrant or pgvector backend
export/import project snapshots
document restore and pinning
better source citation formatting

Development

Run tests:

python -m pytest

Project layout:

ragtrack/
|-- ragtrack/
|   |-- cli/
|   |-- db/
|   |-- models/
|   |-- services/
|   |-- parsers/
|   |-- chunking/
|   |-- embeddings/
|   |-- vectorstore/
|   `-- storage/
|-- tests/
|-- pyproject.toml
|-- Dockerfile
|-- docker-compose.yml
`-- README.md
