RagTrack is a lightweight Git-like version control layer for RAG knowledge bases.
It helps you:
- track document changes across ingestions
- avoid stale chunks in retrieval pipelines
- search only the latest knowledge
- inspect what changed between document versions
- keep source traces for retrieved chunks
RagTrack is CLI-first and designed for small VPS deployments. It uses local files, SQLite, SQLAlchemy, keyword search, and a small server-rendered FastAPI UI by default. Optional semantic search uses FastEmbed, an ONNX Runtime-based embedding provider that avoids PyTorch and CUDA dependencies. RagTrack does not use Redis, Celery, MinIO, Kubernetes, external cloud services, or a JavaScript build pipeline, and it has no built-in authentication.
RAG apps often treat ingestion as a one-way import. Over time, source documents change, chunks become stale, and it becomes hard to answer simple questions:
- Which files changed?
- Which chunks belong to the latest document version?
- What did the last ingestion add or remove?
- Can I search current knowledge without keeping old chunks active?
RagTrack adds a small version-control layer around local document ingestion so RAG knowledge bases are easier to inspect, refresh, and trust.
Features:
- recursive ingestion for .pdf, .docx, .txt, .md, and .markdown files
- SHA256 file hashing for change detection
- document versions stored in SQLite
- chunk hashes and chunk metadata stored in SQLite
- keyword search with no embedding package required
- optional FastEmbed semantic embeddings using BAAI/bge-small-en-v1.5
- optional SentenceTransformers provider for users who explicitly want it
- semantic search limited to latest document versions when embedding support is installed
- diff summaries between latest and previous versions
- status summaries for project health
- local FastAPI/Jinja web interface for upload and inspection
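A minimal sketch of SHA256-based change detection, using only Python's standard library. The helper names (sha256_file, detect_changes) and the known_hashes dictionary, which stands in for the per-version hashes RagTrack keeps in SQLite, are hypothetical; RagTrack's own implementation may differ.

```python
import hashlib
from pathlib import Path

# Extensions RagTrack ingests, per the feature list.
SUPPORTED_SUFFIXES = {".pdf", ".docx", ".txt", ".md", ".markdown"}


def sha256_file(path: Path, block_size: int = 1 << 16) -> str:
    """Hash a file in fixed-size blocks so large PDFs are never read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def detect_changes(root: Path, known_hashes: dict[str, str]) -> dict[str, list[str]]:
    """Compare current file hashes under root with the hashes from the last ingestion.

    known_hashes maps a path relative to root to its last stored SHA256
    (hypothetical stand-in for RagTrack's SQLite version records).
    """
    current = {
        p.relative_to(root).as_posix(): sha256_file(p)
        for p in sorted(root.rglob("*"))  # recursive walk, like `ragtrack ingest`
        if p.is_file() and p.suffix.lower() in SUPPORTED_SUFFIXES
    }
    return {
        "new": [p for p in current if p not in known_hashes],
        "changed": [p for p in current if p in known_hashes and known_hashes[p] != current[p]],
        "unchanged": [p for p in current if known_hashes.get(p) == current[p]],
        "missing": [p for p in known_hashes if p not in current],
    }
```

Only files in the "new" or "changed" buckets would need a new document version; because the sketch hashes byte content only, touching a file without editing it leaves it "unchanged".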
Create and activate a virtual environment (PowerShell shown):
python -m venv .venv
.\.venv\Scripts\Activate.ps1
On Linux or macOS, activate it with:
source .venv/bin/activate
Install RagTrack:
pip install -e .
Install FastEmbed semantic search support only when you want embeddings:
pip install -e ".[embed]"
Install the legacy SentenceTransformers provider only if you explicitly want it:
pip install -e ".[st]"
For development:
pip install -e ".[dev]"
For development with semantic search:
pip install -e ".[dev,embed]"
Create a docs directory:
mkdir docs
Ingest documents:
ragtrack ingest .\docs
Search latest chunks with the default keyword mode:
ragtrack search "wifi password"
Semantic search requires embedding support and the semantic search mode:
pip install -e ".[embed]"
export SEARCH_MODE=semantic
ragtrack search "wifi password"
In PowerShell, set the mode with $env:SEARCH_MODE = "semantic" instead of export.
Show changes between latest and previous versions:
ragtrack diff
Show changes for one document:
ragtrack diff .\docs\house_rules.pdf
Show project status:
ragtrack status
RagTrack includes a lightweight server-rendered web interface built with FastAPI, Jinja templates, Tailwind (via CDN), and HTMX. It reads from the same SQLite database and search index as the CLI.
Run the local UI:
ragtrack serve
Then open:
http://127.0.0.1:8000
Bind to all network interfaces, for example on a VPS:
ragtrack serve --host 0.0.0.0 --port 8000
Pages included:
- Dashboard: project counts, recent activity, tracked documents
- Documents: upload supported files, document list, and document detail pages
- Versions: version history
- Search: latest-version chunk search
- Diff: latest vs previous chunk-hash summaries
- Audit: lightweight integrity overview
- Settings: local runtime paths and model configuration
Screenshots can be added under docs/screenshots/ as the interface stabilizes.
Uploaded files are saved to ./docs and ingested through the same versioning, parsing, chunking, and optional search-index pipeline as the CLI.
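Whichever entry point is used, the chunking step turns parsed text into hashed chunks. RagTrack's actual chunking strategy is not described here (configurable strategies are listed as future work), so the sketch below assumes a simple one: fixed windows of words with overlap, each hashed with SHA256. Chunk, chunk_text, and the size and overlap defaults are hypothetical.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    index: int   # position of the chunk within one document version
    text: str
    sha256: str  # content hash used to compare chunks across versions


def chunk_text(text: str, size: int = 200, overlap: int = 40) -> list[Chunk]:
    """Split text into windows of `size` words that share `overlap` words, then hash each.

    Hypothetical strategy for illustration; RagTrack's real chunker may differ.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    words = text.split()  # also collapses whitespace, so reflowed text hashes identically
    step = size - overlap
    chunks = []
    # Stop once a window would only repeat words already covered by the previous one.
    for index, start in enumerate(range(0, max(len(words) - overlap, 1), step)):
        window = " ".join(words[start : start + size])
        if not window:  # empty document
            break
        digest = hashlib.sha256(window.encode("utf-8")).hexdigest()
        chunks.append(Chunk(index=index, text=window, sha256=digest))
    return chunks
```

Hashing normalized chunk text means an unchanged paragraph produces the same hash in the next version, which is what makes hash-based diffs possible.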
Common CLI commands:
ragtrack ingest ./docs
ragtrack search "check in time"
ragtrack diff
ragtrack diff ./docs/guest_manual.docx
ragtrack status
ragtrack serve
Example diff output:
house_rules.pdf
v1 -> v2
+ Added: 3 chunks
- Removed: 1 chunk
~ Modified: 1 chunk
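The diff summary can be reproduced from chunk hashes alone. RagTrack compares chunk hashes between versions and estimates modified chunks as min(added, removed); the sketch below applies that rule to two lists of chunk hashes and prints a summary in the same layout as the example. diff_chunk_hashes and format_diff are hypothetical helpers, and treating each version's chunks as a set (so duplicate chunks collapse) is an assumption.

```python
def diff_chunk_hashes(old: list[str], new: list[str]) -> dict[str, int]:
    """Summarize chunk-level changes between two versions by comparing hash sets."""
    old_set, new_set = set(old), set(new)
    added = len(new_set - old_set)
    removed = len(old_set - new_set)
    # Hashes cannot pair an edited chunk with its predecessor, so an edit appears as one
    # removal plus one addition; min(added, removed) estimates how many were edits.
    # The estimate overlaps with, and is not subtracted from, added/removed.
    modified = min(added, removed)
    return {"added": added, "removed": removed, "modified": modified}


def format_diff(name: str, old_version: int, new_version: int, summary: dict[str, int]) -> str:
    def plural(n: int) -> str:
        return f"{n} chunk" if n == 1 else f"{n} chunks"

    return "\n".join([
        name,
        f"v{old_version} -> v{new_version}",
        f"+ Added: {plural(summary['added'])}",
        f"- Removed: {plural(summary['removed'])}",
        f"~ Modified: {plural(summary['modified'])}",
    ])


if __name__ == "__main__":
    old_hashes = ["h1", "h2", "h3"]
    new_hashes = ["h1", "h2", "h4", "h5", "h6"]
    print(format_diff("house_rules.pdf", 1, 2, diff_chunk_hashes(old_hashes, new_hashes)))
    # house_rules.pdf
    # v1 -> v2
    # + Added: 3 chunks
    # - Removed: 1 chunk
    # ~ Modified: 1 chunk
```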
Build the image:
docker compose build
Start the web UI:
docker compose up
Run CLI commands with your local ./docs directory mounted at /docs and local ./.ragtrack mounted for persistence:
docker compose run --rm ragtrack ingest /docs
docker compose run --rm ragtrack search "wifi password"
docker compose run --rm ragtrack diff
docker compose run --rm ragtrack status
The Docker setup serves the UI on http://localhost:8000 and stores the SQLite database and other RagTrack data in ./.ragtrack on the host.
RagTrack reads environment variables with the RAGTRACK_ prefix:
- RAGTRACK_PROJECT_ROOT
- RAGTRACK_DATA_DIR
- RAGTRACK_DATABASE_URL
- RAGTRACK_STORAGE_DIR
- RAGTRACK_VECTOR_INDEX_PATH
- RAGTRACK_EMBEDDING_PROVIDER
- RAGTRACK_EMBEDDING_MODEL
- RAGTRACK_SEARCH_MODE
- RAGTRACK_LOG_LEVEL
Defaults:
- database: .ragtrack/ragtrack.db
- search mode: keyword
- embedding provider: fastembed
- embedding model: BAAI/bge-small-en-v1.5
- semantic vector index: .ragtrack/faiss.json
- vector metadata: .ragtrack/faiss_meta.json
Short environment variable aliases are also supported:
- EMBEDDING_PROVIDER
- SEARCH_MODE
Supported embedding providers:
- fastembed
- sentence-transformers
Supported search modes:
- keyword
- semantic
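How these variables resolve can be sketched as a small loader. This is an illustration under assumptions, not RagTrack's settings module: the Settings dataclass and load_settings are hypothetical, the prefixed variable is assumed to win over its short alias, and the database URL is assumed to be a SQLAlchemy-style sqlite:/// URL built from the data directory.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

EMBEDDING_PROVIDERS = {"fastembed", "sentence-transformers"}
SEARCH_MODES = {"keyword", "semantic"}


@dataclass(frozen=True)
class Settings:
    database_url: str
    search_mode: str
    embedding_provider: str
    embedding_model: str
    vector_index_path: str


def _read(env: Mapping[str, str], name: str, default: str, alias: Optional[str] = None) -> str:
    """Return RAGTRACK_<name>, then the short alias (if any), then the default."""
    # Assumption: the prefixed variable takes precedence over the short alias.
    value = env.get(f"RAGTRACK_{name}")
    if value is None and alias is not None:
        value = env.get(alias)
    return default if value is None else value


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    # Assumption: file defaults live under RAGTRACK_DATA_DIR (default ".ragtrack").
    data_dir = _read(env, "DATA_DIR", ".ragtrack")
    settings = Settings(
        database_url=_read(env, "DATABASE_URL", f"sqlite:///{data_dir}/ragtrack.db"),
        search_mode=_read(env, "SEARCH_MODE", "keyword", alias="SEARCH_MODE").lower(),
        embedding_provider=_read(
            env, "EMBEDDING_PROVIDER", "fastembed", alias="EMBEDDING_PROVIDER"
        ).lower(),
        embedding_model=_read(env, "EMBEDDING_MODEL", "BAAI/bge-small-en-v1.5"),
        vector_index_path=_read(env, "VECTOR_INDEX_PATH", f"{data_dir}/faiss.json"),
    )
    # Fail fast on values outside the documented supported sets.
    if settings.search_mode not in SEARCH_MODES:
        raise ValueError(f"unsupported search mode: {settings.search_mode!r}")
    if settings.embedding_provider not in EMBEDDING_PROVIDERS:
        raise ValueError(f"unsupported embedding provider: {settings.embedding_provider!r}")
    return settings
```

Passing a plain dict instead of os.environ keeps the loader easy to test.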
FastEmbed is the recommended semantic embedding provider for RagTrack because it is CPU-first, ONNX Runtime-based, and avoids the PyTorch, Transformers, CUDA, and NVIDIA dependency chain that makes installs heavy on small VPS machines. This keeps the default RagTrack experience lightweight while still allowing semantic retrieval when users opt in.
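Semantic retrieval over a locally stored index can be sketched as follows. Assumptions: the index is a JSON file holding chunk IDs and vectors (the default path .ragtrack/faiss.json suggests a file-based index, but its real format is not documented here), search is brute-force cosine similarity rather than FAISS, and build_index, semantic_search, and fastembed_embedder are hypothetical names. Only fastembed_embedder touches FastEmbed, through its TextEmbedding class, and it imports the package lazily so a keyword-only install never needs it.

```python
import json
import math
from pathlib import Path
from typing import Callable, Sequence

# Any function mapping a batch of texts to one vector per text.
Embedder = Callable[[Sequence[str]], list[list[float]]]


def fastembed_embedder(model_name: str = "BAAI/bge-small-en-v1.5") -> Embedder:
    """Wrap FastEmbed (installed via pip install -e ".[embed]") as a plain function."""
    from fastembed import TextEmbedding  # lazy import: only needed for semantic mode

    model = TextEmbedding(model_name=model_name)
    return lambda texts: [vector.tolist() for vector in model.embed(list(texts))]


def build_index(chunks: dict[str, str], embed: Embedder, path: Path) -> None:
    """Rebuild the whole index from latest-version chunks (chunk id -> text) as JSON."""
    ids = list(chunks)
    vectors = embed([chunks[i] for i in ids])
    path.write_text(json.dumps({"ids": ids, "vectors": vectors}))


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_search(query: str, embed: Embedder, path: Path, top_k: int = 5) -> list[tuple[str, float]]:
    """Score every stored vector against the query and return the best top_k."""
    index = json.loads(path.read_text())
    query_vector = embed([query])[0]
    scored = [(cid, _cosine(query_vector, vec)) for cid, vec in zip(index["ids"], index["vectors"])]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_k]
```

Rebuilding the whole file after each ingestion matches the V1 behavior of rebuilding one local vector index; a brute-force scan costs O(chunks × dimensions) per query, which is usually tolerable for small knowledge bases.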
Notes and V1 limitations:
- diff uses chunk hashes and estimates modified chunks as min(added, removed)
- semantic search is optional and requires pip install -e ".[embed]"
- SentenceTransformers remains available through pip install -e ".[st]"
- when semantic search is enabled, the single local vector index is rebuilt after each CLI or UI ingestion
- only latest document versions are searchable
- no background workers
- web UI upload is synchronous in V1
- no authentication
- no remote object storage
- no hosted vector database
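Restricting keyword search to the latest document versions can be expressed as a single SQL filter. The sketch below uses Python's sqlite3 with a hypothetical three-table schema (documents, versions, chunks) and requires every query term to appear in a chunk; RagTrack's real SQLAlchemy models and search code may look different.

```python
import sqlite3

# Hypothetical schema for illustration only.
SCHEMA = """
CREATE TABLE documents (id INTEGER PRIMARY KEY, path TEXT UNIQUE NOT NULL);
CREATE TABLE versions (
    id INTEGER PRIMARY KEY,
    document_id INTEGER NOT NULL REFERENCES documents(id),
    version_no INTEGER NOT NULL,
    sha256 TEXT NOT NULL,
    UNIQUE (document_id, version_no)
);
CREATE TABLE chunks (
    id INTEGER PRIMARY KEY,
    version_id INTEGER NOT NULL REFERENCES versions(id),
    idx INTEGER NOT NULL,
    text TEXT NOT NULL,
    sha256 TEXT NOT NULL
);
"""


def keyword_search(conn: sqlite3.Connection, query: str, limit: int = 10) -> list[tuple[str, int, str]]:
    """Return (path, version_no, chunk text) for latest-version chunks containing every term."""
    terms = query.split()
    if not terms:
        return []
    # Escape LIKE wildcards so a term such as "50%" is matched literally.
    patterns = [
        "%" + t.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_") + "%" for t in terms
    ]
    # SQLite's LIKE is case-insensitive for ASCII letters by default.
    term_filter = " AND ".join("c.text LIKE ? ESCAPE '\\'" for _ in terms)
    sql = f"""
        SELECT d.path, v.version_no, c.text
        FROM chunks AS c
        JOIN versions AS v ON v.id = c.version_id
        JOIN documents AS d ON d.id = v.document_id
        WHERE v.version_no = (
            SELECT MAX(v2.version_no) FROM versions AS v2 WHERE v2.document_id = v.document_id
        )
        AND {term_filter}
        ORDER BY d.path, c.idx
        LIMIT ?
    """
    return conn.execute(sql, [*patterns, limit]).fetchall()
```

Old chunks stay in the database for diffs and history but never match, because only each document's highest version_no passes the filter. A LIKE scan reads every latest chunk; SQLite's FTS5 extension would be one option for larger corpora, though nothing here implies RagTrack uses it.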
Roadmap:
- richer diff output with chunk previews
- configurable chunking strategies
- parser metadata improvements
- incremental vector index updates
- optional Qdrant or pgvector backend
- export/import project snapshots
- document restore and pinning
- better source citation formatting
Run tests:
python -m pytest
Project layout:
ragtrack/
|-- ragtrack/
| |-- cli/
| |-- db/
| |-- models/
| |-- services/
| |-- parsers/
| |-- chunking/
| |-- embeddings/
| |-- vectorstore/
| `-- storage/
|-- tests/
|-- pyproject.toml
|-- Dockerfile
|-- docker-compose.yml
`-- README.md