Local-first document RAG + decision API: ingest PDF (digital, plus OCR for scans), DOCX, legacy DOC (antiword), and TXT; chunk + embed into PostgreSQL (pgvector); ask with citations or run a structured decision through a small rule engine, all via FastAPI. Ollama provides Mistral and nomic-embed-text in Docker; no OpenAI, Azure, AWS, or Pinecone in this design. `docker compose` runs Uvicorn with `--reload` for a smooth dev loop.
- Portfolio snapshot
- What it does
- Tech stack
- Architecture
- Repository layout
- HTTP API (summary)
- Prerequisites & quick start
- Configuration
- Services & ports
- Tests & CI
- Troubleshooting
- Publishing to GitHub
- Security
- Roadmap & future implementations
- Contributing & license
## Portfolio snapshot

| What you can show | How this repo supports it |
|---|---|
| RAG (retrieve → ground answer → citations) | `POST /api/documents/*` + `POST /api/decisions/ask` |
| Guardrails beyond “the model said so” | `POST /api/decisions/decide` + `app/rules/rule_engine.py` |
| Local / air-gapped story | Ollama + Postgres in Docker; no cloud model keys |
| Reproducible for reviewers | `docker compose`, `scripts/trial_run.*`, GitHub Actions CI |
“Hosting” here means publishing this repository on GitHub and letting others clone + run with Docker on their machine. This project does not ship a production cloud deploy (Kubernetes, serverless, etc.); that is an intentional scope boundary for a portfolio piece.
Monorepo: if this project lives under a parent folder (e.g. `ml-demos/doc-decision-engine`), either open this folder as the Git root of a dedicated repo, or adjust the GitHub Actions `defaults.run.working-directory` / paths so CI runs from the correct subdirectory. The included workflow assumes the repository root is the `doc-decision-engine` directory.
## What it does

- Upload → parse (Tesseract OCR when extracted text is thin; `.doc` via antiword) → word chunks → embed (nomic-embed-text) → store in pgvector.
- `POST /api/decisions/ask`: embed the question, run a similarity search, prompt Mistral with the retrieved excerpts, return an answer with citations.
- `POST /api/decisions/decide`: same retrieval + JSON-shaped LLM output + deterministic rules (confidence, keywords, sections, chunk count).
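The word-based chunking step can be sketched as follows. This is a minimal illustration, not the exact code in `app/services/`; `chunk_size` and `chunk_overlap` are counted in words, and (as the troubleshooting table notes) `chunk_size` must exceed `chunk_overlap`:

```python
def chunk_words(text: str, chunk_size: int = 200, chunk_overlap: int = 50) -> list[str]:
    """Split text into overlapping fixed-size word windows."""
    if chunk_size <= chunk_overlap:
        raise ValueError("chunk_size must be greater than chunk_overlap")
    words = text.split()
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        window = words[start:start + chunk_size]
        if window:
            chunks.append(" ".join(window))
        if start + chunk_size >= len(words):
            break  # last window already covers the tail
    return chunks
```

Each chunk is then embedded individually, so the overlap gives the retriever some slack when an answer spans a chunk boundary.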
## Tech stack

| Layer | Technology | Notes |
|---|---|---|
| Runtime | Python 3.11 | See `Dockerfile` |
| API | FastAPI 0.104.x, Uvicorn 0.24.x | OpenAPI at `/docs`, `/redoc` |
| Validation / settings | Pydantic v2, pydantic-settings | Env-driven `app/config.py` |
| Database | PostgreSQL 16 + pgvector, SQLAlchemy 2.0.x, psycopg2 | Vectors in Postgres |
| HTTP client | httpx | Ollama `/api/embed`, `/api/chat` |
| Embeddings & LLM | Ollama | Default models: nomic-embed-text, mistral |
| PDF | pdfplumber, pdf2image, pytesseract, Pillow, poppler | OCR optional via env |
| Word | python-docx, antiword (`.doc`) | antiword in Docker image |
| Tests | pytest 7.4.x | Against real Postgres in CI (see `tests/`) |

Pinned versions: `requirements.txt`.
## Architecture

```mermaid
flowchart LR
    subgraph Client
        U[HTTP client / browser / scripts]
    end
    subgraph Docker["Docker Compose"]
        A[FastAPI app]
        P[(PostgreSQL + pgvector)]
        O[Ollama]
    end
    U -->|REST| A
    A -->|SQL + vectors| P
    A -->|embed + chat| O
```
Ingestion:

```mermaid
flowchart LR
    UP[Upload API] --> PARSE[Parser PDF/DOCX/DOC/TXT]
    PARSE --> CH[Chunk words]
    CH --> EMB[Embed via Ollama]
    EMB --> VS[(pgvector + rows)]
```
Ask / decide:

```mermaid
flowchart TB
    Q[Question + doc IDs] --> RET[Top-k similarity]
    RET --> CTX[Build prompt + excerpts]
    CTX --> LLM[Ollama chat]
    LLM --> ASK[Answer + citations]
    CTX --> DEC[Structured JSON + rule engine]
    DEC --> OUT[Decision + flags]
```
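The "Top-k similarity" step is, conceptually, a nearest-neighbour search over stored chunk embeddings. pgvector performs this in SQL; the pure-Python sketch below is illustration only and assumes cosine similarity as the metric:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query: list[float], chunks: list[tuple[str, list[float]]], k: int = 5) -> list[str]:
    """Return the text of the k chunks most similar to the query embedding."""
    ranked = sorted(chunks, key=lambda c: cosine_similarity(query, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]
```

In the real service the query embedding comes from Ollama's nomic-embed-text, and the ranked chunk texts become the excerpts in the LLM prompt.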
The same flow in plain text:

```text
+-----------+     +----------+     +-----------+
|  Upload   |---->|  Parse   |---->|  Chunk &  |
|  (API)    |     | PDF/DOCX |     |  Embed    |
+-----------+     +----------+     +-----+-----+
                                         |
                                         v
+-----------+     +----------+     +-----------+
| Decision  |<----|   LLM    |<----|  Vector   |
| Response  |     | (Ollama) |     |  Search   |
| + Rules   |     |          |     |  pgvector |
+-----------+     +----------+     +-----------+
```
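The "Rules" stage merges deterministic checks with the LLM's JSON output. A hedged sketch follows; field names like `confidence` and `required_sections` mirror this README's description of `app/rules/rule_engine.py`, not its exact code:

```python
def apply_rules(llm_output: dict, context_chunks: list[str], rules: dict) -> dict:
    """Merge deterministic checks (confidence, sections, chunk count) with LLM output."""
    flags = []
    # Flag low-confidence answers regardless of what the model claims.
    if llm_output.get("confidence", 0.0) < rules.get("min_confidence", 0.5):
        flags.append("low_confidence")
    # Require each named section keyword to appear in the retrieved context.
    context = " ".join(context_chunks).lower()
    for section in rules.get("required_sections", []):
        if section.lower() not in context:
            flags.append(f"missing_section:{section}")
    # Refuse to decide on too little retrieved evidence.
    if len(context_chunks) < rules.get("min_chunks", 1):
        flags.append("insufficient_context")
    return {**llm_output, "flags": flags, "passed": not flags}
```

The point of the layer is that a decision can be rejected even when the model sounds confident, which is the "guardrails beyond the model said so" claim in the snapshot table above.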
Docker hardening (high level): API container runs as non-root (uid 1000), no-new-privileges, cap_drop: ALL, pids_limit, mem_limit; Postgres/Ollama have no-new-privileges and memory caps. Details: SECURITY.md.
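As an illustrative compose fragment only (service name and limit values here are examples, not the exact contents of `docker-compose.yml`; see SECURITY.md for the real settings):

```yaml
services:
  app:
    user: "1000:1000"            # run as non-root
    security_opt:
      - no-new-privileges:true   # block privilege escalation
    cap_drop:
      - ALL                      # drop all Linux capabilities
    pids_limit: 256              # example value
    mem_limit: 1g                # example value
```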
## Repository layout

| Path | Purpose |
|---|---|
| `app/main.py` | FastAPI app, lifespan, `/health` |
| `app/config.py` | Pydantic settings (env) |
| `app/api/documents.py` | Upload, process, list, get, delete documents |
| `app/api/decisions.py` | `/ask`, `/decide`, history |
| `app/db/` | SQLAlchemy engine, models, init |
| `app/services/` | Parser, embeddings, LLM, vector store, decision orchestration |
| `app/rules/rule_engine.py` | Deterministic checks merged with LLM output |
| `app/prompts.py`, `app/rag_utils.py` | Shared prompts and citation helpers |
| `app/models/schemas.py` | Request/response models |
| `tests/` | Pytest (Postgres-backed) |
| `sample_docs/` | Demo files (e.g. `demo_policy.txt`) |
| `uploads/` | Runtime-only upload storage (empty in git except `.gitignore` / `.gitkeep`) |
| `scripts/trial_run.*` | End-to-end smoke (health → upload → process → RAG) |
| `scripts/smoke_verify.*` | Lightweight API checks |
| `docker-compose.yml` | App + includes infra |
| `docker-compose.infra.yml` | Postgres, Ollama, network, volumes |
| `.github/workflows/ci.yml` | Build image, start db, run pytest |
## HTTP API (summary)

| Method | Path | Description |
|---|---|---|
| GET | `/health` | Liveness + DB connectivity |
| POST | `/api/documents/upload` | Multipart file upload |
| POST | `/api/documents/process/{id}` | Parse, chunk, embed, index |
| GET | `/api/documents/` | List documents |
| GET | `/api/documents/{id}` | Get metadata |
| DELETE | `/api/documents/{id}` | Delete document + vectors |
| POST | `/api/decisions/ask` | RAG question + citations |
| POST | `/api/decisions/decide` | Structured decision + rules |
| GET | `/api/decisions/history` | Decision history (see handler for shape) |

Full schemas: http://localhost:8000/docs when the stack is running.
## Prerequisites & quick start

- Docker Desktop (or Docker Engine + Compose v2).

Use the `doc-decision-engine` directory (where `docker-compose.yml` lives):

```bash
docker compose up -d --build
docker compose exec ollama ollama pull mistral
docker compose exec ollama ollama pull nomic-embed-text
```

Health check:

```bash
curl http://localhost:8000/health
```

PowerShell: `curl.exe http://localhost:8000/health` or `Invoke-RestMethod http://localhost:8000/health`.

- Swagger: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc

Smoke checks:

```bash
chmod +x scripts/smoke_verify.sh && ./scripts/smoke_verify.sh
```

```powershell
powershell -ExecutionPolicy Bypass -File scripts/smoke_verify.ps1
```

Infra is split into `docker-compose.infra.yml` (doc-engine-postgres, doc-engine-ollama, network doc-engine-net, named volumes). The main `docker-compose.yml` includes that file:
```bash
docker compose -f docker-compose.infra.yml up -d
docker compose -f docker-compose.infra.yml exec ollama ollama pull mistral
docker compose -f docker-compose.infra.yml exec ollama ollama pull nomic-embed-text
```

After models are pulled, run the end-to-end smoke test:

```bash
chmod +x scripts/trial_run.sh && ./scripts/trial_run.sh
```

```powershell
powershell -ExecutionPolicy Bypass -File scripts/trial_run.ps1
```

Optional: set `ENGINE_URL=http://host:port` for a non-default API base URL.
Upload example:

```bash
curl -s -X POST http://localhost:8000/api/documents/upload \
  -F "file=@sample_docs/demo_policy.txt" \
  -F "document_type=policy"
# Then process/{id} and /api/decisions/ask — see Swagger for schemas.
```

Structured decision example:
```bash
curl -s -X POST http://localhost:8000/api/decisions/decide \
  -H "Content-Type: application/json" \
  -d '{
    "document_ids": [1],
    "decision_type": "compliance",
    "question": "Does this policy include signatures and liability language?",
    "rules": {"required_sections": ["signature", "liability", "date"]}
  }'
```

## Configuration

Copy `.env.example` → `.env` and adjust. Compose injects env in `docker-compose.yml`; the app reads variables via `app/config.py` (e.g. `DATABASE_URL`, `OLLAMA_BASE_URL`, `EMBEDDING_MODEL`, `LLM_MODEL`, `MAX_UPLOAD_BYTES`, chunking, PDF OCR, `ANTIWORD_BIN`).
Never commit .env with real secrets (gitignored).
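As a rough, stdlib-only approximation of how env-driven settings behave (the real `app/config.py` uses pydantic-settings; the variable names below come from the list above, and the defaults here are illustrative, not the project's):

```python
import os
from dataclasses import dataclass, field

@dataclass
class Settings:
    """Illustrative env-driven settings; defaults are examples only."""
    database_url: str = field(
        default_factory=lambda: os.getenv("DATABASE_URL", "postgresql://localhost:5433/docs"))
    ollama_base_url: str = field(
        default_factory=lambda: os.getenv("OLLAMA_BASE_URL", "http://localhost:11434"))
    embedding_model: str = field(
        default_factory=lambda: os.getenv("EMBEDDING_MODEL", "nomic-embed-text"))
    llm_model: str = field(
        default_factory=lambda: os.getenv("LLM_MODEL", "mistral"))
    max_upload_bytes: int = field(
        default_factory=lambda: int(os.getenv("MAX_UPLOAD_BYTES", "10485760")))

settings = Settings()  # each field falls back to its default if the env var is unset
```

Because every field reads the environment at construction time, the same image runs unchanged across compose, CI, and a local venv; only `.env` differs.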
## Services & ports

| Service | URL / port |
|---|---|
| API | http://localhost:8000 |
| Postgres | host 5433 → container 5432 |
| Ollama | http://localhost:11434 |
## Tests & CI

Locally (tests need Postgres; Ollama is not required):

```bash
docker compose up -d db
docker compose run --rm --no-deps app pytest -q
```

With the full stack running: `docker compose exec app pytest -q`

GitHub Actions (`.github/workflows/ci.yml`): builds the app image, starts `db`, and runs pytest via `docker compose run --rm --no-deps app` (it does not pull Ollama models).
## Troubleshooting

| Symptom | What to check |
|---|---|
| `/health` → `"database": false` | `docker compose ps`; wait for `db` to become healthy |
| 503 + Ollama message | `docker compose logs ollama`; `ollama pull mistral` / `nomic-embed-text`; `ollama list` |
| Empty PDF text | `PDF_OCR_ENABLED` (default true); Tesseract/poppler in image; `PDF_OCR_MAX_PAGES` |
| `.doc` fails on host | Use the Docker image (antiword included) |
| `chunk_overlap` error | Ensure `chunk_size` > `chunk_overlap` (both counted in words) |
| Upload too large | `MAX_UPLOAD_BYTES` (see `app/config.py` bounds) |
## Publishing to GitHub

What this repo already includes for a public portfolio repository:
| Item | Location |
|---|---|
| License | LICENSE (MIT) |
| Security notes | SECURITY.md |
| Contributing | CONTRIBUTING.md |
| CI | .github/workflows/ci.yml |
| Env template | .env.example (no secrets) |
| Git ignore | .gitignore — excludes .env, uploads/* (keeps uploads/.gitkeep), caches, venvs |
Before `git push`: confirm `.env` is not staged and remove any accidental `uploads/` artifacts (runtime files should stay ignored). Initialize git inside `doc-decision-engine` if this folder is the repo root, or configure CI paths if the repo root is higher up.
## Security

See SECURITY.md for container constraints and guidance before exposing the API beyond localhost (TLS, auth, rotating DB credentials, avoiding `--reload` in production).
## Roadmap & future implementations

Current boundaries:

- `PDF_OCR_MAX_PAGES` caps very large OCR jobs.
- OCR quality depends on the scans; add `tesseract-ocr-<lang>` in the Dockerfile for non-English documents.
- Single Ollama node; first inference can be slow (model load).
- Citation quality depends on chunking and retrieval; ambiguous questions may miss the best span.
- Tests target Postgres (no SQLite + pgvector shortcut).
Possible next steps (not implemented; portfolio extensions):

- Schema migrations (e.g. Alembic) instead of ad-hoc `init_db`.
- Authentication / tenancy (API keys, OAuth2 proxy) for shared deployments.
- Async job queue for long ingest or batch embedding; progress webhooks or SSE.
- Hybrid retrieval (BM25 + vector) and optional re-ranking.
- Observability: structured logging, metrics, tracing (OpenTelemetry).
- CI: contract tests with a mocked Ollama HTTP server for faster feedback, without dropping the Docker build smoke tests.
- Smaller models documented for CPU-only or low-RAM laptops.
- Rate limiting at a reverse proxy; WAF if internet-facing.
## Contributing & license

- CONTRIBUTING.md: local dev, tests, Ollama pulls.
- MIT: see LICENSE.