A production‑ready Retrieval‑Augmented Generation (RAG) system that provides fast, accurate, citation‑backed answers to questions about the Federal Acquisition Regulation (FAR) and Defense Federal Acquisition Regulation Supplement (DFARS). Designed for federal contractors, acquisition professionals, and businesses navigating the federal market.
Federal contracting regulations are complex, distributed across thousands of pages of FAR/DFARS text, and updated frequently. Professionals need fast, reliable, context‑aware answers to support:
- Capture strategy
- Proposal development
- Compliance reviews
- Contract administration
- Market entry decisions
This project automates that research using a modern RAG pipeline.
As a federal contractor, federal employee, or business entering the federal market, I need quick, accurate answers to questions about the regulatory landscape so that I can make informed business strategy decisions.
- Natural‑language question interface
- Retrieval of relevant FAR/DFARS sections
- Accurate, citation‑backed responses
- Up‑to‑date regulatory text
- Reproducible end‑to‑end pipeline
- Deployable locally or via Docker
- FAR and DFARS pulled from official .dita XML repositories
- Parsed into structured documents
- Metadata normalized for retrieval
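The ingestion steps above can be sketched with the standard library alone. This is a minimal illustration, not the project's parser_dita.py: the function name `parse_dita_topic`, the metadata keys, and the sample topic (placeholder text, not real regulation content) are all assumptions, and real FAR/DFARS files may use a richer element layout than the plain `topic`/`title`/`body`/`p` structure assumed here.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample topic with placeholder text (not real FAR content).
SAMPLE_DITA = """<topic id="EXAMPLE_1_1">
  <title>1.1 Example section.</title>
  <body>
    <p>First placeholder paragraph.</p>
    <p>Second placeholder paragraph with <b>inline</b> emphasis.</p>
  </body>
</topic>"""


def _clean(element) -> str:
    """Join all nested text of an element and collapse whitespace."""
    return " ".join("".join(element.itertext()).split())


def parse_dita_topic(xml_text: str, regulation: str) -> dict:
    """Turn one DITA topic into a text-plus-metadata record (assumed shape)."""
    root = ET.fromstring(xml_text)
    title_el = root.find("title")
    body_el = root.find("body")

    paragraphs = []
    if body_el is not None:
        for p in body_el.iter("p"):
            text = _clean(p)
            if text:
                paragraphs.append(text)

    return {
        "text": "\n".join(paragraphs),
        "metadata": {
            "topic_id": root.get("id", ""),
            "title": _clean(title_el) if title_el is not None else "",
            "regulation": regulation,  # e.g. "FAR" or "DFARS"
        },
    }


doc = parse_dita_topic(SAMPLE_DITA, "FAR")
print(doc["metadata"]["title"])
```

Keeping the topic id and title in the metadata is what later allows an answer to point back to a specific section.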
- HuggingFace Embeddings (BGE‑small)
- ChromaDB persistent vector store
- Chunking via LlamaIndex SentenceSplitter
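To illustrate what sentence-aware chunking does, here is a simplified stand-in written in plain Python. It is not LlamaIndex's SentenceSplitter (which works on token counts and has its own options); the function name `split_into_chunks`, the word-based size limit, and the one-sentence overlap are assumptions chosen to keep the idea visible.

```python
import re


def split_into_chunks(text: str, max_words: int = 40, overlap_sentences: int = 1) -> list[str]:
    """Group whole sentences into chunks of at most roughly max_words words.

    Sentences are never cut in half; the last `overlap_sentences` sentences
    of a finished chunk are repeated at the start of the next one.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks: list[str] = []
    current: list[str] = []

    for sentence in sentences:
        current_len = sum(len(s.split()) for s in current)
        if current and current_len + len(sentence.split()) > max_words:
            chunks.append(" ".join(current))
            # Carry trailing sentences forward so context spans chunk borders.
            current = current[-overlap_sentences:] if overlap_sentences > 0 else []
        current.append(sentence)

    if current:
        chunks.append(" ".join(current))
    return chunks


print(split_into_chunks("A b c. D e f. G h i.", max_words=6))
```

The overlap trades a slightly larger index for a lower chance that a requirement and its exception land in different chunks.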
- LlamaIndex orchestration
- Custom ONNX Runtime GenAI LLM (Phi‑4‑mini‑instruct‑onnx)
- ChromaDB for retrieval
- BGE-small embeddings
- Query engine configured with top‑k similarity search
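Top-k similarity search can be pictured as scoring every stored vector against the query vector and keeping the best k. The sketch below is a brute-force illustration with toy two-dimensional vectors; ChromaDB uses its own index structures internally, and cosine similarity is assumed here because the metric configured for the project's collection is not stated. The names `cosine_similarity` and `top_k` are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], indexed: dict[str, list[float]], k: int = 3) -> list[tuple[str, float]]:
    """Return the k (doc_id, score) pairs most similar to the query."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in indexed.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings standing in for real sentence embeddings.
toy_index = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
print(top_k([1.0, 0.0], toy_index, k=2))
```

A larger k gives the LLM more context to cite from but also lengthens the prompt, which matters for CPU inference.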
- Fast inference on CPU (no GPU required)
- No PyTorch dependency
- No external API calls (fully local)
- Smaller memory footprint
- Production-ready kernels optimized by Microsoft
- Works well with smaller models like Phi-4-mini-instruct-onnx
- Flask application (src.app)
- Served as an ASGI app via Hypercorn
- /chat_stream endpoint with token streaming (Server‑Sent Events)
- Lightweight HTML/JS/CSS UI served from src/app/static/
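Token streaming over Server-Sent Events comes down to writing each token as a `data:` line followed by a blank line. The generator below sketches that framing. The JSON payload shape (`{"token": ...}`) and the closing `done` event are assumptions for illustration, not the project's actual wire format.

```python
import json
from typing import Iterable, Iterator


def sse_events(tokens: Iterable[str]) -> Iterator[str]:
    """Yield one SSE frame per token, then a final 'done' event.

    Tokens are JSON-encoded so that newlines inside a token cannot
    break the event framing.
    """
    for token in tokens:
        yield f"data: {json.dumps({'token': token})}\n\n"
    # Hypothetical end-of-stream marker so the client knows when to stop.
    yield "event: done\ndata: {}\n\n"


for frame in sse_events(["Price", " analysis"]):
    print(repr(frame))
```

In a Flask view, a generator like this would typically be wrapped in a `Response` with the `text/event-stream` MIME type so the browser's `EventSource` can consume it.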
- Local Python environment
- Docker container (Hypercorn ASGI server)
- GitHub Actions CI pipeline (optional)
User → Flask (ASGI via Hypercorn) → Query Engine → LlamaIndex → ChromaDB → FAR/DFARS DITA Source
Pipeline:
- Clone FAR/DFARS repos
- Parse .dita XML
- Chunk + embed
- Store in ChromaDB
- Serve via Flask API (ASGI)
- LLM generates answers with citations
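The last pipeline step, answers with citations, depends on how retrieved chunks are presented to the LLM. One common approach is to number each chunk and ask the model to cite by number. The sketch below assumes that approach; the function name `build_prompt`, the `citation` and `text` keys, and the prompt wording are hypothetical and not taken from the project's query engine.

```python
def build_prompt(question: str, retrieved: list[dict]) -> str:
    """Assemble a prompt that lists numbered sources before the question."""
    source_lines = [
        f"[{i}] ({chunk['citation']}) {chunk['text']}"
        for i, chunk in enumerate(retrieved, start=1)
    ]
    sources = "\n".join(source_lines)
    return (
        "Answer the question using only the numbered sources below. "
        "Cite sources by number, e.g. [1].\n\n"
        f"Sources:\n{sources}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Placeholder chunk text; real chunks would come from the vector store.
example = [{"citation": "FAR 1.1", "text": "Placeholder text."}]
print(build_prompt("What does the section cover?", example))
```

Because each number maps back to a stored chunk and its metadata, the application can render the cited sections alongside the generated answer.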
fedacq-rag-chatbot/
│
├── src/
│   ├── app/
│   │   ├── __init__.py
│   │   ├── api.py
│   │   ├── config.py
│   │   ├── asgi.py
│   │   └── static/
│   │       ├── index.html
│   │       ├── app.js
│   │       └── styles.css
│   │
│   ├── rag/
│   │   ├── __init__.py
│   │   │
│   │   ├── indexing/
│   │   │   ├── __init__.py
│   │   │   ├── builder.py
│   │   │   └── loader.py
│   │   │
│   │   ├── llm/
│   │   │   ├── __init__.py
│   │   │   └── models.py
│   │   │
│   │   └── retrieval/
│   │       ├── __init__.py
│   │       ├── metadata.py
│   │       ├── parser_dita.py
│   │       └── query_engine.py
│   │
│   └── scripts/
│       └── build_index.py
│
├── data/
│   ├── chroma/    # Persistent ChromaDB index (Git LFS)
│   └── regs/      # FAR/DFARS cloned repositories (Git LFS)
│
├── tests/
│   ├── test_indexing.py
│   ├── test_llm.py
│   ├── test_metadata.py
│   ├── test_parser.py
│   └── test_query_engine.py
│
├── docker/
│   ├── docker-compose.yml
│   └── local.env
│
├── .dockerignore
├── .gitattributes
├── .gitignore
├── Dockerfile
├── Makefile
├── pyproject.toml
├── pytest.ini
├── requirements.txt
└── requirements.lock
git clone https://github.com/PWDevens/fedacq-rag-chatbot.git
cd fedacq-rag-chatbot
git lfs install
git lfs pull
macOS/Linux:
python -m venv .venv
source .venv/bin/activate
Windows (PowerShell):
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -e .
pip install -r requirements.txt
huggingface-cli download microsoft/Phi-4-mini-instruct-onnx \
--include cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4/* \
--local-dir .
This creates cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4/ with the ONNX model files.
Ensure these folders contain data:
data/chroma/
data/regs/
If empty:
git lfs pull
The RAG index is not built in CI due to the size of FAR/DFARS and the cost of embedding.
To rebuild locally:
First, edit .gitignore and comment out the data/chroma/chroma.sqlite3 entry. Then run:
python -m scripts.build_index
This will:
- Clone FAR + DFARS into data/regs/
- Parse .dita XML files
- Chunk and embed text
- Write a new ChromaDB index into data/chroma/
After rebuilding, commit the updated index using Git LFS:
git add data/chroma
git commit -m "Rebuild RAG index"
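git push
Run with the Flask development server:
python -m flask --app src.app run --host=0.0.0.0 --port=7860
Or serve the ASGI app with Hypercorn:
hypercorn --bind 0.0.0.0:7860 src.app.asgi:app
Build and run the Docker image:
docker build -t fedacq-rag-chatbot .
docker run -d -p 7860:7860 --name ragbot fedacq-rag-chatbot
Run the tests:
pytest -q
The CI pipeline performs: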
- LFS checkout (downloads prebuilt index)
- Python environment setup
- Dependency installation
- Test execution
- Docker image build
- Docker artifact upload
The CI pipeline does not rebuild the RAG index.
Index building is performed locally and versioned via Git LFS.
Send a POST request:
curl -X POST http://localhost:7860/chat \
-H "Content-Type: application/json" \
-d '{"message": "What does FAR 15.404 say about price analysis?"}'
Expected response:
- Summary of the regulation
- Citations
- Retrieved sections
- LLM‑generated explanation
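The same request can be sent from Python using only the standard library. This is a sketch under assumptions: the helper name `build_chat_request` is hypothetical, the JSON body mirrors the curl example, and the base URL is passed in because it depends on where the server is bound (the run commands in this README bind port 7860). The response format is not assumed, so the body is simply read as text.

```python
import json
import urllib.request


def build_chat_request(base_url: str, message: str, path: str = "/chat") -> urllib.request.Request:
    """Build a JSON POST request equivalent to the curl example."""
    payload = json.dumps({"message": message}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Usage (requires the server to be running):
# req = build_chat_request(
#     "http://localhost:7860",
#     "What does FAR 15.404 say about price analysis?",
# )
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```

Separating request construction from sending keeps the helper easy to test without a live server.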