This guide provides step-by-step instructions for testing the RAG chatbot locally without requiring Azure resources.
Follow these steps to test the entire RAG pipeline locally:
Create a test directory and add some sample executive order documents:
mkdir -p test_docs
Download a few executive orders as PDFs or text files and save them to the test_docs directory.
Process the documents to extract content and split into chunks:
python scripts/ingest.py --input test_docs --output data/processed_chunks.json
Expected output:
- Log messages showing document loading and processing
- A processed_chunks.json file in the data directory
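To make the "split into chunks" step concrete, here is a minimal sketch of fixed-size chunking with overlap. It is an illustration only: the function name `split_into_chunks` and the `chunk_size` and `overlap` defaults are assumptions, and scripts/ingest.py may split documents differently (for example on sentences or tokens).

```python
def split_into_chunks(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into character chunks of at most chunk_size, overlapping by `overlap`.

    Hypothetical helper for illustration; not taken from the project's code.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

The overlap keeps a sentence that straddles a chunk boundary visible in both neighboring chunks, which tends to help retrieval at the cost of some duplicated text.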
Generate vector embeddings for the document chunks:
python scripts/embed.py --input data/processed_chunks.json --output data/embedded_chunks.json
Expected output:
- Log messages showing embedding generation
- An embedded_chunks.json file containing document chunks with embeddings
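A quick way to confirm the embedding step worked is to check that every chunk carries a vector of the same length. The sketch below assumes the output file is a JSON list of objects and that each object stores its vector under an "embedding" key; both are assumptions about the file layout, so adjust the `key` argument (or the loading logic) to match what scripts/embed.py actually writes.

```python
import json
from pathlib import Path


def check_embeddings(path: str, key: str = "embedding") -> int:
    """Return the shared embedding dimension, or raise ValueError.

    Assumes (hypothetically) a JSON list of dicts with a vector under `key`.
    """
    records = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(records, list) or not records:
        raise ValueError("expected a non-empty JSON list of chunks")

    dims = set()
    for i, record in enumerate(records):
        vector = record.get(key) if isinstance(record, dict) else None
        if not isinstance(vector, list) or not vector:
            raise ValueError(f"chunk {i} has no '{key}' vector")
        dims.add(len(vector))

    if len(dims) != 1:
        raise ValueError(f"inconsistent embedding dimensions: {sorted(dims)}")
    return dims.pop()
```

For example, `check_embeddings("data/embedded_chunks.json")` returns the vector length if the file matches the assumed layout.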
Create a vector store index from the embedded chunks:
python scripts/create_index.py --input data/embedded_chunks.json --output data/vector_store.json
Expected output:
- A vector_store.json file containing the vector store index
Test the vector search functionality:
python scripts/search.py --index data/vector_store.json --query "climate change initiatives"
Expected output:
- A list of relevant document chunks related to climate change
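For intuition about what the search step does, the following sketch ranks chunks by cosine similarity to a query vector. It is a generic illustration, not the project's implementation: the names `cosine_similarity` and `top_k` and the "text" and "embedding" keys are assumptions, and LocalVectorStore may use a different metric or data layout.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], chunks: list[dict], k: int = 3) -> list[tuple[float, str]]:
    """Return the k highest-scoring (score, text) pairs, best first.

    Each chunk is assumed (hypothetically) to have "text" and "embedding" keys.
    """
    scored = [(cosine_similarity(query_vec, c["embedding"]), c["text"]) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

In a real run the query vector would come from the same embedding model used for the chunks; comparing vectors from different models is not meaningful.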
Test the interactive RAG command-line interface:
python scripts/rag_cli.py --index data/vector_store.json
Expected output:
- An interactive CLI where you can enter questions
- Retrieved documents related to your questions
Test the Streamlit web interface:
streamlit run app.py
Expected output:
- A web interface running at http://localhost:8501
- Ability to load the vector store and search for information
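Test the admin dashboard:
- Run the script: streamlit run scripts/run_admin.py
- Enter the admin password when prompted
- The dashboard will be available at http://localhost:8501 (if the main app is still running on that port, Streamlit typically starts on the next free port, such as 8502)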
Test just the document processor component:
python -c "from src.document_processor import DocumentProcessor; processor = DocumentProcessor(); docs = processor.load_document('test_docs/example.pdf'); print(f'Loaded {len(docs)} document parts')"
Test just the embeddings generator component:
python -c "from src.embeddings import EmbeddingsGenerator; generator = EmbeddingsGenerator(); emb = generator.generate_embeddings(['This is a test document']); print(f'Generated embedding with dimension {len(emb[0])}')"
Test just the vector store component:
python -c "from src.vector_store import LocalVectorStore; store = LocalVectorStore(); store.load('data/vector_store.json'); print(f'Loaded {len(store.documents)} documents')"
Issue: Missing dependencies
Solution: Run pip install -r requirements.txt to install all required packages
Issue: File not found errors
Solution: Make sure you've created the necessary directories (e.g., data/) and check file paths
Issue: Embedding model download issues
Solution: Check your internet connection; the first run will download the model
Issue: Out of memory errors
Solution: Reduce the number of documents or batch size for processing
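If out-of-memory errors persist, one option is to embed the chunks in small batches rather than all at once. The helper below is a minimal sketch of that idea; `batched` is a hypothetical name, and whether the project's scripts expose a batch-size option is not stated here.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], batch_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]
```

Each batch could then be passed to `generator.generate_embeddings(batch)` in turn, so only one batch of texts is processed at a time.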
If you don't have real executive orders, you can create test documents with:
printf "Executive Order 12345\n\nTitle: Test Executive Order\n\nJanuary 1, 2025\n\nBy the authority vested in me as President by the Constitution and the laws of the United States of America, it is hereby ordered as follows:\n\nSection 1. Policy. This is a test executive order for the RAG system.\n" > test_docs/test_eo.txt
To test with a larger dataset, you can use a loop to generate multiple test documents:
for i in {1..10}; do
printf "Executive Order $i\n\nTitle: Test Executive Order $i\n\nJanuary $i, 2025\n\nBy the authority vested in me as President by the Constitution and the laws of the United States of America, it is hereby ordered as follows:\n\nSection 1. Policy. This is test executive order number $i for the RAG system.\n" > "test_docs/test_eo_$i.txt"
done
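On systems without a Bash-compatible shell, the same test documents can be generated from Python. This is a sketch that mirrors the loop above; the function name `write_test_orders` and its parameters are hypothetical.

```python
from pathlib import Path

# Same wording as the shell loop; {n} is replaced with the order number.
TEMPLATE = (
    "Executive Order {n}\n\n"
    "Title: Test Executive Order {n}\n\n"
    "January {n}, 2025\n\n"
    "By the authority vested in me as President by the Constitution and "
    "the laws of the United States of America, it is hereby ordered as "
    "follows:\n\n"
    "Section 1. Policy. This is test executive order number {n} for the "
    "RAG system.\n"
)


def write_test_orders(directory: str = "test_docs", count: int = 10) -> list[Path]:
    """Write `count` synthetic executive orders into `directory` and return their paths."""
    out_dir = Path(directory)
    out_dir.mkdir(parents=True, exist_ok=True)  # create the directory if needed
    paths = []
    for n in range(1, count + 1):
        path = out_dir / f"test_eo_{n}.txt"
        path.write_text(TEMPLATE.format(n=n), encoding="utf-8")
        paths.append(path)
    return paths
```

Calling `write_test_orders()` with the defaults produces test_docs/test_eo_1.txt through test_docs/test_eo_10.txt.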