My Little Library is a vocabulary-aware reading recommender and story generator for students. It recommends Wikipedia-based reading material that is relevant to a student's topic interest while staying slightly above the student's current known-word level. The final system combines a reproducible Wikipedia data pipeline, exclusive grade-band vocabulary lists, sentence-transformer embeddings, FAISS vector search, vocabulary-aware reranking, SQLite-backed user profiles, a Flask web interface, and local LLaMA GGUF inference.
The system is designed to answer one practical educational question:
Given a student profile and a topic, can the app recommend or generate reading material that is understandable but still introduces useful new vocabulary?
The implemented system supports:
- user login/register and session-based Flask routes
- beginner, intermediate, and advanced reading profiles
- SQLite persistence for users, vocabulary knowledge, reading profiles, recommendations, user books, and saved stories
- Wikipedia article preprocessing and vocabulary analysis
- chunk-level embeddings using all-MiniLM-L6-v2
- FAISS vector search over normalized embeddings
- two-stage retrieval: broad semantic search followed by vocabulary-aware reranking
- local LLaMA 3.1 8B Instruct GGUF generation through llama-cpp-python
- generated recommendation explanations and personalized stories
- RAG evaluation, retrieval ablation, ROUGE, optional BERTScore, and coverage-window metrics
The project does not train QLoRA adapters. The generator is a pretrained/instruction-tuned LLaMA GGUF model used for local inference.
The repository is organized as a modular Flask application with an offline data/indexing pipeline and an online RAG inference pipeline.
Offline data/indexing pipeline:
Wikipedia download
-> preprocessing
-> subword tokenization
-> vocabulary/readability analysis
-> sentence-aware chunking
-> sentence-transformer embedding
-> FAISS index + chunk metadata
Online RAG inference pipeline:
student login/profile
-> topic query or profile-driven recommendation request
-> query embedding
-> FAISS broad retrieval
-> vocabulary-aware reranking
-> LLaMA recommendation generation
-> SQLite save
-> frontend display
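A minimal sketch of the sentence-aware chunking step, for illustration only (it is not the code in src/embeddings/chunker.py). It assumes that chunk size and overlap are counted in whitespace-separated words, matching the --chunk-size 400 and --overlap 50 values passed to build_index.py, and it uses a naive regex sentence splitter. chunk_text is a hypothetical name.

```python
import re
from typing import List

# Naive sentence boundary: whitespace that follows ".", "!" or "?".
_SENT_END = re.compile(r"(?<=[.!?])\s+")


def chunk_text(text: str, chunk_size: int = 400, overlap: int = 50) -> List[str]:
    """Pack whole sentences into chunks of about chunk_size words, carrying
    trailing sentences (up to `overlap` words) into the next chunk."""
    sentences = [s for s in _SENT_END.split(text.strip()) if s]
    chunks: List[str] = []
    current: List[str] = []  # sentences in the chunk being built
    count = 0                # number of words in `current`
    for sent in sentences:
        n = len(sent.split())
        if current and count + n > chunk_size:
            chunks.append(" ".join(current))
            # Keep whole trailing sentences, at most `overlap` words, as shared context.
            tail: List[str] = []
            tail_count = 0
            for prev in reversed(current):
                m = len(prev.split())
                if tail_count + m > overlap:
                    break
                tail.insert(0, prev)
                tail_count += m
            current, count = tail, tail_count
        current.append(sent)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Packing whole sentences keeps each embedded chunk readable on its own, and the overlap means a fact near a chunk boundary still appears intact in at least one chunk. In this sketch, a single sentence longer than chunk_size becomes its own oversized chunk.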
| Component | Implementation |
|---|---|
| Web server | app.py using Flask |
| Frontend | front-end/index.html, main_page.html, CSS, JS |
| Relational database | SQLite via src/db/schema.sql and repositories |
| Vocabulary analysis | scripts/analyze_articles.py |
| Chunking | src/embeddings/chunker.py |
| Embeddings | src/embeddings/embedder.py using all-MiniLM-L6-v2 |
| Vector store | src/embeddings/vector_store.py using FAISS IndexFlatIP |
| Reranking | src/rag/reranker.py using semantic + vocabulary score |
| Retrieval | src/rag/retriever.py two-stage/adaptive retrieval |
| Generator | src/rag/pipeline.py LlamaCppGenerator |
| CLI RAG test | scripts/run_rag.py |
| CLI story test | scripts/generate_story.py |
| Evaluation | scripts/evaluate_rag.py |
MyLittleLibrary/
├── app.py # Flask backend + API routes
├── main.py # Convenience pipeline runner
├── requirements.txt # Python dependencies used by final code
├── README.md
├── data/
│ ├── eval_queries.json # 15 evaluation queries, 5 per level
│ ├── raw/ # small sample raw parquet included; full data ignored
│ ├── processed/ # small cleaned parquet included; full data ignored
│ ├── vocab/ # final runtime vocabulary text files
│ └── vocab_sources/ # source CSV vocabulary files
├── front-end/
│ ├── index.html # login/register page
│ ├── main_page.html # book-style main UI
│ ├── login.js
│ ├── script.js
│ ├── style.css
│ ├── main_page_stylesheet.css
│ └── images/
├── scripts/
│ ├── build_vocab_lists.py # builds exclusive vocab bands
│ ├── build_vocab.py # quick vocab size verification
│ ├── download_wiki.py # streams Wikipedia from Hugging Face datasets
│ ├── preprocess_wiki.py # cleans/filters articles
│ ├── tokenize_articles.py # HF or local GGUF tokenizer support
│ ├── analyze_articles.py # article-level coverage/readability metadata
│ ├── build_index.py # chunk, embed, and save FAISS index
│ ├── run_rag.py # command-line recommendation test
│ ├── generate_story.py # command-line story generation test
│ ├── evaluate_rag.py # retrieval/generation/ablation evaluation
│ ├── generate_eval_queries.py
│ └── seed_test_users.py # creates beginner/intermediate/advanced demo users
└── src/
├── db/
│ ├── connection.py
│ ├── repositories.py
│ └── schema.sql
├── embeddings/
│ ├── chunker.py
│ ├── embedder.py
│ └── vector_store.py
└── rag/
├── pipeline.py
├── retriever.py
├── reranker.py
└── student_profile.py
Large generated files are intentionally not committed by default. This includes the full Wikipedia parquet files, outputs/, the full FAISS index directories, SQLite database files, and local GGUF model files.
The final vocabulary files are exclusive bands:
| File | Meaning |
|---|---|
| data/vocab/beginner_1000.txt | beginner-only K-5 band |
| data/vocab/intermediate_3000.txt | intermediate-only grades 6-8 band |
| data/vocab/advanced_6000.txt | advanced-only grades 9-12 band |
The files themselves are not nested. At runtime, the analysis/indexing/profile code builds cumulative known-word sets when modeling a student:
| Student level | Known-word set used for coverage |
|---|---|
| Beginner | beginner band |
| Intermediate | beginner + intermediate bands |
| Advanced | beginner + intermediate + advanced bands |
This keeps the source vocabulary bands clean while still modeling the realistic assumption that an advanced reader also knows beginner and intermediate words.
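A minimal sketch of how exclusive bands can be turned into the cumulative known-word sets in the table above, assuming each band has already been loaded as a set of lowercase words. The function name and the LEVEL_ORDER tuple are hypothetical.

```python
from typing import Dict, Set

# Ordered from easiest to hardest; each level also "knows" every earlier band.
LEVEL_ORDER = ("beginner", "intermediate", "advanced")


def cumulative_known_words(bands: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Map each student level to the union of its band and all easier bands."""
    known: Set[str] = set()
    by_level: Dict[str, Set[str]] = {}
    for level in LEVEL_ORDER:
        # `|` builds a new set, so the sets stored for easier levels stay unchanged.
        known = known | bands[level]
        by_level[level] = known
    return by_level
```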
Known-word coverage is the main difficulty signal.
| Mode | Target known-word range | Purpose |
|---|---|---|
| RAG article recommendation | 85-97% | General recommendation target |
| Story challenge: low | 95-98% | Mostly familiar text |
| Story challenge: medium | 90-95% | Moderate vocabulary challenge |
| Story challenge: high | 85-92% | Higher challenge |
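Known-word coverage can be sketched as the fraction of word tokens in a text that appear in the student's known-word set. This sketch assumes a simple lowercase regex tokenizer with no lemmatization, so scripts/analyze_articles.py may count words differently; the window values are copied from the table above, and all names are hypothetical.

```python
import re
from typing import Set, Tuple

# Lowercase alphabetic words, optionally with one apostrophe part ("don't").
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")

# Target known-word windows from the table above (inclusive bounds assumed).
TARGET_WINDOWS = {
    "rag": (0.85, 0.97),
    "low": (0.95, 0.98),
    "medium": (0.90, 0.95),
    "high": (0.85, 0.92),
}


def known_word_coverage(text: str, known: Set[str]) -> float:
    """Share of running word tokens that are in the known-word set."""
    tokens = WORD_RE.findall(text.lower())
    if not tokens:
        return 0.0
    return sum(token in known for token in tokens) / len(tokens)


def in_window(coverage: float, window: Tuple[float, float]) -> bool:
    low, high = window
    return low <= coverage <= high
```

Counting running tokens (rather than distinct word types) means a frequent unknown word lowers coverage more than a rare one, which matches how a reader experiences the text.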
The weighted reranker uses the deployed scoring balance:
final_score = 0.35 * semantic_score + 0.65 * vocabulary_fit_score
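A sketch of the second retrieval stage using this weighting. The README does not define vocabulary_fit_score, so the version below is a hypothetical choice: 1.0 when coverage is inside the target window, decreasing linearly to 0.0 once coverage is `falloff` (assumed 0.15) outside it. Candidate and rerank are illustrative names, not the classes in src/rag/reranker.py.

```python
from dataclasses import dataclass
from typing import List, Tuple

SEMANTIC_WEIGHT = 0.35
VOCAB_WEIGHT = 0.65


@dataclass
class Candidate:
    chunk_id: int
    semantic_score: float  # e.g. cosine similarity from the broad FAISS search
    coverage: float        # known-word coverage for the student's level


def vocabulary_fit_score(coverage: float, window: Tuple[float, float] = (0.85, 0.97),
                         falloff: float = 0.15) -> float:
    """Hypothetical fit: 1.0 inside the window, linear decay to 0.0 outside it."""
    low, high = window
    if low <= coverage <= high:
        return 1.0
    distance = low - coverage if coverage < low else coverage - high
    return max(0.0, 1.0 - distance / falloff)


def rerank(candidates: List[Candidate], top_k: int = 3,
           window: Tuple[float, float] = (0.85, 0.97)) -> List[Tuple[float, Candidate]]:
    """Score the broad candidate pool with the weighted formula and keep the top k."""
    scored = [
        (SEMANTIC_WEIGHT * c.semantic_score
         + VOCAB_WEIGHT * vocabulary_fit_score(c.coverage, window), c)
        for c in candidates
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

For example, with candidates (semantic 0.90, coverage 0.70), (0.60, 0.92), and (0.80, 0.99), this sketch ranks the in-window 0.60 chunk first (0.35 × 0.60 + 0.65 × 1.0 = 0.86) and the 0.90 chunk last, because its coverage sits a full falloff below the window.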
Ubuntu/Linux:
python3 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip setuptools wheel
Windows PowerShell:
python -m venv .venv
.\.venv\Scripts\Activate.ps1
python -m pip install --upgrade pip setuptools wheel
Then, on either platform, install the dependencies:
python -m pip install -r requirements.txt
Notes:
- requirements.txt includes the core Flask/data/RAG/evaluation packages used by the code.
- The listed faiss-cpu package lets the app run on CPU FAISS. To use GPU FAISS, install a CUDA-compatible FAISS build for the workstation environment.
- llama-cpp-python must be built/installed with CUDA support if you want GPU offload for the GGUF model.
- Do not commit Hugging Face tokens, .env files, local model files, local DB files, or generated FAISS indexes.
Download the NLTK cmudict corpus:
python -m nltk.downloader cmudict
The commands below start the Flask app so the UI/auth/database routes can run. If the FAISS index or GGUF model is missing, the app falls back to existing DB recommendations or placeholder story text.
python scripts/seed_test_users.py
python app.py
Open:
http://localhost:5000
Demo accounts:
| Username | Password | Level |
|---|---|---|
| test_beginner | test123 | Beginner |
| test_intermediate | test123 | Intermediate |
| test_advanced | test123 | Advanced |
Important: seed_test_users.py can create users and vocabulary states with the committed vocab files. It only seeds article recommendations if outputs/article_stats.jsonl exists.
Use these commands to reproduce the final-style corpus/index workflow.
python scripts/build_vocab_lists.py \
--input data/vocab_sources \
--output vocab_output \
--runtime-vocab-dir data/vocab \
--only-band all
python scripts/build_vocab.py
Expected runtime files:
data/vocab/beginner_1000.txt
data/vocab/intermediate_3000.txt
data/vocab/advanced_6000.txt
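A small sketch for confirming that the three runtime files really are exclusive bands, assuming one word per line. It is not scripts/build_vocab.py; load_bands and check_exclusive are hypothetical helpers, and the lowercasing is an assumption.

```python
from pathlib import Path
from typing import Dict, Set

BAND_FILES = {
    "beginner": "beginner_1000.txt",
    "intermediate": "intermediate_3000.txt",
    "advanced": "advanced_6000.txt",
}


def load_bands(vocab_dir: str = "data/vocab") -> Dict[str, Set[str]]:
    """Read each band file, assuming one word per line; blank lines are skipped."""
    bands: Dict[str, Set[str]] = {}
    for level, filename in BAND_FILES.items():
        text = (Path(vocab_dir) / filename).read_text(encoding="utf-8")
        bands[level] = {line.strip().lower() for line in text.splitlines() if line.strip()}
    return bands


def check_exclusive(bands: Dict[str, Set[str]]) -> Dict[str, int]:
    """Raise ValueError if any two bands share a word; otherwise return band sizes."""
    levels = list(bands)
    for i, first in enumerate(levels):
        for second in levels[i + 1:]:
            shared = bands[first] & bands[second]
            if shared:
                examples = ", ".join(sorted(shared)[:5])
                raise ValueError(f"{first} and {second} share {len(shared)} words, e.g. {examples}")
    return {level: len(words) for level, words in bands.items()}
```

check_exclusive(load_bands()) returns the word count of each band, or raises ValueError naming a few shared words.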
Download Wikipedia articles. Small debug run:
python scripts/download_wiki.py \
--sample-size 500 \
--output data/raw/wiki_sample.parquet
Full final-style run:
python scripts/download_wiki.py \
--sample-size 1000000 \
--output data/raw/wiki_1m.parquet
Clean and filter the articles:
python scripts/preprocess_wiki.py \
--input data/raw/wiki_1m.parquet \
--output data/processed/wiki_clean.parquet \
--min-words 150
Using a Hugging Face tokenizer:
python scripts/tokenize_articles.py \
--input data/processed/wiki_clean.parquet \
--output data/processed/wiki_tokenized.parquet \
--tokenizer sentence-transformers/all-MiniLM-L6-v2 \
--max-length 2048 \
--batch-size 32
Using a local GGUF tokenizer instead:
python scripts/tokenize_articles.py \
--input data/processed/wiki_clean.parquet \
--output data/processed/wiki_tokenized.parquet \
--gguf-path models/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf \
--max-length 2048 \
--batch-size 32
Run the article-level vocabulary/readability analysis:
mkdir -p outputs
python scripts/analyze_articles.py \
--input data/processed/wiki_tokenized.parquet \
--output outputs/article_stats_1m.jsonl \
--coverage-min 0.85 \
--coverage-max 0.97
For a smaller committed/sample run, use the sample paths:
python scripts/analyze_articles.py \
--input data/processed/wiki_clean.parquet \
--output outputs/article_stats.jsonl \
--coverage-min 0.85 \
--coverage-max 0.97
Inspect the article statistics against several coverage windows:
python scripts/check_article_stats.py \
--input outputs/article_stats_1m.jsonl \
--window 0.85:0.97 \
--window 0.90:0.97 \
--window 0.45:0.70
Build the chunk-level FAISS index. For the Flask app's default path:
python scripts/build_index.py \
--articles outputs/article_stats_1m.jsonl \
--index-dir data/faiss_index_1m_chunklevel \
--model all-MiniLM-L6-v2 \
--chunk-size 400 \
--overlap 50 \
--batch-size 256 \
--device cuda
For a smaller local/debug path:
python scripts/build_index.py \
--articles outputs/article_stats.jsonl \
--index-dir data/faiss_index \
--device cuda
The app currently looks for:
data/faiss_index_1m_chunklevel/
If you build a different index path, either rename/copy that folder or update faiss_index_dir in app.py.
The Flask app expects the local model at:
models/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf
The expected model family is:
- Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf
- Bartowski GGUF release
- loaded through llama-cpp-python
- context window: 4096
- n_gpu_layers: -1 (offload all layers to the GPU)
Create the model directory and place the GGUF file there:
mkdir -p models
# copy or download the GGUF file into models/
The app uses a single visible GPU by default through CUDA_VISIBLE_DEVICES=0. The CLI/evaluation scripts support explicit GPU visibility and tensor splitting:
python scripts/run_rag.py \
--index-dir data/faiss_index_1m_chunklevel \
--query "space and planets" \
--level intermediate \
--filename models/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf \
--cuda-visible-devices 0,1 \
--tensor-split 0.5 0.5 \
--n-gpu-layers -1
To run the full web app once the index and model are in place:
python scripts/seed_test_users.py
python app.py
The server listens on:
http://localhost:5000
The app routes include:
| Route | Purpose |
|---|---|
| POST /api/auth/register | create an account |
| POST /api/auth/login | login |
| POST /api/auth/logout | logout |
| GET /api/profile | current profile and reading history |
| GET /api/recommendations | current saved recommendations |
| POST /api/recommendations/generate | run RAG recommendation generation |
| GET /api/library | list user library books |
| POST /api/library | add a user book |
| GET /api/books/search?q=... | search books by title |
| POST /api/story/generate | generate a story |
| POST /api/story/save | save most recent generated story |
| GET /api/story/list | list saved stories |
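A minimal standard-library client sketch for exercising these routes from Python. It keeps the Flask session cookie in a CookieJar so calls made after login are authenticated. The JSON field names (username, password) and the assumption that every response body is JSON are hypothetical; check app.py for the actual request and response shapes.

```python
import json
import urllib.request
from http.cookiejar import CookieJar
from typing import Any, Optional

BASE_URL = "http://localhost:5000"


def make_client() -> urllib.request.OpenerDirector:
    """An opener that stores the Flask session cookie between requests."""
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(CookieJar()))


def build_request(method: str, path: str, payload: Optional[dict] = None) -> urllib.request.Request:
    """Build a request; a payload, if given, is sent as a JSON body."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    request = urllib.request.Request(BASE_URL + path, data=data, method=method)
    if data is not None:
        request.add_header("Content-Type", "application/json")
    return request


def call(opener: urllib.request.OpenerDirector, method: str, path: str,
         payload: Optional[dict] = None) -> Any:
    """Send one request and decode the body as JSON (assumed response format)."""
    with opener.open(build_request(method, path, payload)) as response:
        body = response.read().decode("utf-8")
    return json.loads(body) if body else None
```

With the demo users seeded and the server running, create opener = make_client(), log in with call(opener, "POST", "/api/auth/login", {"username": "test_beginner", "password": "test123"}), then fetch call(opener, "GET", "/api/recommendations"). Note that urllib raises HTTPError for 4xx/5xx responses.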
Topic-query mode:
python scripts/run_rag.py \
--index-dir data/faiss_index_1m_chunklevel \
--query "space and planets" \
--level intermediate \
--top-k 3 \
--top-broad 100 \
--filename models/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf
Profile-driven mode using a seeded user:
python scripts/run_rag.py \
--index-dir data/faiss_index_1m_chunklevel \
--user-id 1 \
--top-k 3 \
--filename models/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf
Simulate vocabulary growth after reading a result:
python scripts/run_rag.py \
--index-dir data/faiss_index_1m_chunklevel \
--query "animals and habitats" \
--level beginner \
--simulate-growth \
--filename models/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf
Story generation with an explicit level:
python scripts/generate_story.py \
--level intermediate \
--topic "space" \
--genre sci-fi \
--challenge medium \
--target-words 400 \
--max-new-vocab 10 \
--filename models/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf
Using a database-backed profile:
python scripts/generate_story.py \
--user-id 2 \
--topic "space" \
--genre sci-fi \
--challenge medium \
--filename models/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf
The committed data/eval_queries.json contains 15 evaluation queries: 5 beginner, 5 intermediate, and 5 advanced. Main retrieval evaluation (weighted rerank, no generation):
python scripts/evaluate_rag.py \
--index-dir data/faiss_index_1m_chunklevel \
--queries data/eval_queries.json \
--output outputs/eval_weighted_rerank.json \
--mode weighted_rerank \
--skip-generation \
--top-broad 100
Retrieval ablation (semantic-only, coverage filter, weighted rerank; no generation):
python scripts/evaluate_rag.py \
--index-dir data/faiss_index_1m_chunklevel \
--queries data/eval_queries.json \
--output outputs/eval_semantic_only.json \
--mode semantic_only \
--skip-generation
python scripts/evaluate_rag.py \
--index-dir data/faiss_index_1m_chunklevel \
--queries data/eval_queries.json \
--output outputs/eval_coverage_filter.json \
--mode coverage_filter \
--skip-generation
python scripts/evaluate_rag.py \
--index-dir data/faiss_index_1m_chunklevel \
--queries data/eval_queries.json \
--output outputs/eval_weighted_rerank.json \
--mode weighted_rerank \
--skip-generation
Generation evaluation with ROUGE and optional BERTScore:
python scripts/evaluate_rag.py \
--index-dir data/faiss_index_1m_chunklevel \
--queries data/eval_queries.json \
--output outputs/eval_generation.json \
--mode weighted_rerank \
--with-bertscore \
--filename models/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf \
--generation-top-k 3
The evaluation script reports:
- Precision@5, Precision@10, Precision@20
- Recall@5, Recall@10, Recall@20
- MRR
- percentage of top results inside the target coverage window
- ROUGE-1, ROUGE-2, ROUGE-L when generation is enabled
- optional BERTScore precision/recall/F1
- structured output validity rate
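The retrieval metrics can be sketched with their standard definitions, assuming each query has a ranked list of retrieved article IDs and a set of relevant IDs. This is not scripts/evaluate_rag.py: how that script defines relevance, and whether it divides Precision@k by k when fewer than k results come back, are assumptions here.

```python
from typing import Iterable, Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k slots that hold a relevant item (always divides by k)."""
    return sum(item in relevant for item in ranked[:k]) / k


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of all relevant items that appear in the top k."""
    if not relevant:
        return 0.0
    return sum(item in relevant for item in ranked[:k]) / len(relevant)


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant item, or 0.0 if none was retrieved."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def window_rate(coverages: Iterable[float], low: float = 0.85, high: float = 0.97) -> float:
    """Share of results whose known-word coverage lies inside [low, high]."""
    values = list(coverages)
    if not values:
        return 0.0
    return sum(low <= c <= high for c in values) / len(values)


def mean(values: Iterable[float]) -> float:
    values = list(values)
    return sum(values) / len(values) if values else 0.0
```

MRR is then the mean of reciprocal_rank over all evaluation queries, and each @k column averages precision_at_k or recall_at_k the same way.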
The final report describes the full intended run with approximately:
| Pipeline stage | Count |
|---|---|
| Downloaded Wikipedia articles | 1,000,000 |
| Cleaned usable articles | 628,652 |
| Word-tokenized articles | 628,652 |
| Subword-tokenized articles | 628,652 |
| Generated article chunks | 1,682,652 |
| FAISS vectors | 1,682,652 |
The submitted zip contains a small sample parquet and final source code, not the full generated data/index/model artifacts. Rebuild or copy the external artifacts before running the full RAG demo.
The final report's retrieval table lists these retrieval metrics for the completed evaluation run:
| Metric | @5 | @10 | @20 |
|---|---|---|---|
| Precision@k | 0.2533 | 0.2067 | 0.1600 |
| Recall@k | 0.2533 | 0.4133 | 0.6400 |
| MRR | 0.5534 | - | - |
If additional evaluation runs are completed, update the report placeholders and README metrics together so the report, code, and repository stay consistent.
The code uses LlamaCppGenerator for inference with a pretrained/instruction-tuned GGUF model. There are no project-specific QLoRA training scripts, adapter checkpoints, training epochs, optimizer states, or LoRA merge steps in this repository.
FAISSVectorStore uses IndexFlatIP over normalized vectors. It attempts GPU FAISS if available and falls back to CPU FAISS if GPU functions are unavailable.
The app lazily loads the RAG pipeline and LLaMA generator. The first recommendation/story request can be slow because it may initialize:
- FAISS index and chunks.pkl
- sentence-transformer embedder
- cross-encoder reranker
- local LLaMA GGUF model
For demos, start the app early or run one warm-up request before presenting.
For the full million-article index, the main bottlenecks are expected to be:
- loading large chunk metadata from pickle
- GPU transfer time for FAISS index
- cross-encoder reranking over broad candidate pools
- first LLaMA model load
- Flask configured with threaded=False
Practical demo optimizations:
- keep top_broad near 100 or lower
- use --no-cross-encoder for faster CLI tests
- warm the model/index before the demo
- consider approximate FAISS search for larger deployments
- move chunk metadata into SQLite or memory-mapped storage later
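For the last item, one possible direction is sketched here: keep chunk metadata in a SQLite table keyed by FAISS vector ID and fetch only the rows for the IDs a search returns, instead of unpickling every chunk at startup. The table and column names are hypothetical; this is not part of the current code.

```python
import sqlite3
from typing import Dict, Iterable, List, Sequence, Tuple

Row = Tuple[int, str, float, str]  # (vector_id, article_title, coverage, text)


def create_chunk_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS chunks ("
        " vector_id INTEGER PRIMARY KEY,"  # position of the vector in the FAISS index
        " article_title TEXT NOT NULL,"
        " coverage REAL,"
        " text TEXT NOT NULL)"
    )
    return conn


def insert_chunks(conn: sqlite3.Connection, rows: Iterable[Row]) -> None:
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO chunks (vector_id, article_title, coverage, text) VALUES (?, ?, ?, ?)",
            rows,
        )


def fetch_chunks(conn: sqlite3.Connection, vector_ids: Sequence[int]) -> List[Dict]:
    """Look up metadata for FAISS result IDs, preserving the search rank order."""
    ids = [int(i) for i in vector_ids if int(i) >= 0]  # FAISS pads missing hits with -1
    if not ids:
        return []
    placeholders = ",".join("?" * len(ids))
    cursor = conn.execute(
        f"SELECT vector_id, article_title, coverage, text FROM chunks "
        f"WHERE vector_id IN ({placeholders})",
        ids,
    )
    by_id = {
        row[0]: {"vector_id": row[0], "article_title": row[1], "coverage": row[2], "text": row[3]}
        for row in cursor
    }
    return [by_id[i] for i in ids if i in by_id]
```

Only the top_broad rows (around 100) are read per request, so startup no longer depends on the size of the full chunk metadata.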
If the app cannot find the FAISS index, check the path. The web app expects:
data/faiss_index_1m_chunklevel/
Build it with scripts/build_index.py, copy it into that path, or edit app.py to point to your local index directory.
If story generation returns placeholder text, the GGUF model is missing or llama-cpp-python is not configured correctly. Confirm this file exists:
models/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf
Then verify llama-cpp-python can load it.
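One way to check this is a short script like the sketch below, which uses the llama-cpp-python Llama class with the settings listed earlier (n_ctx=4096, n_gpu_layers=-1). check_model is a hypothetical helper. Setting CUDA_VISIBLE_DEVICES only has an effect if it happens before the CUDA backend initializes, so the llama_cpp import is deferred until after it is set.

```python
import os
from pathlib import Path
from typing import List, Optional

MODEL_PATH = Path("models/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf")


def check_model(
    path: Path = MODEL_PATH,
    cuda_visible_devices: str = "0",
    tensor_split: Optional[List[float]] = None,
) -> bool:
    """Return True if the GGUF model loads and produces a short completion."""
    if not path.is_file():
        print(f"Model not found: {path}")
        return False
    # Respect an existing CUDA_VISIBLE_DEVICES; otherwise expose only the given GPU(s).
    # This only works if llama_cpp has not already been imported in this process.
    os.environ.setdefault("CUDA_VISIBLE_DEVICES", cuda_visible_devices)
    from llama_cpp import Llama

    llm = Llama(
        model_path=str(path),
        n_ctx=4096,                 # context window used by the app
        n_gpu_layers=-1,            # offload all layers to the GPU
        tensor_split=tensor_split,  # e.g. [0.5, 0.5] for two visible GPUs
        verbose=False,
    )
    out = llm("Q: Name one planet in our solar system.\nA:", max_tokens=16)
    print("Loaded OK. Sample:", out["choices"][0]["text"].strip())
    return True


if __name__ == "__main__":
    check_model()
```

If the load succeeds but is slow, llama-cpp-python was probably built without CUDA support and is running on the CPU.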
Check GPU visibility:
nvidia-smi
python - <<'PY'
import torch
print(torch.cuda.is_available())
print(torch.cuda.device_count())
for i in range(torch.cuda.device_count()):
    print(i, torch.cuda.get_device_name(i))
PY
For CLI tests, explicitly pass:
--cuda-visible-devices 0
or, for two visible GPUs:
--cuda-visible-devices 0,1 --tensor-split 0.5 0.5
Install requirements inside the active virtual environment:
source .venv/bin/activate
python -m pip install -r requirements.txt
Download the NLTK cmudict corpus if it is missing:
python -m nltk.downloader cmudict
Use Hugging Face authentication only on your own machine/environment:
hf auth login
Never commit tokens into the repository.
Before submitting or demoing:
python -m py_compile $(find . -name '*.py' -not -path './.venv/*')
python scripts/build_vocab.py
python scripts/seed_test_users.py
python app.py
Also verify that these external artifacts exist for the full RAG demo:
models/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf
data/faiss_index_1m_chunklevel/index.faiss
data/faiss_index_1m_chunklevel/chunks.pkl
data/faiss_index_1m_chunklevel/meta.json
outputs/article_stats_1m.jsonl
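A small sketch that reports which of these artifacts are missing, using only pathlib. missing_artifacts is a hypothetical helper (it could live in a new script, since no such checker exists in the repository); it returns an empty list when everything is in place.

```python
from pathlib import Path
from typing import Iterable, List

REQUIRED_ARTIFACTS = [
    "models/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf",
    "data/faiss_index_1m_chunklevel/index.faiss",
    "data/faiss_index_1m_chunklevel/chunks.pkl",
    "data/faiss_index_1m_chunklevel/meta.json",
    "outputs/article_stats_1m.jsonl",
]


def missing_artifacts(paths: Iterable[str] = REQUIRED_ARTIFACTS, root: str = ".") -> List[str]:
    """Return the listed paths (relative to root) that are not existing files."""
    base = Path(root)
    return [p for p in paths if not (base / p).is_file()]
```

Run it from the repository root before a demo; any path it returns still needs to be built or copied in.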
- Frontend and UI: login page, book-style main page, profile display, library controls, and styling.
- Data pipeline: Wikipedia download, preprocessing, tokenization, vocabulary analysis, and corpus preparation.
- Backend and RAG: Flask routes, SQLite schema/repositories, FAISS vector store, retrieval/reranking, LLaMA generation, story generation, evaluation scripts, and GPU inference testing.
This repository is an academic CS 322 project. Model files, Wikipedia data, and Hugging Face resources may have their own licenses or access restrictions. Keep large external artifacts and private tokens out of git.