A retrieval-augmented chatbot that answers questions about the Doctor Who universe using Wikipedia data.
The Oracle combines vector search, cross-encoder re-ranking, and a large language model to produce concise answers with citations. A desktop interface built with Tkinter allows users to ask questions interactively and inspect the sources used to generate each response.
The Dr. Who Oracle is a Retrieval-Augmented Generation (RAG) system designed to answer questions about Doctor Who lore.
Instead of relying solely on a language model's internal knowledge, the system retrieves relevant documents from a locally stored vector database constructed from Wikipedia pages. These documents are then re-ranked and supplied to the language model as context when generating an answer.
Each response includes links to the sources used by the model so users can verify the information.
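To make the retrieval step concrete, here is a minimal sketch of what a similarity search over stored embeddings computes. It does not use FAISS, which performs the same kind of lookup far more efficiently. The cosine metric, the function names, and the toy vectors are all assumptions for illustration; the README does not say which metric the project's index uses.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float], vectors: Sequence[Sequence[float]], k: int
) -> List[Tuple[int, float]]:
    """Return (index, similarity) pairs for the k stored vectors closest to the query."""
    scored = [(i, cosine_similarity(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings" (hypothetical); real embeddings are much larger.
toy_vectors = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.9, 0.1, 0.0],
]
nearest = top_k([1.0, 0.0, 0.0], toy_vectors, k=2)
```

Here `nearest` holds the indices of the two closest vectors along with their similarity scores, which is the information the later re-ranking step starts from.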
- Interactive desktop interface built with Tkinter
- Retrieval-Augmented Generation pipeline
- FAISS vector database for fast similarity search
- Cross-encoder re-ranking for improved document relevance
- Concise answers with explicit citations
- Separate Sources tab showing the documents used to generate each response
The Oracle answering questions about Doctor Who lore:
Every answer is accompanied by citations to the underlying documents used by the model.
The system follows a standard Retrieval-Augmented Generation pipeline:
User Question
↓
Vector Search (FAISS)
↓
Retrieve Top Documents
↓
Cross-Encoder Re-Ranking
↓
Select Most Relevant Context
↓
LLM Generation
↓
Answer + Citations
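The stages above can be sketched as one function that wires the steps together. This is an illustration of the flow under stated assumptions, not the code in `Dr_Who_Oracle.py`: `answer_question`, the `Document` shape, the `fetch_k` and `keep_k` defaults, and the stand-in search, scoring, and generation functions are all hypothetical. In the real application those three roles would be filled by the FAISS index, the cross-encoder, and the chat model.

```python
from typing import Callable, List, Tuple

# Hypothetical document shape: (source name or URL, text).
Document = Tuple[str, str]


def answer_question(
    question: str,
    search: Callable[[str, int], List[Document]],
    relevance: Callable[[str, str], float],
    generate: Callable[[str, List[Document]], str],
    fetch_k: int = 20,  # assumed default, not taken from the project
    keep_k: int = 4,    # assumed default, not taken from the project
) -> Tuple[str, List[str]]:
    """Run retrieve -> re-rank -> select -> generate and return (answer, citations)."""
    candidates = search(question, fetch_k)                # vector search
    ranked = sorted(                                      # re-rank by relevance score
        candidates, key=lambda doc: relevance(question, doc[1]), reverse=True
    )
    context = ranked[:keep_k]                             # keep the most relevant context
    answer = generate(question, context)                  # LLM generation
    citations = [source for source, _ in context]         # sources shown to the user
    return answer, citations


# Stand-ins so the sketch runs without FAISS, a cross-encoder, or an API key.
corpus: List[Document] = [
    ("K9.txt", "k9 is a robot dog"),
    ("Donna_Noble.txt", "donna noble is a companion"),
    ("Davros.txt", "davros is the creator of the daleks"),
]


def fake_search(question: str, k: int) -> List[Document]:
    return corpus[:k]


def word_overlap(question: str, text: str) -> float:
    return float(len(set(question.lower().split()) & set(text.lower().split())))


def fake_generate(question: str, context: List[Document]) -> str:
    return f"Answer based on {len(context)} document(s)."


answer, citations = answer_question(
    "who is donna noble", fake_search, word_overlap, fake_generate, fetch_k=3, keep_k=1
)
```

Passing the three stages in as callables keeps the flow easy to test with stand-ins like these, and makes it simple to swap one retriever or re-ranker for another.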
DrWhoOracle/
    Dr_Who_Oracle.py
        Tkinter application and chatbot interface
    Dr_Who_FAISS_VectorStore_Create.py
        Script used to build the FAISS vector database
    DATA/Wiki_Data/
        Wikipedia text files used to create the vectorstore
    Images/
        Intro_Screen.png
        Oracle_Tab.png
        Sources_Tab.png
    README.md
The project requires an OpenAI API key.
Set the environment variable before running the program:
export OPENAI_API_KEY="your-api-key"
Verify that the variable is set:
echo $OPENAI_API_KEY
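The same check can be done from Python before any API call is made, so a missing key fails early with a clear message. This is a small sketch, assuming the program reads the key from the `OPENAI_API_KEY` environment variable as described above; `require_api_key` is a hypothetical helper, not a function from the project.

```python
import os
from typing import Mapping


def require_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the OpenAI API key, or raise a clear error if it is missing or blank."""
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before launching the program."
        )
    return key
```

Calling `require_api_key()` at startup would raise `RuntimeError` when the variable is unset or empty, and return the key otherwise.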
The FAISS vector database can be created from the Wikipedia dataset using:
python Dr_Who_FAISS_VectorStore_Create.py
This script:
- Loads the documents from DATA/Wiki_Data/
- Splits them into overlapping text chunks
- Generates embeddings
- Stores the results in a FAISS vector database
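The "overlapping text chunks" step can be illustrated with a minimal character-based splitter. The README does not state the chunk size, the overlap, or which splitter the script uses, so `split_into_chunks` and its default values are assumptions for illustration only.

```python
from typing import List


def split_into_chunks(text: str, chunk_size: int = 1000, overlap: int = 200) -> List[str]:
    """Split text into chunks of up to chunk_size characters, where each chunk
    repeats the last `overlap` characters of the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be at least 0 and smaller than chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


example = split_into_chunks("abcdefghij", chunk_size=4, overlap=2)
# example == ["abcd", "cdef", "efgh", "ghij"]
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.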
Launch the chatbot with:
python Dr_Who_Oracle.py
Click Enter on the launch screen to open the chat interface.
Try asking the Oracle questions such as:
- Why did Donna Noble stop traveling with the Doctor?
- What is the first appearance of K-9?
- Who founded Time Lord society?
- Didn't the Twelfth Doctor meet Davros as a child?
- Python
- Tkinter
- LangChain
- FAISS
- SentenceTransformers Cross-Encoder
- OpenAI embeddings and chat models
Some images used in the user interface were obtained from publicly available sources.
- tardis_thumb.png
  Source: https://www.flaticon.com/free-icon/tardis_1600954
- gallif_resize.png
  Source: https://www.clipartmax.com/max/m2H7d3Z5H7d3d3d3/
- gallifreyan2_resize.png
  Source: https://favpng.com/png_view/doctor-tenth-doctor-rassilon-gallifrey-time-lord-png/raFa2RUm
- drwhobg_resize.png
  Source: https://www.pxfuel.com/en/desktop-wallpaper-qgwrs
Images remain the property of their respective copyright holders and are used here for demonstration purposes.
This project is intended as a demonstration of retrieval-augmented generation techniques using publicly available information about Doctor Who.
All source material originates from Wikipedia.


