byoung77/Doctor-Who-Oracle

The Dr. Who Oracle

A retrieval-augmented chatbot that answers questions about the Doctor Who universe using Wikipedia data.

The Oracle combines vector search, cross-encoder re-ranking, and a large language model to produce concise answers with citations. A desktop interface built with Tkinter allows users to ask questions interactively and inspect the sources used to generate each response.


Overview

The Dr. Who Oracle is a Retrieval-Augmented Generation (RAG) system designed to answer questions about Doctor Who lore.

Instead of relying solely on a language model's internal knowledge, the system retrieves relevant documents from a locally stored vector database constructed from Wikipedia pages. These documents are then re-ranked and supplied to the language model as context when generating an answer.
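As a rough illustration of how retrieved passages might be supplied to the model as context, the sketch below assembles a prompt from numbered passages. The function name `build_prompt`, the prompt wording, and the placeholder URL are all hypothetical; the README does not show the prompt the project actually uses.

```python
def build_prompt(question, passages):
    """Assemble a prompt that asks the model to answer from numbered passages.

    `passages` is a list of (source_url, text) pairs. The function name and
    the prompt wording are hypothetical, not taken from the project.
    """
    context_lines = [
        f"[{i}] ({url}) {text}"
        for i, (url, text) in enumerate(passages, start=1)
    ]
    return (
        "Answer the question using only the numbered passages below. "
        "Cite passages by number, e.g. [1].\n\n"
        + "\n".join(context_lines)
        + f"\n\nQuestion: {question}\nAnswer:"
    )


# Placeholder URL and passage text, for illustration only.
prompt = build_prompt(
    "What is the first appearance of K-9?",
    [("https://example.org/k9", "Placeholder passage text.")],
)
print(prompt)
```

Numbering the passages in the prompt gives the model a simple handle for citing them, which is one way the answer text can be tied back to its sources.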

Each response includes links to the sources used by the model so users can verify the information.


Features

  • Interactive desktop interface built with Tkinter
  • Retrieval-Augmented Generation pipeline
  • FAISS vector database for fast similarity search
  • Cross-encoder re-ranking for improved document relevance
  • Concise answers with explicit citations
  • Separate Sources tab showing the documents used to generate each response

Launch Screen

[Screenshot: launch screen]


Example Interaction

The Oracle answering questions about Doctor Who lore:

[Screenshot: Oracle tab]


Source Attribution

Every answer is accompanied by citations to the underlying documents used by the model.

[Screenshot: Sources tab]
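A citation list like the one in the Sources tab could be produced along the following lines. This is a minimal sketch, assuming each retrieved chunk carries a page title and URL; `format_citations` and the example data are hypothetical and not taken from the project's code.

```python
def format_citations(sources):
    """Return numbered citation lines, keeping the first occurrence of each URL.

    `sources` is a list of (title, url) pairs, one per retrieved chunk.
    """
    seen = set()
    lines = []
    for title, url in sources:
        if url in seen:
            continue  # several chunks can come from the same page
        seen.add(url)
        lines.append(f"[{len(lines) + 1}] {title} - {url}")
    return lines


# Placeholder titles and URLs, for illustration only.
citations = format_citations([
    ("Page A", "https://example.org/a"),
    ("Page B", "https://example.org/b"),
    ("Page A", "https://example.org/a"),
])
print("\n".join(citations))
```

Deduplicating by URL keeps the list short when several chunks of the same page are used for one answer.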


Architecture

The system follows a standard Retrieval-Augmented Generation pipeline:

User Question
      ↓
Vector Search (FAISS)
      ↓
Retrieve Top Documents
      ↓
Cross-Encoder Re-Ranking
      ↓
Select Most Relevant Context
      ↓
LLM Generation
      ↓
Answer + Citations
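The two retrieval stages of this pipeline can be sketched with the standard library alone. The code below is a toy model, not the project's implementation: a bag-of-words vector stands in for the OpenAI embeddings and the FAISS index, and a word-overlap score stands in for the SentenceTransformers cross-encoder. All function names are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question, chunks, k):
    """Stage 1: rank all chunks by vector similarity and keep the top k."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def overlap_score(question, chunk):
    """Toy stand-in for a cross-encoder: share of question words in the chunk."""
    q_words = set(question.lower().split())
    return len(q_words & set(chunk.lower().split())) / max(len(q_words), 1)


def rerank(question, candidates, n, score_pair):
    """Stage 2: score each (question, chunk) pair jointly and keep the top n."""
    return sorted(candidates, key=lambda c: score_pair(question, c), reverse=True)[:n]


# Placeholder chunks, for illustration only.
chunks = [
    "alpha alpha alpha",
    "alpha beta gamma delta epsilon zeta",
    "eta theta",
]
candidates = retrieve("alpha beta", chunks, k=2)
best = rerank("alpha beta", candidates, n=1, score_pair=overlap_score)
print(candidates)
print(best)
```

In this example the first stage ranks the repetitive chunk highest, while the second stage promotes the chunk that covers both question words. That is the general motivation for re-ranking: a cheap similarity search narrows the candidates, and a more careful pairwise scorer decides which of them become context.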


Project Structure

DrWhoOracle/
    Dr_Who_Oracle.py                      Tkinter application and chatbot interface
    Dr_Who_FAISS_VectorStore_Create.py    Script used to build the FAISS vector database
    DATA/Wiki_Data/                       Wikipedia text files used to create the vectorstore
    Images/
        Intro_Screen.png
        Oracle_Tab.png
        Sources_Tab.png
    README.md


Setting the API Key

The project requires an OpenAI API key.

Set the environment variable before running the program. In bash or zsh (Linux, macOS):

export OPENAI_API_KEY="your-api-key"

Verify that the variable is set:

echo $OPENAI_API_KEY
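The same check can be made from Python before any request is sent. This is a minimal sketch; `get_api_key` is a hypothetical helper, and the README does not say whether the application performs such a check itself.

```python
import os


def get_api_key(env=os.environ):
    """Return the OpenAI API key, or raise a clear error if it is missing."""
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before launching the Oracle."
        )
    return key
```

Calling `get_api_key()` at startup makes the program fail with a readable message, rather than with an authentication error on the first question.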


Building the Vector Database

The FAISS vector database can be created from the Wikipedia dataset using:

python Dr_Who_FAISS_VectorStore_Create.py

This script:

  1. Loads the documents from DATA/Wiki_Data/
  2. Splits them into overlapping text chunks
  3. Generates embeddings
  4. Stores the results in a FAISS vector database
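Step 2 above can be illustrated with a small character-based splitter. This is a sketch under stated assumptions: the README does not say which splitter, chunk size, or overlap the script uses, so `split_into_chunks` and its parameters are hypothetical.

```python
def split_into_chunks(text, chunk_size, overlap):
    """Split text into character windows; consecutive chunks share `overlap` chars."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


chunks = split_into_chunks("abcdefghij", chunk_size=4, overlap=2)
print(chunks)  # ['abcd', 'cdef', 'efgh', 'ghij']
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.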

Running the Oracle

Launch the chatbot with:

python Dr_Who_Oracle.py

Click Enter on the launch screen to open the chat interface.


Example Questions

Try asking the Oracle questions such as:

  • Why did Donna Noble stop traveling with the Doctor?
  • What is the first appearance of K-9?
  • Who founded Time Lord society?
  • Didn't the Twelfth Doctor meet Davros as a child?

Technologies Used

  • Python
  • Tkinter
  • LangChain
  • FAISS
  • SentenceTransformers Cross-Encoder
  • OpenAI embeddings and chat models

Image Credits

Some images used in the user interface were obtained from publicly available sources.

Images remain the property of their respective copyright holders and are used here for demonstration purposes.


Disclaimer

This project is intended as a demonstration of retrieval-augmented generation techniques using publicly available information about Doctor Who.
All source material originates from Wikipedia.
