A production‑style hybrid AI chatbot that combines classical NLP, machine‑learning intent classification, semantic vector memory, and an LLM fallback used strictly as a teacher.
The system is designed to improve over time by learning from real conversations, discovering new intents, and retraining safely — without blindly trusting LLM outputs.
- Build a realistic AI assistant architecture (not a toy chatbot)
- Minimize LLM usage while still benefiting from it
- Enable offline semantic memory and fast local inference
- Create a safe learning loop from conversations
- Follow professional ML + Git practices
- Intent‑based chatbot (fast, local, inexpensive)
- Semantic vector memory using FAISS (offline & persistent)
- Knowledge base lookup for deterministic answers
- Hinglish → English auto‑translation
- LLM fallback only when the bot fails
- LLM answers saved and reused for training
- Automatic intent discovery from conversations
- Safe retraining pipeline (no blind auto‑learning)
- Memory importance scoring & controlled forgetting
This is a hybrid AI architecture, similar to how real assistants are built in production systems.
- Semantic Vector Memory (FAISS)
- Knowledge Base Lookup
- Intent Classification Model
- LLM Fallback (Teacher Mode)
The LLM is not always on. It is called only when the local layers cannot respond with enough confidence.
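The routing described above can be sketched as a simple cascade. This is a minimal illustration, not the project's actual code: it assumes the layers are tried in the order listed, and the `Answer` type, the `respond` function, the toy stages, and the confidence thresholds are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass
class Answer:
    text: str
    source: str        # which layer produced the answer
    confidence: float  # 0.0 to 1.0, as reported by that layer


# A stage takes the user query and returns an Answer, or None if it has nothing.
Stage = Callable[[str], Optional[Answer]]


def respond(query: str,
            local_stages: Sequence[Tuple[Stage, float]],
            llm_fallback: Stage) -> Answer:
    """Try each local stage in order; call the LLM only if none is confident."""
    for stage, threshold in local_stages:
        answer = stage(query)
        if answer is not None and answer.confidence >= threshold:
            return answer
    fallback = llm_fallback(query)
    return fallback or Answer("Sorry, I can't answer that yet.", "none", 0.0)


# Toy stand-ins for the real layers (hypothetical behaviour).
def memory_stage(query: str) -> Optional[Answer]:
    return None  # pretend nothing relevant is remembered


def knowledge_stage(query: str) -> Optional[Answer]:
    facts = {"what is faiss": "FAISS is a library for vector similarity search."}
    text = facts.get(query.lower().strip("?! ."))
    return Answer(text, "knowledge_base", 1.0) if text else None


def intent_stage(query: str) -> Optional[Answer]:
    if "hello" in query.lower():
        return Answer("Hi there!", "intent", 0.9)
    return Answer("I'm not sure.", "intent", 0.2)  # low confidence guess


def fake_llm(query: str) -> Optional[Answer]:
    return Answer(f"(LLM) answer to: {query}", "llm", 0.5)


# Thresholds are assumed values chosen for the example.
LOCAL_STAGES = [(memory_stage, 0.75), (knowledge_stage, 1.0), (intent_stage, 0.6)]

for q in ("What is FAISS?", "hello bot", "explain quantum tunnelling"):
    print(q, "->", respond(q, LOCAL_STAGES, fake_llm).source)
```

In this toy setup only the third query reaches the LLM stand-in, because the intent stage reports a confidence below its threshold.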
chatbot-ai/
│
├── app/                          # Runtime chatbot logic
│   ├── chatbot_core.py
│   ├── vector_memory.py
│   ├── llm_fallback.py
│   ├── knowledge_graph.py
│   └── __init__.py
│
├── training/                     # Offline learning & improvement
│   ├── train_chatbot.py
│   ├── discover_intents.py
│   ├── auto_append_intents.py
│   ├── auto_generate_responses.py
│   └── llm_to_intents.py
│
├── data/                         # Knowledge & memory (partially git‑ignored)
│   ├── intents.json
│   ├── knowledge.json
│   └── README.md
│
├── model/                        # Trained models (generated locally)
│   └── README.md
│
├── .gitignore
├── requirements.txt
└── README.md
python -m pip install -r requirements.txt
Some components require an additional spaCy model:
python -m spacy download en_core_web_sm
From the project root:
python -m app.chatbot_text
To exit:
quit
- Activated only when intent + memory + knowledge fail
- Generates a response using an LLM
- Question + answer are stored in data/llm_memory.json
- These examples are later converted into training data
Environment variable required (the setx command below is Windows-specific):
setx OPENAI_API_KEY "your_api_key_here"
The LLM teaches the bot, then steps back.
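As a rough sketch of the "teacher" step, the snippet below appends each LLM question/answer pair to a JSON list on disk. Only the path data/llm_memory.json comes from this README; the function name `record_llm_answer`, the record fields, and the `reviewed` flag (meant to reflect that answers are not trusted blindly) are assumptions for illustration.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def record_llm_answer(question: str, answer: str,
                      path: Path = Path("data/llm_memory.json")) -> int:
    """Append one question/answer pair to a JSON list; return the new entry count."""
    path.parent.mkdir(parents=True, exist_ok=True)
    raw = path.read_text(encoding="utf-8") if path.exists() else ""
    entries = json.loads(raw) if raw.strip() else []
    entries.append({
        "question": question,
        "answer": answer,
        "reviewed": False,  # assumed flag: a human or filter approves it later
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })
    path.write_text(json.dumps(entries, ensure_ascii=False, indent=2),
                    encoding="utf-8")
    return len(entries)
```

A training script could then load this file and keep only the approved entries when building new intent examples.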
After chatting with the bot, improve it using:
python training/discover_intents.py
python training/auto_append_intents.py
python training/auto_generate_responses.py
python training/train_chatbot.py
- Unrecognized queries are clustered
- New intents are created automatically
- Safe, neutral responses are generated
- Intent model is retrained
- Future LLM usage decreases
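The clustering step can be illustrated with a dependency-free sketch. This README does not say which algorithm discover_intents.py uses, so the code below swaps in a greedy grouping by token overlap (Jaccard similarity) as a stand-in; a real pipeline would more plausibly cluster sentence embeddings. The function names, the threshold, and the `min_size` rule are assumptions.

```python
import re
from typing import List, Set, Tuple


def tokens(text: str) -> Set[str]:
    """Lowercase word tokens of a query."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Share of tokens the two sets have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_queries(queries: List[str], threshold: float = 0.4,
                    min_size: int = 2) -> List[List[str]]:
    """Greedily group queries whose tokens overlap with a cluster's first member.

    Clusters smaller than min_size are dropped, so a single odd query
    never becomes a new intent on its own.
    """
    clusters: List[Tuple[Set[str], List[str]]] = []
    for query in queries:
        query_tokens = tokens(query)
        for seed_tokens, members in clusters:
            if jaccard(query_tokens, seed_tokens) >= threshold:
                members.append(query)
                break
        else:  # no existing cluster was similar enough
            clusters.append((query_tokens, [query]))
    return [members for _, members in clusters if len(members) >= min_size]


unrecognized = [
    "how do I reset my password",
    "reset my password please",
    "what is the weather today",
    "what is the weather tomorrow",
    "tell me a joke",
]
for group in cluster_queries(unrecognized):
    print(group)
```

Each surviving group is a candidate intent; requiring at least two members is one simple way to keep the loop conservative.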
- Uses FAISS + sentence‑transformers
- Fully offline after installation
- Persistent across restarts
- Importance‑weighted storage
- Automatic forgetting of low‑value memories
This allows the chatbot to remember concepts, not just exact phrases.
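Importance-weighted storage with forgetting can be sketched without any dependencies. The project itself uses FAISS and sentence-transformers; this example swaps in a plain Python list and hand-written vectors so it runs anywhere. The class `TinyMemoryStore`, the boost-on-recall rule, and the decay and floor values are assumptions, not the project's actual scoring.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Memory:
    text: str
    vector: List[float]
    importance: float = 1.0


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TinyMemoryStore:
    def __init__(self, decay: float = 0.5, boost: float = 1.0,
                 min_importance: float = 0.3) -> None:
        self.items: List[Memory] = []
        self.decay = decay
        self.boost = boost
        self.min_importance = min_importance

    def add(self, text: str, vector: List[float], importance: float = 1.0) -> None:
        self.items.append(Memory(text, vector, importance))

    def search(self, vector: List[float]) -> Optional[Memory]:
        """Return the most similar memory and raise its importance."""
        if not self.items:
            return None
        best = max(self.items, key=lambda m: cosine(vector, m.vector))
        best.importance += self.boost
        return best

    def forget(self) -> int:
        """Decay every score, drop memories below the floor, return how many."""
        for memory in self.items:
            memory.importance *= self.decay
        before = len(self.items)
        self.items = [m for m in self.items if m.importance >= self.min_importance]
        return before - len(self.items)


store = TinyMemoryStore()
store.add("user likes cricket", [1.0, 0.0, 0.0])
store.add("user asked about trains", [0.0, 1.0, 0.0])
print(store.search([0.9, 0.1, 0.0]).text)  # recalled memories gain importance
store.forget()
store.forget()
print([m.text for m in store.items])
```

Memories that keep being recalled survive repeated decay passes, while ones that are never matched fall below the floor and are removed.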
Runtime and personal data are excluded for safety:
- chat history
- unrecognized queries
- LLM memory logs
- FAISS index files
- trained model files
This keeps the repository clean and safe to share.
- Designed for learning and experimentation
- Follows real‑world AI system patterns
- Emphasizes safety, reproducibility, and clarity
- LLM improves the bot over time instead of replacing it
MIT License