Vidya Setu (Bridge of Knowledge) is an AI-powered learning platform that bridges gaps in education, empowering disabled and underserved learners.

It combines an ML-powered adaptive quiz engine with a RAG-based personal tutor that reads your study material and answers questions in your language, adapted to your accessibility needs and learning style.
Built for the Databricks Hackathon 2026, demonstrating real-world AI for social good in Indian education.
| Feature | Description |
|---|---|
| Adaptive Quiz Engine | FAISS + Sentence Transformers select questions by difficulty and weak topics |
| Real-time Difficulty Adjustment | Automatically upgrades Easy → Medium → Hard based on performance |
| 10 Indian Languages | English, Hindi, Tamil, Telugu, Bengali, Marathi, Gujarati, Kannada, Malayalam, Punjabi |
| Text-to-Speech | Sarvam AI reads questions and answers aloud in your chosen language |
| RAG-Based PDF Tutor | Upload any textbook or notes → ask questions → get cited answers |
| Accessibility Profiles | Adapted outputs for ADHD, Dyslexia, Visual Impairment, Hearing Impairment |
| Live Analytics Sidebar | Topic mastery, difficulty breakdown, accuracy, and focus recommendations |
| Source Citations | Every RAG answer cites the exact page number from your PDF |
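Both the weak-topic question selection and the Live Analytics Sidebar depend on per-topic accuracy. The pure-Python sketch below shows one way that bookkeeping might look. The `TopicStats` class, its `weakest_topic` helper, and the tie-breaking rule (fewer attempts first) are hypothetical illustrations, not code from `app.py`; the actual engine additionally uses FAISS similarity to pick questions.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class TopicStats:
    """Hypothetical per-topic tally of quiz answers."""
    correct: dict = field(default_factory=lambda: defaultdict(int))
    attempted: dict = field(default_factory=lambda: defaultdict(int))

    def record(self, topic: str, is_correct: bool) -> None:
        """Store one answered question for a topic."""
        self.attempted[topic] += 1
        if is_correct:
            self.correct[topic] += 1

    def mastery(self, topic: str) -> float:
        """Fraction of correct answers for a topic (0.0 if never attempted)."""
        n = self.attempted[topic]
        return self.correct[topic] / n if n else 0.0

    def weakest_topic(self, topics: list[str]) -> str:
        """Lowest mastery wins; ties go to the topic with fewer attempts."""
        return min(topics, key=lambda t: (self.mastery(t), self.attempted[t]))
```

For example, after one right and one wrong answer on "Optics" and one right answer on "Kinematics", an untouched topic such as "Thermodynamics" (mastery 0.0, no attempts) would be suggested first, followed by "Optics".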
```mermaid
flowchart TD
    User([Student]) --> App[Streamlit App]
    App --> TTS[Sarvam AI TTS]

    subgraph QuizEngine[Adaptive Quiz Engine]
        direction LR
        ST[Sentence Transformers<br/>all-mpnet-base-v2] --> FAISS[FAISS<br/>Vector Index]
        FAISS --> Retriever[Smart Question<br/>Retriever]
        Retriever --> Difficulty[Difficulty<br/>Predictor]
    end

    subgraph RAGEngine[RAG Learning Engine]
        direction LR
        PDF[PDF Upload<br/>pypdf] --> Chunker[Text Chunker]
        Chunker --> BGE[BGE-small<br/>Embeddings]
        BGE --> Similarity[Cosine<br/>Similarity Search]
        Similarity --> LLM[LLaMA 3.3 70B<br/>Databricks Serving]
    end

    subgraph Translation[Multilingual Layer]
        M2M100[M2M100 418M<br/>Facebook]
    end

    App --> QuizEngine
    App --> RAGEngine
    App --> Translation
    LLM --> App
    TTS --> App
```
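The Text Chunker stage has to remember which page each chunk came from so that answers can cite page numbers. Below is a minimal sketch, assuming the pages arrive as a list of strings (for example from pypdf's `PdfReader(path).pages` and `page.extract_text()`). The word-based `chunk_size` and `overlap` defaults are illustrative assumptions, not the project's settings, and `chunk_pages` is a hypothetical name.

```python
def chunk_pages(pages: list[str], chunk_size: int = 200, overlap: int = 50) -> list[dict]:
    """Split each page into overlapping word windows tagged with a 1-based page number."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for page_no, text in enumerate(pages, start=1):
        words = text.split()
        for start in range(0, max(len(words), 1), step):
            window = words[start:start + chunk_size]
            if window:  # empty pages (e.g. scanned images) yield no chunks
                chunks.append({"page": page_no, "text": " ".join(window)})
            if start + chunk_size >= len(words):
                break  # this window already reached the end of the page
    return chunks
```

Keeping every chunk inside a single page is a deliberate trade-off: a sentence that spans a page break gets split, but each chunk maps to exactly one page, which keeps the citation step trivial.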
| Component | Technology |
|---|---|
| Large Language Model | LLaMA 3.3 70B via Databricks Model Serving |
| Quiz Embeddings | sentence-transformers/all-mpnet-base-v2 |
| RAG Embeddings | BAAI/bge-small-en-v1.5 |
| Vector Search | FAISS (IndexFlatIP) |
| Translation | Facebook M2M100 418M |
| Text-to-Speech | Sarvam AI Bulbul v3 |
| Deployment | Databricks Apps |
| Model Serving | Databricks Model Serving Endpoints |
| Frontend | Streamlit |
| PDF Processing | pypdf |
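FAISS `IndexFlatIP` performs exact (brute-force) inner-product search; when the embeddings are L2-normalized first, the inner product equals cosine similarity. The pure-Python sketch below reproduces that computation on toy vectors so the ranking logic is visible without installing FAISS. In the app the vectors would come from the embedding models listed above; `l2_normalize` and `top_k` are hypothetical helper names.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so that dot product == cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def top_k(query, vectors, k=2):
    """Exact inner-product search over normalized vectors (what IndexFlatIP does)."""
    q = l2_normalize(query)
    sims = [sum(a * b for a, b in zip(q, l2_normalize(v))) for v in vectors]
    ranked = sorted(range(len(vectors)), key=lambda i: sims[i], reverse=True)
    return [(i, sims[i]) for i in ranked[:k]]
```

With FAISS itself, the equivalent steps are `faiss.normalize_L2(x)` on a float32 NumPy matrix, `index = faiss.IndexFlatIP(dim)`, `index.add(x)`, and `scores, ids = index.search(q, k)` on an equally normalized query.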
- Python 3.9+
- Databricks workspace with Model Serving enabled
- Sarvam AI API key
```bash
git clone https://github.com/abhishek130904/DatabricksHackathon.git
cd DatabricksHackathon
pip install -r requirements.txt
```

Edit `app.py` and set your credentials:

```python
# Databricks (auto-configured inside Databricks Apps)
DATABRICKS_HOST = "https://<your-workspace>.azuredatabricks.net"
DATABRICKS_TOKEN = "your_databricks_token"

# Sarvam AI
SARVAM_API_KEY = "your_sarvam_api_key"
```

> 💡 Inside Databricks Apps, Databricks authentication is handled automatically, so no token is needed.
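Instead of hard-coding secrets in `app.py`, the same values can be read from environment variables, which is also how the Databricks Apps deployment supplies `SARVAM_API_KEY`. A minimal sketch; the `load_settings` helper and the choice to fail fast only when the Sarvam key is missing are assumptions, not code from the project.

```python
import os


def load_settings(env=None):
    """Read credentials from environment variables (hypothetical helper)."""
    env = os.environ if env is None else env
    settings = {
        # Optional when running inside Databricks Apps, where auth is automatic.
        "DATABRICKS_HOST": env.get("DATABRICKS_HOST"),
        "DATABRICKS_TOKEN": env.get("DATABRICKS_TOKEN"),
        # Needed for text-to-speech in every environment.
        "SARVAM_API_KEY": env.get("SARVAM_API_KEY"),
    }
    if not settings["SARVAM_API_KEY"]:
        raise RuntimeError("SARVAM_API_KEY is not set")
    return settings
```

Locally you would `export SARVAM_API_KEY=...` before `streamlit run app.py`. Note that `databricks-sdk`'s `WorkspaceClient()` already picks up `DATABRICKS_HOST` and `DATABRICKS_TOKEN` from the environment on its own.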
```bash
streamlit run app.py
```

The app opens at http://localhost:8501.
```bash
# Import project to Databricks workspace
databricks workspace import-dir . /Workspace/Users/<your-email>/vidya-setu
```

Then in the Databricks UI:

- Navigate to Databricks → Apps
- Click Create App
- Select `/Workspace/Users/<your-email>/vidya-setu`
- Set environment variables (`SARVAM_API_KEY`)
- Click Deploy and allow 8–10 minutes for the large ML models to load
Adaptive quiz demo:

1. Open the app → select a subject (Physics / Math / Chemistry)
2. Answer 3 easy questions correctly
3. Watch the difficulty auto-upgrade: EASY → MEDIUM
4. Switch the language to Hindi using the Language dropdown
5. Click "Read Aloud" → hear the question in Hindi
6. Check the sidebar → see Topic Mastery and the Weak Topic focus update live
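The EASY → MEDIUM upgrade in the quiz walkthrough can be modelled as a small state machine over the three difficulty levels. A minimal sketch, assuming promotion after 3 consecutive correct answers (as in the walkthrough) and a hypothetical demotion after 2 consecutive wrong answers; the class name and both thresholds are illustrative, and `app.py` may use different rules (the architecture diagram mentions a Difficulty Predictor).

```python
LEVELS = ["EASY", "MEDIUM", "HARD"]


class DifficultyLadder:
    """Moves between difficulty levels based on answer streaks (hypothetical)."""

    def __init__(self, promote_after=3, demote_after=2):
        self.level = 0  # index into LEVELS, starts at EASY
        self.promote_after = promote_after
        self.demote_after = demote_after
        self.right_streak = 0
        self.wrong_streak = 0

    @property
    def difficulty(self):
        return LEVELS[self.level]

    def record(self, is_correct):
        """Update streaks with one answer and return the difficulty for the next question."""
        if is_correct:
            self.right_streak += 1
            self.wrong_streak = 0
            if self.right_streak >= self.promote_after and self.level < len(LEVELS) - 1:
                self.level += 1
                self.right_streak = 0
        else:
            self.wrong_streak += 1
            self.right_streak = 0
            if self.wrong_streak >= self.demote_after and self.level > 0:
                self.level -= 1
                self.wrong_streak = 0
        return self.difficulty
```

Resetting the streak after each level change means a learner must prove themselves again at the new level before moving further.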
Personalized Learning (RAG tutor) demo:

1. Click the "Personalized Learning" tab
2. Upload any PDF (textbook chapter, lecture notes, etc.)
3. Select Accessibility Profile → "ADHD"
4. Ask: "What is the main concept in this document?"
5. Observe: the answer is broken into numbered steps (ADHD adaptation)
6. Click "Listen" → audio plays with its duration shown
7. Expand Sources → see the exact page numbers cited
8. Switch the profile to "Dyslexia" → ask again → notice the simpler language

Sample questions to try:

- "Explain Newton's Laws in simple terms"
- "What are the key formulas mentioned in this chapter?"
- "Summarize the important points from page 3"
- "Give me a step-by-step explanation of this concept"
```
vidya-setu/
├── app.py              # Main Streamlit application
├── pcmDataset.csv      # Physics/Chemistry/Math quiz dataset
├── requirements.txt    # Python dependencies
├── app.yaml            # Databricks Apps configuration
└── README.md
```
| Issue | Fix |
|---|---|
| LLM endpoint not found | Run `pip install --upgrade databricks-sdk` and verify the endpoint name |
| TTS returns 0:00 duration | Raw PCM is auto-wrapped in a WAV header; ensure `speech_sample_rate: 8000` is in the payload |
| Translation model slow | M2M100 loads once and is then cached; wait for the first load (~2 min) |
| Slow startup (8–10 min) | Normal: large ML models (M2M100, Sentence Transformers) load on cold start |
| PDF embeddings fail | Ensure `pypdf` and `sentence-transformers` are installed; try a text-based PDF |
| FAISS import error | Run `pip install faiss-cpu` (or `faiss-gpu` on GPU instances) |
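The "0:00 duration" fix works because an audio player cannot infer the sample rate or sample width of headerless PCM bytes. A minimal sketch using Python's standard `wave` module, assuming the TTS response is 16-bit mono PCM at 8000 Hz (matching `speech_sample_rate: 8000`); `pcm_to_wav` and `wav_duration_seconds` are hypothetical helper names.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int = 8000, channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw PCM samples in a WAV (RIFF) header so players can compute duration."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)  # bytes per sample: 2 means 16-bit audio
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()


def wav_duration_seconds(wav_bytes: bytes) -> float:
    """Duration = number of frames / frames per second, read back from the header."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getnframes() / wav.getframerate()
```

The resulting bytes can be handed to Streamlit with `st.audio(wav_bytes, format="audio/wav")`. If the declared sample rate does not match the audio the API actually returned, playback speed and the reported duration will both be wrong.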
Dependencies (`requirements.txt`):

```text
streamlit>=1.32.0
pandas
numpy
sentence-transformers
faiss-cpu
scikit-learn
transformers
torch
pypdf
databricks-sdk
requests
```

| Profile | Adaptation Strategy |
|---|---|
| Default | Balanced explanation with examples |
| Visual Impairment | Rich text descriptions, no visual references, full verbal context |
| Hearing Impairment | Complete written output, no audio-dependent phrasing |
| Dyslexia | Short sentences, simple vocabulary, bullet points |
| ADHD | Numbered steps, concise chunks, high-focus structure |
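One way to apply these profiles is to add a profile-specific style instruction to the tutor's prompt before it is sent to the LLM. A minimal sketch; the `PROFILE_INSTRUCTIONS` wording is paraphrased from the table above, and both it and the `build_prompt` layout are hypothetical, not the prompts used in `app.py`. The page-tagged chunks follow the same `{"page", "text"}` shape a page-aware chunker would produce.

```python
PROFILE_INSTRUCTIONS = {
    "Default": "Give a balanced explanation with examples.",
    "Visual Impairment": "Describe everything in words; do not refer to figures, colors, or layout.",
    "Hearing Impairment": "Give complete written answers; avoid phrasing that depends on audio.",
    "Dyslexia": "Use short sentences, simple vocabulary, and bullet points.",
    "ADHD": "Answer in short numbered steps and keep each step concise.",
}


def build_prompt(question, context_chunks, profile="Default"):
    """Assemble a RAG prompt: page-tagged context, a style rule, then the question."""
    style = PROFILE_INSTRUCTIONS.get(profile, PROFILE_INSTRUCTIONS["Default"])
    context = "\n".join(f"[page {c['page']}] {c['text']}" for c in context_chunks)
    return (
        "Answer using only the context below and cite page numbers.\n"
        f"Style: {style}\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```

Unknown profile names fall back to the Default instruction rather than raising, so a stale UI value cannot break answering.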
| Language | Native Script | TTS Support |
|---|---|---|
| English | English | ✅ |
| Hindi | हिंदी | ✅ |
| Tamil | தமிழ் | ✅ |
| Telugu | తెలుగు | ✅ |
| Bengali | বাংলা | ✅ |
| Marathi | मराठी | ✅ |
| Gujarati | ગુજરાતી | ✅ |
| Kannada | ಕನ್ನಡ | ✅ |
| Malayalam | മലയാളം | ✅ |
| Punjabi | ਪੰਜਾਬੀ | ✅ |
| Team Member | Affiliation |
|---|---|
| Abhishek Raj (@abhishek130904) | IIT Indore |
| Purvi Jain | IIT Indore |
| Adarsh Rai | IIT Indore |
| Lakshya Rishi | IIT Indore |
Open-source for educational purposes. Built for Databricks Hackathon 2026.