MedSimplify transforms complex pharmaceutical drug information into clear, patient-friendly medication instructions using a production-grade agentic AI pipeline with hybrid RAG, multi-layer guardrails, and automated evaluation.
Prescription labels in the United States are written at a 9th-grade reading level on average, far above the 6th-grade literacy of most adults. This gap leads to:
- 50% of patients misunderstanding their medication instructions
- $500B+ in annual preventable healthcare costs from non-adherence
- Disproportionate harm to elderly, immigrant, and low-literacy populations
MedSimplify solves this by using Generative AI + RAG to instantly translate any drug name into structured, plain-language patient instructions.
  User Query
       │
       ▼
┌─────────────┐     ┌──────────────┐     ┌───────────────┐     ┌──────────────┐     ┌──────────┐
│ Input Guard │────▶│  Hybrid RAG  │────▶│   Generate    │────▶│ Output Guard │────▶│ Evaluate │
│  (Node 1)   │     │   (Node 2)   │     │  GPT-4o-mini  │     │   (Node 4)   │     │ (Node 5) │
│             │     │              │     │   (Node 3)    │     │              │     │          │
│ • Injection │     │ • FAISS Dense│     │ • Clinical    │     │ • Hallucin.  │     │ • ROUGE  │
│   detection │     │ • BM25 Sparse│     │   pharmacist  │     │   detection  │     │ • FK     │
│ • Sanitize  │     │ • Cross-Enc. │     │   prompt      │     │ • Disclaimer │     │   Grade  │
│             │     │   Reranking  │     │ • 6 sections  │     │   injection  │     │ • Compl. │
└─────────────┘     └──────────────┘     └───────────────┘     └──────────────┘     └──────────┘
                           │
                    ┌──────┴──────┐
                    │  248K Drug  │
                    │  Knowledge  │
                    │    Base     │
                    └─────────────┘
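The five stages in the diagram can be sketched as a chain of functions that each read and update a shared state dictionary. This is a minimal, dependency-free illustration rather than the project's LangGraph code: the function names, the state keys, and the stubbed retrieval and generation steps are all assumptions made for the example.

```python
from typing import Callable, Dict, List

State = Dict[str, object]

# Shortened disclaimer text, used only for this sketch.
DISCLAIMER = "Medical Disclaimer: For educational purposes only."


def input_guard(state: State) -> State:
    # Hypothetical check: trim whitespace and block empty queries.
    query = str(state["query"]).strip()
    return {**state, "query": query, "blocked": not query}


def retrieve(state: State) -> State:
    # Stand-in for hybrid RAG; a real node would query FAISS and BM25.
    return {**state, "context": [f"record about {state['query']}"]}


def generate(state: State) -> State:
    # Stand-in for the LLM call; it only echoes the retrieved context.
    context = state["context"]
    return {**state, "answer": f"Summary based on: {context[0]}"}


def output_guard(state: State) -> State:
    # Append the disclaimer to every answer.
    return {**state, "answer": f"{state['answer']}\n{DISCLAIMER}"}


def evaluate(state: State) -> State:
    # Toy metric: word count of the final answer.
    return {**state, "metrics": {"words": len(str(state["answer"]).split())}}


PIPELINE: List[Callable[[State], State]] = [
    input_guard, retrieve, generate, output_guard, evaluate,
]


def run_pipeline(query: str) -> State:
    state: State = {"query": query}
    for node in PIPELINE:
        state = node(state)
        if state.get("blocked"):
            # Stop early when the input guard rejects the query.
            state["answer"] = "Query rejected by input guard."
            break
    return state
```

Calling `run_pipeline("augmentin 625")` returns a state holding the answer, the appended disclaimer, and the toy metric. A LangGraph `StateGraph` expresses the same idea with named nodes and edges.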
| Component | Technology |
|---|---|
| Agentic Orchestration | LangGraph (StateGraph) |
| Dense Retrieval | FAISS + Sentence-BERT (all-MiniLM-L6-v2) |
| Sparse Retrieval | BM25 Okapi |
| Reranking | Cross-Encoder (ms-marco-MiniLM-L-6-v2) |
| Language Model | GPT-4o-mini via LangChain |
| Safety | Custom GuardrailsEngine (4 layers) |
| Evaluation | ROUGE-1/2/L, Flesch-Kincaid, Faithfulness |
| UI | Gradio Blocks |
| TTS | gTTS |
| Translation | Google Translate API |
| PDF Export | ReportLab |
- 5-Node LangGraph Pipeline: input guard → hybrid RAG → generate → output guard → evaluate
- Hybrid RAG: FAISS dense + BM25 sparse + cross-encoder reranking for maximum precision
- 4-Layer Guardrails: prompt injection detection, faithfulness scoring, dosage override blocking, mandatory disclaimers
- Automated Evaluation: ROUGE scores, Flesch-Kincaid Grade, section completeness, faithfulness
- 10 Languages: EN, ES, HI, FR, PT, AR, ZH, DE, JA, TL
- Text-to-Speech: voice responses in native language via gTTS
- PDF Export: downloadable medication summary reports
- Multi-turn Memory: conversation history preserved across turns
- Streaming Output: word-by-word response streaming
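To make the hybrid retrieval idea concrete, the sketch below scores documents with Okapi BM25 in plain Python and merges the sparse ranking with a dense ranking. The README does not say how the two candidate lists are combined before reranking, so reciprocal rank fusion is used here purely as an assumption. The sample documents, the placeholder dense ranking, and the function names are also illustrative.

```python
import math
from collections import Counter
from typing import Dict, List


def bm25_scores(query: str, docs: List[str], k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Score each document against the query with Okapi BM25."""
    tokenized = [d.lower().split() for d in docs]
    avg_len = sum(len(t) for t in tokenized) / len(tokenized)
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Non-negative IDF variant (adds 1 inside the log).
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def reciprocal_rank_fusion(rankings: List[List[int]], k: int = 60) -> List[int]:
    """Merge several ranked lists of document ids into one ranking."""
    fused: Dict[int, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=lambda d: fused[d], reverse=True)


docs = [
    "amoxicillin clavulanate antibiotic for bacterial infections",
    "paracetamol tablet for fever and mild pain",
    "ibuprofen tablet for pain and inflammation",
]
sparse = bm25_scores("antibiotic bacterial infections", docs)
sparse_ranking = sorted(range(len(docs)), key=lambda i: sparse[i], reverse=True)
dense_ranking = [0, 2, 1]  # Placeholder for a FAISS result list.
candidates = reciprocal_rank_fusion([sparse_ranking, dense_ranking])
```

In a full pipeline the fused `candidates` would then be passed to the cross-encoder, which rescores each query-document pair and keeps the top few.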
| Metric | Score | Target | Status |
|---|---|---|---|
| Flesch-Kincaid Grade Level | 7.8 | ≤ 8.0 | Pass |
| Flesch Reading Ease | 58.6 | ≥ 55.0 | Pass |
| ROUGE-1 F1 | 0.41 | n/a | n/a |
| ROUGE-L F1 | 0.38 | n/a | n/a |
| Section Completeness | 96% | ≥ 80% | Pass |
| Faithfulness Score | 0.21 | > 0.05 | Pass |
| Guardrail Pass Rate | 100% | 100% | Pass |
Evaluated on 20 representative medication queries across multiple drug classes.
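The two readability metrics in the table follow the standard Flesch formulas, which depend only on words per sentence and syllables per word. The sketch below is an approximation: it uses a rough vowel-group heuristic for syllables (an assumption for this example), so its numbers will not match a library such as textstat exactly.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: count groups of consecutive vowels."""
    lowered = word.lower()
    count = len(re.findall(r"[aeiouy]+", lowered))
    # Treat a trailing silent "e" as non-syllabic ("take" -> 1).
    if lowered.endswith("e") and not lowered.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def readability(text: str) -> dict:
    """Compute Flesch-Kincaid Grade Level and Flesch Reading Ease."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return {
        "fk_grade": 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59,
        "reading_ease": 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word,
    }
```

Short sentences built from short words push the grade level down and the reading ease up, which is why the generation prompt's plain-language target shows up directly in these two scores.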
Clone the repository and install the dependencies:
git clone https://github.com/ShanmukhaSaiPrakashJeelakarra/MedSimplify.git
cd MedSimplify
pip install pandas numpy faiss-cpu sentence-transformers rank-bm25
pip install langchain langchain-openai langgraph openai langchain-community
pip install gradio gtts deep-translator langdetect
pip install reportlab rouge-score textstat langchain-text-splitters nltk
Set your OpenAI API key in Python before running the pipeline:
import os
os.environ["OPENAI_API_KEY"] = "your-api-key-here"
Download the Medicine Dataset from Kaggle and upload it to your Google Colab session or place it in the project root.
Open MedSimplify.ipynb in Google Colab and run all cells. The Gradio interface will launch with a public shareable link.
MedSimplify/
├── MedSimplify.ipynb           # Main notebook (run in Google Colab)
├── README.md                   # This file
├── requirements.txt            # All dependencies
├── data/
│   └── medicine_dataset.csv    # Kaggle medicine dataset (download separately)
└── outputs/
    └── medication_summary.pdf  # Sample exported PDF report
| Property | Value |
|---|---|
| Source | Kaggle: 11000 Medicine Details |
| Total Records | 248,218 |
| Columns | 58 |
| Side Effect Columns | 42 |
| Use Case Columns | 5 |
| Indexing Sample | 15,000 (configurable) |
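Before indexing, each wide row (several use columns and many side-effect columns) has to be flattened into one text record. The sketch below shows one way this could look. The column names `name`, `use0`, `sideEffect0` and so on are assumptions, since the README does not list them, and the example takes the first N rows rather than a random sample.

```python
import csv
import io
from typing import Dict, Iterable, List

SAMPLE_SIZE = 15_000  # Matches the configurable indexing sample above.


def row_to_document(row: Dict[str, str]) -> str:
    """Join the non-empty use and side-effect cells into one text record."""
    uses = [v for k, v in row.items() if k.startswith("use") and v]
    effects = [v for k, v in row.items() if k.startswith("sideEffect") and v]
    return (
        f"Drug: {row['name']}. "
        f"Uses: {', '.join(uses) or 'not listed'}. "
        f"Side effects: {', '.join(effects) or 'not listed'}."
    )


def build_documents(rows: Iterable[Dict[str, str]], limit: int = SAMPLE_SIZE) -> List[str]:
    """Convert up to `limit` rows into text documents for indexing."""
    docs = []
    for i, row in enumerate(rows):
        if i >= limit:
            break
        docs.append(row_to_document(row))
    return docs


# Tiny in-memory CSV with hypothetical column names.
sample_csv = io.StringIO(
    "name,use0,use1,sideEffect0,sideEffect1\n"
    "augmentin 625 duo tablet,Bacterial infections,,Nausea,Vomiting\n"
    "example drug,,,,\n"
)
documents = build_documents(csv.DictReader(sample_csv))
```

The resulting strings are what would be embedded for dense retrieval and tokenized for BM25.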
Query: "What is augmentin 625 duo tablet used for?"
MEDICATION OVERVIEW
Augmentin 625 is an antibiotic used to treat bacterial infections.
It helps your body fight off germs that can make you sick.
HOW TO USE THIS MEDICATION
Take as prescribed by your doctor, usually every 8–12 hours with food.
Always complete the full course even if you feel better.
POSSIBLE SIDE EFFECTS
• Nausea • Vomiting • Diarrhea • Skin rash
IMPORTANT WARNINGS
Do not take if allergic to penicillin. Inform your doctor of all
current medications before starting this treatment.
ALTERNATIVE OPTIONS
Moxikind-CV 625, Clavam 625, Augpen 625mg Tablet
SIMPLE EXPLANATION
Augmentin 625 combines amoxicillin and clavulanate to fight bacterial
infections. It works by stopping bacteria from growing. Take it with
food to reduce stomach upset and always finish the full course your
doctor prescribed, even if you start feeling better sooner.
Medical Disclaimer: For educational purposes only. Always consult
a licensed healthcare provider before making any medication decisions.
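The six headings in the sample output are also what a section completeness metric can check for. The sketch below assumes completeness is the fraction of expected headings present in a response. The project's exact definition is not given here, so treat the function and its name as illustrative.

```python
from typing import List

REQUIRED_SECTIONS: List[str] = [
    "MEDICATION OVERVIEW",
    "HOW TO USE THIS MEDICATION",
    "POSSIBLE SIDE EFFECTS",
    "IMPORTANT WARNINGS",
    "ALTERNATIVE OPTIONS",
    "SIMPLE EXPLANATION",
]


def section_completeness(response: str) -> float:
    """Return the fraction of required section headings found in the response."""
    upper = response.upper()
    found = sum(1 for heading in REQUIRED_SECTIONS if heading in upper)
    return found / len(REQUIRED_SECTIONS)
```

A response containing all six headings scores 1.0, and one missing three of them scores 0.5.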
Layer 1: Input Validation
├── Length & format checks
├── Regex prompt injection detection (6 patterns)
└── High-risk keyword flagging
Layer 2: Context Grounding
├── RAG forces responses grounded in retrieved drug records
└── Low-overlap hallucination flagging
Layer 3: Output Safety
├── Dosage override pattern detection
└── Context faithfulness scoring (threshold > 0.05)
Layer 4: Disclaimer Injection
└── Mandatory medical disclaimer on EVERY response
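Three of the layers above can be sketched in a few lines of Python. The regex patterns, the word-overlap definition of faithfulness, and the fallback message are assumptions for illustration. The README states only that six patterns exist and that the faithfulness threshold is 0.05.

```python
import re

# Hypothetical patterns; the project's six patterns are not listed here.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior) instructions", re.I),
    re.compile(r"you are now", re.I),
    re.compile(r"system prompt", re.I),
]
FAITHFULNESS_THRESHOLD = 0.05
DISCLAIMER = (
    "Medical Disclaimer: For educational purposes only. Always consult "
    "a licensed healthcare provider before making any medication decisions."
)


def is_injection(query: str) -> bool:
    """Layer 1: flag queries that match any injection pattern."""
    return any(p.search(query) for p in INJECTION_PATTERNS)


def faithfulness(answer: str, context: str) -> float:
    """Fraction of distinct answer words that also occur in the context."""
    answer_words = set(re.findall(r"[a-z0-9]+", answer.lower()))
    context_words = set(re.findall(r"[a-z0-9]+", context.lower()))
    if not answer_words:
        return 0.0
    return len(answer_words & context_words) / len(answer_words)


def apply_output_guard(answer: str, context: str) -> str:
    """Layers 3 and 4: replace unsupported answers, then add the disclaimer."""
    if faithfulness(answer, context) <= FAITHFULNESS_THRESHOLD:
        answer = "The retrieved records do not support an answer to this question."
    if DISCLAIMER not in answer:
        answer = f"{answer}\n\n{DISCLAIMER}"
    return answer
```

Word overlap is a coarse proxy for grounding, which is consistent with the low 0.05 threshold: it catches answers that share almost nothing with the retrieved records, not subtle factual errors.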
This project is accompanied by a research paper submitted to the International Journal of Artificial Intelligence in Healthcare (IJAIH), Inderscience Publishers:
Jeelakarra, S.S.P. (2026). MedSimplify: An Agentic AI System for Automated Generation of Patient-Friendly Medication Instructions Using Hybrid Retrieval-Augmented Generation and Multi-Layer Guardrails. Int. J. Artificial Intelligence in Healthcare.
- Expand knowledge base to full DrugBank + FDA drug labels
- Implement BERTScore and FactScore for rigorous faithfulness evaluation
- Run a human evaluation study measuring patient comprehension improvement
- Fine-tune an open-source clinical LLM (BioMistral, Llama-3-Med)
- Add drug-drug interaction warnings to the guardrails
- Conduct a clinical pilot study with patient volunteers
MedSimplify is an educational and research prototype. It is not a medical device and should not be used for clinical decision-making. All generated content is for informational purposes only. Always consult a licensed healthcare provider before making any medication decisions.
MIT License. See LICENSE for details.
Shanmukha Sai Prakash Jeelakarra
Department of Health Informatics, School of Healthcare Professions
Rutgers University
Email: sj1398@shp.rutgers.edu
Built as the final project for BINF5550: Generative AI for Healthcare, Rutgers University, April 2026.