An advanced AI-powered data analytics web app built with Streamlit and the Google Gemini API. Upload any CSV or Excel dataset and get exploratory analysis, forecasting, clustering, outlier detection, storytelling, and PDF export, all in plain English.
Deploy free on Streamlit Cloud — see setup instructions below.
| Tab | Feature |
|---|---|
| 💬 Chat & AI | Natural language → exact chart (Gemini picks columns automatically) + suggested follow-up questions |
| 📈 Interactive Charts | Histogram, Scatter, Line, Box, Bar with color grouping + Correlation Heatmap |
| 📊 Auto Visualizations | Auto-generated distribution, outlier, missing value, and category charts |
| 🔴 Outlier Detection | Isolation Forest ML — highlights anomalous rows, per-column box plots, downloadable outlier CSV |
| 🔮 Forecasting | Linear regression trend forecasting on any numeric column with forecast table |
| 🧩 Clustering | KMeans auto-grouping — scatter by cluster, pie chart, cluster stats, downloadable result |
| 🔬 EDA | Column health report, skewness/kurtosis, scatter matrix (pairplot), normal curve overlay, missing heatmap |
| 📖 Storytelling | Gemini writes a full narrative (4 styles) — Overview, Trends, Findings, Red Flags, Recommendations |
| 📥 Export | Cleaned CSV, Chat history TXT, Stats TXT, PDF report with pinned charts embedded |
| Layer | Technology |
|---|---|
| Frontend | Streamlit |
| AI / LLM | Google Gemini 2.5 Flash |
| Data Processing | Pandas, NumPy |
| Visualization | Plotly |
| Machine Learning | Scikit-learn (IsolationForest, KMeans, LinearRegression) |
| PDF Generation | ReportLab |
| Language | Python 3.10+ |
git clone https://github.com/YOUR_USERNAME/ai-data-analyst-pro.git
cd ai-data-analyst-pro
pip install -r requirements.txt
- Go to aistudio.google.com
- Sign in with Google → Create API Key → Copy it
streamlit run app.py
Then open http://localhost:8501 in your browser.
- Push this repo to GitHub
- Go to share.streamlit.io
- Connect your GitHub account → Select this repo
- Set the main file path to app.py
- Click Deploy to get a public URL
⚠️ Do NOT hardcode your API key. Enter it in the sidebar at runtime.
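
One way to keep the key out of the source is to resolve it at runtime from the value typed into the sidebar. The sketch below illustrates that pattern under stated assumptions: the function name `resolve_api_key`, the `GEMINI_API_KEY` environment variable fallback, and the sample key are all hypothetical and are not taken from app.py.

```python
import os

# Hypothetical variable name used only as a fallback in this sketch.
ENV_VAR = "GEMINI_API_KEY"


def resolve_api_key(sidebar_value, env=None):
    """Return the key typed in the sidebar, else the environment fallback.

    In a Streamlit app, sidebar_value would typically come from a masked
    input such as st.sidebar.text_input("Gemini API key", type="password").
    """
    env = os.environ if env is None else env
    key = (sidebar_value or "").strip() or env.get(ENV_VAR, "").strip()
    if not key:
        raise ValueError("No API key provided: enter it in the sidebar")
    return key


# The key lives only in memory for the session; nothing is written to disk.
session_key = resolve_api_key("  my-session-key  ", env={})
print(len(session_key))
```

Because the key never appears in the repository, pushing the project to GitHub or Streamlit Cloud does not expose it.
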
ai-data-analyst-pro/
├── app.py # Main Streamlit application (1200+ lines)
├── requirements.txt # Python dependencies
├── sample_data.csv # Sample dataset to test the app
├── .gitignore # Files to exclude from GitHub
└── README.md # This file
- Enter your Gemini API key in the sidebar
- Upload a CSV or Excel file (or use sample_data.csv)
- Explore each tab:
- Chat — ask "show me salary distribution" → AI picks the right chart automatically
- EDA — get full column health report, pairplot, skewness analysis
- Outliers — click Run Outlier Detection → see anomalous rows highlighted
- Forecast — pick any numeric column → predict future values
- Clustering — set K → auto-group rows into clusters
- Storytelling — click Generate Story → get a full narrative about your data
- Export — pin charts → generate PDF report with charts embedded
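
The skewness analysis mentioned for the EDA tab can be sketched in plain Python. This is an illustration only, not the app's code: `column_health` is a hypothetical helper, `None` stands in for a missing cell, and it uses the population (biased) moment formulas, so its numbers can differ slightly from the bias-corrected values that pandas reports.

```python
from statistics import fmean


def column_health(values):
    """Summarize one numeric column: missing count, mean, skewness, kurtosis."""
    present = [float(v) for v in values if v is not None]
    report = {
        "count": len(present),
        "missing": len(values) - len(present),
        "mean": None,
        "skewness": None,
        "excess_kurtosis": None,
    }
    if not present:
        return report
    mean = fmean(present)
    report["mean"] = mean
    deviations = [v - mean for v in present]
    m2 = fmean([d ** 2 for d in deviations])  # population variance
    if m2 > 0:  # a constant column has no defined shape statistics
        report["skewness"] = fmean([d ** 3 for d in deviations]) / m2 ** 1.5
        report["excess_kurtosis"] = fmean([d ** 4 for d in deviations]) / m2 ** 2 - 3.0
    return report


# Made-up column with one missing cell and one large value on the right.
print(column_health([1, 2, 3, 4, None, 10]))
```

A positive skewness means a long right tail, which is often a hint to check the Outliers tab for the same column.
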
Add screenshots of your app here after deploying.
Thalari Gresu Raju
B.Tech – Artificial Intelligence & Machine Learning
Vignan's Lara Institute of Technology and Science, JNTUK
📧 talarigresuraju@gmail.com
MIT License — free to use, modify, and distribute.