🧀 TabularML - Advanced ML Pipeline with Streamlit UI

A comprehensive machine learning pipeline for tabular data with a beautiful Streamlit web interface and automated UV environment setup.

✨ Features

🚀 Machine Learning Pipeline

Automated Data Processing: Handles numeric and categorical features automatically
Smart Feature Selection: Uses Random Forest for intelligent feature selection
LightGBM Integration: Fast and efficient gradient boosting algorithm
Hyperparameter Tuning: Automated model optimization with GridSearchCV
Comprehensive Evaluation: Detailed performance metrics and visualizations
Model Persistence: Saves trained models for deployment

🎨 Streamlit Web Interface

Interactive Dashboard: Beautiful, responsive web interface
Data Exploration: Comprehensive data analysis and visualizations
Real-time Training: Live progress tracking during model training
Model Evaluation: Detailed performance metrics with interactive charts
Prediction Interface: Make predictions on new data with confidence intervals
Batch Processing: Upload CSV files for batch predictions

⚡ UV Package Management

Fast Environment Setup: Automated dependency management with UV
Cross-platform Scripts: Works on Windows, macOS, and Linux
Reproducible Builds: Locked dependencies for consistent environments

🛠️ Quick Start

Option 1: Automatic Setup (Recommended)

Linux/macOS:

# Make setup script executable and run
chmod +x setup.sh
./setup.sh

Windows:

# Run the setup batch file
setup.bat

Option 2: Manual Setup

Install UV (if not already installed):

pip install uv

Create the virtual environment and install the locked dependencies:

uv sync

Run the application:

# Run the Streamlit UI
uv run streamlit run ui.py

# Or run the pipeline directly
uv run python pipeline.py

🎮 Using the Application

1. Launch the Web Interface

uv run streamlit run ui.py

Then open your browser to http://localhost:8501

2. User Interface Overview

The TabularML web interface provides an intuitive, step-by-step workflow for machine learning. The interface features:

📊 Interactive Dashboard: Clean, modern design with real-time status updates
🎛️ Navigation Panel: Easy access to all pipeline stages
📈 Data Visualization: Rich charts and graphs for data exploration
⚡ Quick Actions: One-click initialization and data loading

3. Navigate Through the Pipeline

🏠 Home Page

Initialize the pipeline
Load sample data
View system status

📊 Data Exploration

View dataset statistics and metrics
Explore data distributions and correlations
Analyze feature relationships with interactive plots

🔧 Model Training

Configure training parameters
Start model training with live progress tracking
View training logs and results

📈 Model Evaluation

Detailed performance metrics (R², RMSE, MAE, MSE)
Predictions vs Actual scatter plots
Residuals distribution analysis
Feature importance charts
Model parameter inspection
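
The metrics above follow standard definitions. The following dependency-free sketch shows how they relate to each other; the function name regression_metrics is hypothetical and is not taken from pipeline.py, which may compute these values with a library such as scikit-learn instead.

```python
import math
from typing import Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> dict:
    """Compute MSE, RMSE, MAE and R² for paired true/predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(r * r for r in residuals) / n
    mae = sum(abs(r) for r in residuals) / n
    mean_true = sum(y_true) / n
    # Total sum of squares: spread of the target around its own mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    ss_res = sum(r * r for r in residuals)
    # R² = 1 - SS_res / SS_tot; undefined when the target is constant.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae, "R2": r2}


metrics = regression_metrics([3.0, 5.0, 7.0], [2.5, 5.0, 8.0])
print(metrics["R2"], metrics["MAE"])  # 0.84375 0.5
```

RMSE and MAE are expressed in the units of the target (MSE in squared units), so their size is only meaningful relative to the target's scale, while R² is unitless.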

🔮 Predictions

Single Predictions: Enter feature values for individual predictions
Batch Predictions: Upload CSV files for bulk processing
Confidence Intervals: Get prediction uncertainty estimates
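
Batch prediction comes down to checking that the uploaded CSV contains the columns the model was trained on, then predicting row by row. A minimal sketch, assuming numeric-only features; batch_predict and the predict_row callable are hypothetical stand-ins for the trained model and are not part of the TabularML code:

```python
import csv
import io
from typing import Callable, Dict, List, Sequence


def batch_predict(
    csv_text: str,
    required_features: Sequence[str],
    predict_row: Callable[[Dict[str, float]], float],
) -> List[float]:
    """Validate CSV columns, then apply predict_row to every data row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = [name for name in required_features if name not in header]
    if missing:
        raise ValueError(f"CSV is missing required columns: {missing}")

    predictions = []
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        try:
            features = {name: float(row[name]) for name in required_features}
        except (TypeError, ValueError) as exc:
            raise ValueError(f"Invalid numeric value on line {line_no}") from exc
        predictions.append(predict_row(features))
    return predictions


# Toy stand-in for a trained model (hypothetical).
def toy_model(features: Dict[str, float]) -> float:
    return 10 * features["rooms"] + features["income"]


uploaded = "rooms,income\n3,50\n4,70\n"
print(batch_predict(uploaded, ["rooms", "income"], toy_model))  # [80.0, 110.0]
```

Validating the header up front lets the interface report all missing columns at once instead of failing partway through a large upload.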

⚙️ Settings

Configure model parameters
Adjust preprocessing options
System information and controls

📊 Pipeline Architecture

The ML pipeline follows these steps:

Data Loading: Loads dataset (with fallback to synthetic data)
Data Preprocessing: Handles missing values, scaling, and encoding
Train-Test Split: Divides data into training and testing sets
Feature Selection: Identifies top features using Random Forest
Model Building: Trains LightGBM with hyperparameter tuning
Model Evaluation: Comprehensive performance assessment
Deployment: Saves model for production use
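
The preprocessing and splitting steps can be sketched with the standard library alone. This is an illustration under assumptions (median imputation, z-score scaling, one-hot encoding, a seeded 80/20 split), not the code in pipeline.py; the helper names are hypothetical, and scikit-learn's SimpleImputer, StandardScaler, OneHotEncoder and train_test_split are the usual library equivalents.

```python
import random
import statistics
from typing import Dict, List, Optional, Sequence, Tuple


def impute_and_scale(values: Sequence[Optional[float]]) -> List[float]:
    """Fill missing numeric values with the median, then convert to z-scores."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    filled = [median if v is None else v for v in values]
    mean = statistics.fmean(filled)
    # Fall back to 1.0 so a constant column does not cause division by zero.
    std = statistics.pstdev(filled) or 1.0
    return [(v - mean) / std for v in filled]


def one_hot(values: Sequence[str]) -> Dict[str, List[int]]:
    """Encode a categorical column as one 0/1 indicator column per category."""
    categories = sorted(set(values))
    return {f"is_{c}": [int(v == c) for v in values] for c in categories}


def split_rows(rows: Sequence, test_size: float = 0.2, seed: int = 42) -> Tuple[list, list]:
    """Shuffle row indices with a fixed seed and split them into (train, test)."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(rows) * test_size)
    test = [rows[i] for i in indices[:n_test]]
    train = [rows[i] for i in indices[n_test:]]
    return train, test


scaled = impute_and_scale([1.0, None, 3.0])  # the gap is filled with the median, 2.0
encoded = one_hot(["house", "condo", "house"])
train, test = split_rows(list(range(10)), test_size=0.2)
print(encoded)  # {'is_condo': [0, 1, 0], 'is_house': [1, 0, 1]}
print(len(train), len(test))  # 8 2
```

This sketch follows the order listed above and computes preprocessing statistics over the whole dataset. Fitting them on the training split only, and reusing them for the test split, avoids leaking test-set information into training.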

🔧 Configuration

Dependencies (pyproject.toml)

Core ML: pandas, scikit-learn, lightgbm, numpy
Visualization: matplotlib, plotly, seaborn
Web Interface: streamlit
Utilities: joblib for model persistence

UV Scripts

The setup includes pre-configured UV scripts for common tasks:

Environment initialization
Dependency installation
Application launching

🎯 Sample Dataset

The application includes a synthetic housing dataset with:

1000 samples with 12 features
Numeric features: Income, house age, rooms, location, etc.
Categorical features: Property type, year built
Target: House price prediction
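
A generator in the same spirit might look like the sketch below. The column names, coefficients and noise level are illustrative assumptions and cover only a subset of the 12 features; this is not TabularML's actual data generator. The target is stored under 'label', the column name the pipeline expects.

```python
import random
from typing import Dict, List


def make_synthetic_housing(n_samples: int = 1000, seed: int = 0) -> List[Dict[str, object]]:
    """Generate a small synthetic housing table with a 'label' price column.

    Hypothetical helper: names and coefficients are illustrative only.
    """
    rng = random.Random(seed)
    property_types = ["apartment", "house", "townhouse"]
    type_premium = {"apartment": 0.0, "house": 50.0, "townhouse": 25.0}
    rows = []
    for _ in range(n_samples):
        income = rng.uniform(20, 150)    # household income, in thousands
        house_age = rng.uniform(0, 60)   # years
        rooms = rng.randint(1, 8)
        property_type = rng.choice(property_types)
        # Price is a linear function of the features plus Gaussian noise.
        price = (
            2.0 * income
            - 0.8 * house_age
            + 15.0 * rooms
            + type_premium[property_type]
            + rng.gauss(0, 10)
        )
        rows.append({
            "income": income,
            "house_age": house_age,
            "rooms": rooms,
            "property_type": property_type,
            "label": price,
        })
    return rows


data = make_synthetic_housing()
```

Seeding a private random.Random instance keeps the dataset reproducible without affecting global random state.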

📈 Performance

The pipeline reports the following results on the synthetic sample dataset:

R² Score: ~0.97 (about 97% of the target variance explained)
RMSE: ~0.99 (in the units of the target variable)
Training Time: ~15 seconds for the full pipeline (hardware-dependent)

🔍 Advanced Features

Interactive Visualizations

Target distribution histograms
Feature correlation heatmaps
Scatter plot matrices
Predictions vs actual charts
Residuals analysis

Model Insights

Feature importance rankings
Model parameter inspection
Training progress tracking
Comprehensive evaluation metrics
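
Feature importance ranking and the Random Forest based feature selection step both reduce to sorting importance scores and keeping the strongest features. A sketch, assuming the scores come from a fitted forest's feature_importances_ attribute (the scores below are made up for illustration). The mean-importance threshold mirrors the default of scikit-learn's SelectFromModel for tree-based estimators, but the helper names are hypothetical:

```python
from typing import Dict, List, Optional, Tuple


def rank_features(importances: Dict[str, float]) -> List[Tuple[str, float]]:
    """Sort features by importance, highest first; ties are broken by name."""
    return sorted(importances.items(), key=lambda item: (-item[1], item[0]))


def select_features(importances: Dict[str, float], top_k: Optional[int] = None) -> List[str]:
    """Keep the top_k features, or those at or above the mean importance if top_k is None."""
    ranked = rank_features(importances)
    if top_k is not None:
        return [name for name, _ in ranked[:top_k]]
    threshold = sum(importances.values()) / len(importances)
    return [name for name, score in ranked if score >= threshold]


# Made-up scores, e.g. as read from a fitted forest's feature_importances_.
scores = {"income": 0.45, "rooms": 0.30, "house_age": 0.15, "property_type": 0.10}
print(rank_features(scores)[0])          # ('income', 0.45)
print(select_features(scores, top_k=2))  # ['income', 'rooms']
print(select_features(scores))           # ['income', 'rooms']
```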

Production Ready

Model serialization with joblib
Batch prediction capabilities
Error handling and validation
Scalable architecture
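
Model serialization is a save/load round trip. The sketch below uses the standard library's pickle module as a dependency-free stand-in; with joblib, which the project uses, the equivalent calls are joblib.dump(model, path) and joblib.load(path). The helper names and the models/model.pkl path are hypothetical:

```python
import pickle
import tempfile
from pathlib import Path
from typing import Any


def save_model(model: Any, path: Path) -> None:
    """Serialize a fitted model (any picklable object) to disk."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as fh:
        pickle.dump(model, fh)


def load_model(path: Path) -> Any:
    """Load a saved model. Only unpickle trusted files: loading can run arbitrary code."""
    with path.open("rb") as fh:
        return pickle.load(fh)


# Round trip in a temporary directory; a dict stands in for a trained model.
with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "models" / "model.pkl"
    save_model({"coef": [1.5, -0.2]}, model_path)
    restored = load_model(model_path)

print(restored)  # {'coef': [1.5, -0.2]}
```

The same trust caveat applies to joblib files, since joblib also relies on pickle when loading.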

🚀 Extending the Pipeline

Adding New Datasets

Modify the fetch_data() method in pipeline.py
Ensure your data has a 'label' column for the target
The pipeline automatically handles numeric/categorical features
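
As an illustration, a hypothetical loader might read a CSV and rename your target column to 'label'. This stdlib-only sketch is not the project's fetch_data(); with pandas, the equivalent rename is df.rename(columns={target_column: "label"}).

```python
import csv
import tempfile
from pathlib import Path
from typing import Dict, List


def load_custom_dataset(path: Path, target_column: str) -> List[Dict[str, str]]:
    """Read a CSV and store its target column under the name 'label'.

    Hypothetical helper; values stay strings until later type conversion.
    """
    with path.open(newline="") as fh:
        reader = csv.DictReader(fh)
        fieldnames = reader.fieldnames or []
        if target_column not in fieldnames:
            raise ValueError(f"Target column {target_column!r} not found in {fieldnames}")
        if target_column != "label" and "label" in fieldnames:
            raise ValueError("Dataset already has a 'label' column; rename it first")
        rows = []
        for row in reader:
            row["label"] = row.pop(target_column)
            rows.append(row)
    return rows


with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "houses.csv"
    csv_path.write_text("sqft,price\n1200,250000\n900,180000\n")
    rows = load_custom_dataset(csv_path, target_column="price")

print(rows[0])  # {'sqft': '1200', 'label': '250000'}
```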

Customizing Models

Update the model_building() method
Modify hyperparameter grids in the training configuration
Add new evaluation metrics as needed
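
GridSearchCV evaluates every combination in the grid, so the search cost grows multiplicatively with each parameter value you add. The sketch below expands a grid the same way scikit-learn's ParameterGrid does (sorted keys, Cartesian product). num_leaves, learning_rate and n_estimators are real LightGBM parameters, but the values are illustrative assumptions, not TabularML's defaults:

```python
import itertools
from typing import Any, Dict, Iterator, List


def expand_grid(param_grid: Dict[str, List[Any]]) -> Iterator[Dict[str, Any]]:
    """Yield every combination of parameter values, one dict per candidate."""
    names = sorted(param_grid)
    for values in itertools.product(*(param_grid[name] for name in names)):
        yield dict(zip(names, values))


# Illustrative LightGBM grid; values are assumptions, not the project's settings.
param_grid = {
    "num_leaves": [15, 31, 63],
    "learning_rate": [0.05, 0.1],
    "n_estimators": [100, 300],
}

candidates = list(expand_grid(param_grid))
print(len(candidates))  # 12
print(candidates[0])    # {'learning_rate': 0.05, 'n_estimators': 100, 'num_leaves': 15}
```

With 5-fold cross-validation, these 12 candidates mean 60 model fits, plus one final refit on the full training set under GridSearchCV's default refit=True.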

UI Customization

Modify ui.py to add new pages or features
Update the navigation and styling
Add new visualization types

📚 Dependencies

Core Libraries

pandas: Data manipulation and analysis
scikit-learn: Machine learning toolkit
lightgbm: Gradient boosting framework
streamlit: Web application framework
plotly: Interactive visualizations

Development Tools

uv: Fast Python package manager
pytest: Testing framework (optional)
black: Code formatting (optional)

🤝 Contributing

Fork the repository
Create a feature branch
Make your changes
Test thoroughly
Submit a pull request

📄 License

This project is open source and available under the MIT License.

🎉 Acknowledgments

Built with modern Python ML stack
Inspired by best practices in MLOps
Designed for both beginners and experts
Emphasis on user experience and visualization

Ready to explore your data? Start with ./setup.sh (or setup.bat on Windows) and launch the Streamlit interface! 🚀
