Training-ML 🚀

🎯 Objective

This project implements a complete Machine Learning pipeline for binary classification (income prediction).

It demonstrates how to:

preprocess data
train a model
evaluate performance
visualize results
save and reuse a model
expose predictions via a REST API

⚙️ Tech Stack

Python
Pandas
Scikit-learn
NumPy
Matplotlib
Seaborn
Flask

📊 Project Overview

This project follows a standard Machine Learning workflow:

Load dataset
Clean and preprocess data
Train a model (Random Forest)
Evaluate performance
Visualize results
Save trained model (model.pkl)
Serve predictions via API
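
The cleaning step itself is not shown in this README. The following is only a rough sketch of what it might look like, assuming the binary target is derived by thresholding PINCP. The 50,000 cutoff, the function name `make_binary_target`, and every column other than PINCP are hypothetical:

```python
import pandas as pd

# Hypothetical cutoff: this README does not state the actual income threshold.
INCOME_THRESHOLD = 50_000


def make_binary_target(df: pd.DataFrame, threshold: float = INCOME_THRESHOLD):
    """Drop rows with missing values, then split into features X and binary target y."""
    cleaned = df.dropna()
    # 1 if income is above the threshold, else 0.
    y = (cleaned["PINCP"] > threshold).astype(int)
    X = cleaned.drop(columns=["PINCP"])
    return X, y


# Toy data standing in for data/dataset.csv (AGE and HOURS are made-up columns).
toy = pd.DataFrame(
    {
        "AGE": [25, 40, 31, 58],
        "HOURS": [40, 45, None, 38],
        "PINCP": [30_000, 82_000, 51_000, 47_500],
    }
)
X, y = make_binary_target(toy)
print(y.tolist())
```

The row with the missing HOURS value is dropped before the target is computed, so only three rows remain.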

📁 Project Structure

training/
│
├── data/
│   └── dataset.csv
│
├── src/
│   ├── train.py
│   ├── predict.py
│   ├── preprocess.py
│   ├── metrics.py
│   └── visualization.py
│
├── models/
│   └── model.pkl
│
├── flask/
│   └── api.py
│
├── app.py
├── requirements.txt
└── README.md

🚀 Installation

Clone the repository and install dependencies:

pip install -r requirements.txt

▶️ Run the Training Pipeline

python app.py

This will:

load and clean data
train the model
evaluate performance
generate visualizations
save the model to models/model.pkl
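
The save-and-reuse step could be sketched as follows. This assumes the model is stored with Python's standard `pickle` module, which the `.pkl` extension suggests but the README does not confirm. The synthetic data and the temporary directory are stand-ins for the real dataset and for models/model.pkl:

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the real dataset.
X, y = make_classification(n_samples=200, n_features=11, random_state=0)
model = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, y)

# A temporary directory is used here; the project itself writes to models/model.pkl.
with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "model.pkl"
    with model_path.open("wb") as f:
        pickle.dump(model, f)
    with model_path.open("rb") as f:
        restored = pickle.load(f)

# The restored model should reproduce the original predictions.
same = bool((restored.predict(X) == model.predict(X)).all())
print(same)
```

Pickle files can execute arbitrary code when loaded, so only load a model.pkl that you created yourself or otherwise trust.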

🌐 Run the API

python flask/api.py

The API will run at:

http://127.0.0.1:5000

📡 API Usage

Endpoints

GET /
GET /health
GET /visualization
POST /predict

Example Request

{
  "features": [1, 50, 35, 50000, 0, 1, 3, 2, 12, 100, 5]
}
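
A request like this one could be sent from Python using only the standard library. This sketch assumes the endpoint accepts a JSON body with a `Content-Type: application/json` header. The helper name `build_predict_request` is hypothetical:

```python
import json
import urllib.request

API_URL = "http://127.0.0.1:5000/predict"


def build_predict_request(features, url=API_URL):
    """Build a POST request carrying the features as a JSON body."""
    body = json.dumps({"features": features}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_predict_request([1, 50, 35, 50000, 0, 1, 3, 2, 12, 100, 5])

# Sending the request requires the API to be running locally:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```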

Example Response

{
  "prediction": 0,
  "probabilities": {
    "0": 0.7916445304104879,
    "1": 0.20835546958951212
  }
}
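
A response of this shape could be assembled from a classifier's class labels and one row of predicted probabilities. With scikit-learn, those would typically come from `model.classes_` and `model.predict_proba(...)`. The helper `build_response` is a hypothetical illustration, and the actual code in flask/api.py may differ:

```python
def build_response(classes, probabilities):
    """Pair each class label with its probability and pick the most likely class."""
    by_class = {str(label): float(p) for label, p in zip(classes, probabilities)}
    best_label, _ = max(zip(classes, probabilities), key=lambda pair: pair[1])
    return {"prediction": int(best_label), "probabilities": by_class}


response = build_response([0, 1], [0.7916445304104879, 0.20835546958951212])
print(response["prediction"])
```

The class labels become string keys because JSON object keys are always strings.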

Error Example

{
  "error": "Missing features"
}
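
Input checks that lead to an error like this might look roughly as follows. Only the "Missing features" message comes from this README. The other messages, the function name `validate_payload`, and the expected length of 11 (taken from the example request) are assumptions:

```python
def validate_payload(payload, expected_length=11):
    """Return (features, None) on success or (None, error_body) on failure."""
    if not isinstance(payload, dict) or "features" not in payload:
        return None, {"error": "Missing features"}
    features = payload["features"]
    if not isinstance(features, list) or len(features) != expected_length:
        # Hypothetical message, not taken from the project.
        return None, {"error": f"Expected a list of {expected_length} features"}
    # bool is a subclass of int in Python, so it is rejected explicitly.
    if any(isinstance(v, bool) or not isinstance(v, (int, float)) for v in features):
        # Hypothetical message, not taken from the project.
        return None, {"error": "Features must be numeric"}
    return features, None


features, error = validate_payload({})
print(error)
```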

📈 Model Output

During training, the model prints:

Accuracy score
Confusion matrix
Classification report

It also displays:

Confusion matrix visualization
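
These three outputs correspond to standard functions in `sklearn.metrics`. A small sketch on hand-written labels (not results from this project) shows how they could be produced:

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Toy labels for illustration only; these are not results from the real model.
y_true = [0, 0, 1, 1, 0, 1]
y_pred = [0, 1, 1, 1, 0, 0]

accuracy = accuracy_score(y_true, y_pred)
# Rows are true classes, columns are predicted classes.
matrix = confusion_matrix(y_true, y_pred)

print(f"Accuracy: {accuracy:.3f}")
print(matrix)
print(classification_report(y_true, y_pred))
```

Four of the six toy predictions are correct, so the accuracy is about 0.667 and the matrix is [[2, 1], [1, 2]]. The same matrix could then be drawn as a plot, for example with seaborn's `heatmap`.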

🧠 Model Details

Algorithm: Random Forest Classifier
Task: Binary classification
Target: PINCP (income threshold classification)
Train/Test split: 80/20
Fixed random state for reproducibility
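
The settings above could be wired together roughly like this. The seed value 42 and the synthetic 11-feature dataset are assumptions made for illustration. The README only states that the random state is fixed, and the real hyperparameters are not listed:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

SEED = 42  # hypothetical value; the README only says the random state is fixed

# Synthetic stand-in for the real dataset.
X, y = make_classification(n_samples=500, n_features=11, random_state=SEED)

# 80/20 train/test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=SEED
)

model = RandomForestClassifier(random_state=SEED)
model.fit(X_train, y_train)

test_accuracy = model.score(X_test, y_test)
print(f"Train rows: {len(X_train)}, test rows: {len(X_test)}")
print(f"Test accuracy: {test_accuracy:.3f}")
```

Fixing `random_state` in both the split and the classifier means repeated runs on the same data produce the same model and the same scores.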

💡 Key Features

Clean and modular code structure
Reusable and scalable ML pipeline
Model persistence with .pkl
API for real-time predictions
Data visualization included

🔧 Future Improvements

Add cross-validation
Improve feature engineering
Add model comparison (Logistic Regression, XGBoost)
Deploy API (Render, AWS, etc.)
Build a frontend interface

📌 Notes

Ensure your dataset contains the PINCP column
Input features must match training data format
API expects numerical input only

👨‍💻 Author

This is a beginner Machine Learning project focused on building a complete, well-structured ML pipeline.
