megokul/sagemaker_mob_price_classification

📱 Mobile Price Classification – SageMaker Pipeline

🚀 End-to-end workflow for training, deploying, and testing a Random Forest Classifier on AWS SageMaker for mobile price category prediction. Includes dataset preparation, S3 uploads, model training, endpoint deployment, prediction testing, and resource cleanup.


Built with: Python, AWS SageMaker, Boto3, scikit-learn, Pandas, NumPy


✅ Features

  • ✅ Train/test split and CSV export for SageMaker
  • ✅ S3 data upload with boto3 + SageMaker SDK
  • ✅ Train a custom Scikit-learn Random Forest model on SageMaker
  • ✅ Deploy trained model as a SageMaker endpoint
  • ✅ Test predictions directly from the notebook
  • ✅ Automated cleanup of endpoints, configs, and models

📂 Project Structure

mobileprice-sagemaker-init.ipynb   # Main notebook for the entire workflow
script.py                          # Training script executed on SageMaker
.env                               # Environment variables (S3 bucket, role ARN, AWS keys)
mob_price_classification_train.csv # Input dataset

🔁 Workflow

CSV Dataset → Train/Test Split → Upload to S3 → Train on SageMaker → Deploy Endpoint → Test Prediction → Cleanup
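The first two stages of this workflow can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the label column name `price_range`, the 15% test share, and the helper name `split_and_export` are assumptions made for the example.

```python
import os

import pandas as pd
from sklearn.model_selection import train_test_split

LABEL = "price_range"  # assumption: name of the target column in the dataset


def split_and_export(df, label=LABEL, test_size=0.15, random_state=0, out_dir="."):
    """Split a DataFrame and write train.csv / test.csv (label as last column)."""
    features = [col for col in df.columns if col != label]
    X_train, X_test, y_train, y_test = train_test_split(
        df[features], df[label], test_size=test_size, random_state=random_state
    )
    # Re-attach the label so each CSV is self-contained for the training job.
    train = X_train.assign(**{label: y_train})
    test = X_test.assign(**{label: y_test})
    train.to_csv(os.path.join(out_dir, "train.csv"), index=False)
    test.to_csv(os.path.join(out_dir, "test.csv"), index=False)
    return train, test
```

Typical use would be `split_and_export(pd.read_csv("mob_price_classification_train.csv"))`, which leaves train.csv and test.csv in the working directory ready for upload.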

⚙️ Configuration

This project reads its sensitive configuration from a .env file. Keep this file out of version control.

Environment Variables (.env):

AWS_ACCESS_KEY_ID=
AWS_SECRET_ACCESS_KEY=
AWS_REGION=

SAGEMAKER_BUCKET_NAME=
SAGEMAKER_ROLE_ARN=
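One way to load and validate these variables is sketched below. The helper names `read_config` and `load_from_dotenv` are hypothetical; only `load_dotenv` comes from the python-dotenv package. Failing early on a missing value avoids a confusing error later in the training or deployment step.

```python
import os

REQUIRED_KEYS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_REGION",
    "SAGEMAKER_BUCKET_NAME",
    "SAGEMAKER_ROLE_ARN",
)


def read_config(env=None):
    """Return the required settings, raising if any is missing or empty."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise KeyError("Missing settings in .env: " + ", ".join(missing))
    return {key: env[key] for key in REQUIRED_KEYS}


def load_from_dotenv(path=".env"):
    """Load the .env file into the process environment, then validate it."""
    from dotenv import load_dotenv  # provided by python-dotenv

    load_dotenv(path)
    return read_config()
```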

🧪 How to Run

1️⃣ Setup Environment

pip install boto3 sagemaker pandas scikit-learn python-dotenv

2️⃣ Prepare .env

Fill in .env with your AWS credentials, region, S3 bucket name, and SageMaker role ARN.

3️⃣ Run Notebook

Open mobileprice-sagemaker-init.ipynb and execute cells in order:

  1. Data Preparation – Load dataset, split, and save train.csv / test.csv
  2. Upload to S3 – Upload CSVs to SageMaker-accessible bucket
  3. Train Model – Launch SageMaker SKLearn training job
  4. Deploy Model – Create a SageMaker endpoint for inference
  5. Test Predictions – Run predictions on sample test data
  6. Cleanup – Delete endpoint, endpoint config, and model
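Steps 2 to 4 might look roughly like the sketch below, assuming version 2 of the SageMaker Python SDK. The function name `train_and_deploy`, the S3 prefix, the channel names, and an instance count of 1 are assumptions for illustration; the instance types, framework version, and hyperparameters are the ones listed under Model Details.

```python
TRAINING_SETTINGS = {
    "entry_point": "script.py",
    "framework_version": "0.23-1",
    "instance_count": 1,  # assumption: single training instance
    "instance_type": "ml.m5.large",
    "hyperparameters": {"n_estimators": 100, "random_state": 0},
}
DEPLOY_INSTANCE_TYPE = "ml.m4.xlarge"


def train_and_deploy(bucket, role_arn, prefix="mobile-price"):
    """Upload the CSVs, run the training job, and create an endpoint."""
    # Imported here so the settings above can be inspected without the SDK.
    import sagemaker
    from sagemaker.sklearn.estimator import SKLearn

    session = sagemaker.Session()
    # upload_data returns the S3 URI of the uploaded file.
    train_uri = session.upload_data(path="train.csv", bucket=bucket, key_prefix=prefix)
    test_uri = session.upload_data(path="test.csv", bucket=bucket, key_prefix=prefix)

    estimator = SKLearn(role=role_arn, sagemaker_session=session, **TRAINING_SETTINGS)
    # Channel names ("train", "test") are an assumption of this sketch.
    estimator.fit({"train": train_uri, "test": test_uri})

    return estimator.deploy(
        initial_instance_count=1, instance_type=DEPLOY_INSTANCE_TYPE
    )
```

The returned predictor is the object used for test predictions. The endpoint is billed while it exists, so the cleanup step should follow once testing is done.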

📈 Model Details

  • Algorithm: RandomForestClassifier (scikit-learn)
  • Hyperparameters:
    • n_estimators: 100
    • random_state: 0
  • Framework Version: 0.23-1
  • Training Instance: ml.m5.large
  • Deployment Instance: ml.m4.xlarge
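For orientation, a training script with these settings could be structured as in the sketch below. It is not the repository's script.py: the label column `price_range`, the file names inside each channel, the channel names `train` and `test`, and the artifact name `model.joblib` are all assumptions. The `SM_*` environment variables and the `model_fn` hook are the conventions the SageMaker scikit-learn container uses.

```python
import argparse
import os

LABEL = "price_range"  # assumption: name of the target column


def parse_args(argv=None):
    """Read hyperparameters and the paths SageMaker provides in the container."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--n_estimators", type=int, default=100)
    parser.add_argument("--random_state", type=int, default=0)
    parser.add_argument("--model-dir", default=os.environ.get("SM_MODEL_DIR", "model"))
    parser.add_argument("--train", default=os.environ.get("SM_CHANNEL_TRAIN", "."))
    parser.add_argument("--test", default=os.environ.get("SM_CHANNEL_TEST", "."))
    return parser.parse_args(argv)


def train(args):
    """Fit the Random Forest, report test accuracy, and save the model."""
    # Imported inside the function to keep argument parsing dependency-free.
    import joblib
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier

    train_df = pd.read_csv(os.path.join(args.train, "train.csv"))
    test_df = pd.read_csv(os.path.join(args.test, "test.csv"))

    model = RandomForestClassifier(
        n_estimators=args.n_estimators, random_state=args.random_state
    )
    model.fit(train_df.drop(columns=[LABEL]), train_df[LABEL])
    accuracy = model.score(test_df.drop(columns=[LABEL]), test_df[LABEL])
    print(f"Test accuracy: {accuracy:.4f}")

    os.makedirs(args.model_dir, exist_ok=True)
    joblib.dump(model, os.path.join(args.model_dir, "model.joblib"))


def model_fn(model_dir):
    """Called by the inference container to load the trained model."""
    import joblib

    return joblib.load(os.path.join(model_dir, "model.joblib"))


# When saved as a training script, finish with:
# if __name__ == "__main__":
#     train(parse_args())
```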


🧹 Cleanup Commands

When finished, run the cleanup functions to avoid unnecessary billing:

cleanup_endpoint(endpoint_name, delete_model_name=model_name)
# OR
cleanup_all_endpoints(delete_models=True)
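The notebook's helpers are not reproduced here. As a rough idea of what a function like `cleanup_endpoint` might do, the sketch below uses the boto3 SageMaker client. The `client` parameter and the return value are additions for this example so the logic can be exercised without AWS access.

```python
def cleanup_endpoint(endpoint_name, delete_model_name=None, client=None):
    """Delete an endpoint, its endpoint config, and optionally a model."""
    if client is None:
        import boto3

        client = boto3.client("sagemaker")

    # Look up the config name before the endpoint is deleted.
    description = client.describe_endpoint(EndpointName=endpoint_name)
    config_name = description["EndpointConfigName"]

    client.delete_endpoint(EndpointName=endpoint_name)
    client.delete_endpoint_config(EndpointConfigName=config_name)
    if delete_model_name:
        client.delete_model(ModelName=delete_model_name)
    return config_name
```

Deleting only the endpoint stops the instance charges, but the endpoint config and model entries would remain in the account, which is why all three are removed here.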

📊 Sample Prediction Output

sample = testX[features][:2].values.tolist()
prediction = predictor.predict(sample)
print(prediction)
# Example:
# [{'predictions': [2, 3]}]

🔐 Licensing

This project is licensed under the MIT License (update if needed).


👨‍💻 Author

Gokul Krishna N V, Machine Learning Engineer, UK 🇬🇧 (GitHub, LinkedIn)
