🚀 End-to-end workflow for training, deploying, and testing a Random Forest Classifier on AWS SageMaker for mobile price category prediction. Includes dataset preparation, S3 uploads, model training, endpoint deployment, prediction testing, and resource cleanup.
- ✅ Train/test split and CSV export for SageMaker
- ✅ S3 data upload with boto3 + SageMaker SDK
- ✅ Train a custom Scikit-learn Random Forest model on SageMaker
- ✅ Deploy trained model as a SageMaker endpoint
- ✅ Test predictions directly from the notebook
- ✅ Automated cleanup of endpoints, configs, and models
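The train/test split and CSV export from the list above might look roughly like the sketch below. The function name `split_and_export`, the 80/20 split, and the label column passed in by the caller are assumptions for illustration; the notebook's actual values may differ.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def split_and_export(df, label, train_path="train.csv", test_path="test.csv",
                     test_size=0.2, random_state=0):
    """Split df into train/test sets and write each to CSV with the label kept as the last column."""
    features = [col for col in df.columns if col != label]
    X_train, X_test, y_train, y_test = train_test_split(
        df[features], df[label], test_size=test_size, random_state=random_state
    )
    # Re-attach the label so each CSV is self-contained for the training script.
    train = X_train.assign(**{label: y_train})
    test = X_test.assign(**{label: y_test})
    train.to_csv(train_path, index=False)
    test.to_csv(test_path, index=False)
    return train, test
```

A call such as `split_and_export(pd.read_csv("mob_price_classification_train.csv"), label="price_range")` would then produce `train.csv` and `test.csv`; the label name `price_range` is an assumption about the dataset, not something this README states.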
mobileprice-sagemaker-init.ipynb # Main notebook for the entire workflow
script.py # Training script executed on SageMaker
.env # Environment variables (S3 bucket, role ARN, AWS keys)
mob_price_classification_train.csv # Input dataset
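For orientation, SageMaker Scikit-learn entry points usually start by reading hyperparameters from command-line flags and data locations from environment variables that the training container sets. The sketch below shows only that argument-parsing part. It illustrates the general pattern and is not a copy of this project's `script.py`; the channel names `train` and `test` and the local fallback paths are assumptions.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse hyperparameters and SageMaker-provided paths for a training run."""
    parser = argparse.ArgumentParser()
    # Hyperparameters set on the estimator arrive as command-line flags.
    parser.add_argument("--n_estimators", type=int, default=100)
    parser.add_argument("--random_state", type=int, default=0)
    # SageMaker sets these variables inside the training container;
    # the fallbacks are placeholders for running the script locally.
    parser.add_argument("--model-dir", default=os.environ.get("SM_MODEL_DIR", "model"))
    parser.add_argument("--train", default=os.environ.get("SM_CHANNEL_TRAIN", "."))
    parser.add_argument("--test", default=os.environ.get("SM_CHANNEL_TEST", "."))
    return parser.parse_args(argv)
```

The rest of such a script would typically fit the model on the CSV found under `args.train` and save it under `args.model_dir` so that SageMaker can package it for deployment.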
CSV Dataset → Train/Test Split → Upload to S3 → Train on SageMaker → Deploy Endpoint → Test Prediction → Cleanup
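The "Upload to S3" stage of this pipeline can be sketched with the boto3 S3 client's `upload_file` method. The helper name `upload_files` and the key prefix are hypothetical. The client is passed in as an argument so the function can be exercised without AWS access.

```python
import os


def upload_files(s3_client, bucket, local_paths, prefix="mobile-price/data"):
    """Upload each local file to s3://<bucket>/<prefix>/<filename> and return the S3 URIs."""
    uris = []
    for path in local_paths:
        key = f"{prefix}/{os.path.basename(path)}"
        # boto3 S3 client signature: upload_file(Filename, Bucket, Key)
        s3_client.upload_file(path, bucket, key)
        uris.append(f"s3://{bucket}/{key}")
    return uris
```

In the notebook this would be called with something like `upload_files(boto3.client("s3"), bucket_name, ["train.csv", "test.csv"])`, and the returned URIs would be passed to the training job as input channels.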
This project uses a .env file for sensitive configuration.
Environment Variables (.env):
AWS_ACCESS_KEY_ID=
AWS_SECRET_ACCESS_KEY=
AWS_REGION=
SAGEMAKER_BUCKET_NAME=
SAGEMAKER_ROLE_ARN=
pip install boto3 sagemaker pandas scikit-learn python-dotenv
Fill in your .env with AWS credentials, S3 bucket, and SageMaker role ARN.
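A missing or empty variable tends to surface later as a confusing AWS error, so it can help to check the configuration up front. The sketch below assumes the notebook has already called `load_dotenv()` from python-dotenv, which copies the `.env` entries into `os.environ`. The helper name `read_config` is hypothetical.

```python
import os

REQUIRED_VARS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_REGION",
    "SAGEMAKER_BUCKET_NAME",
    "SAGEMAKER_ROLE_ARN",
)


def read_config(env=None):
    """Return the required settings as a dict, raising KeyError if any is missing or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Calling `read_config()` right after `load_dotenv()` fails fast with the names of any variables that still need to be filled in.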
Open mobileprice-sagemaker-init.ipynb and execute cells in order:
- Data Preparation – Load dataset, split, and save train.csv/test.csv
- Upload to S3 – Upload CSVs to SageMaker-accessible bucket
- Train Model – Launch SageMaker SKLearn training job
- Deploy Model – Create a SageMaker endpoint for inference
- Test Predictions – Run predictions on sample test data
- Cleanup – Delete endpoint, endpoint config, and model
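The final cleanup step in the list above deletes three separate resources. One possible shape for it, using the boto3 SageMaker client, is sketched below. This is not the notebook's own `cleanup_endpoint` implementation; the function name `delete_endpoint_resources` is hypothetical, and the client is passed in so the logic can be tested without AWS access.

```python
def delete_endpoint_resources(sm_client, endpoint_name, model_name=None):
    """Delete an endpoint, its endpoint config, and optionally a model.

    Returns the name of the endpoint config that was deleted.
    """
    # Look up the config name before the endpoint itself is gone.
    description = sm_client.describe_endpoint(EndpointName=endpoint_name)
    config_name = description["EndpointConfigName"]
    sm_client.delete_endpoint(EndpointName=endpoint_name)
    sm_client.delete_endpoint_config(EndpointConfigName=config_name)
    if model_name:
        sm_client.delete_model(ModelName=model_name)
    return config_name
```

With real credentials the client would come from `boto3.client("sagemaker", region_name=...)`. The endpoint is the resource billed while it runs, so it is deleted first.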
- Algorithm: RandomForestClassifier (scikit-learn)
- Hyperparameters:
  - n_estimators: 100
  - random_state: 0
- Framework Version: 0.23-1
- Training Instance: ml.m5.large
- Deployment Instance: ml.m4.xlarge
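These settings map onto the SageMaker Python SDK's `SKLearn` estimator roughly as sketched below. `instance_count=1` is an assumption, as are the helper names. The SDK import is kept inside `build_estimator` so the settings can be inspected without the SDK installed.

```python
def sklearn_estimator_kwargs(role_arn, entry_point="script.py"):
    """Collect the training settings listed above as SKLearn estimator arguments."""
    return {
        "entry_point": entry_point,
        "role": role_arn,
        "instance_count": 1,  # assumption: single training instance
        "instance_type": "ml.m5.large",
        "framework_version": "0.23-1",
        "hyperparameters": {"n_estimators": 100, "random_state": 0},
    }


def build_estimator(role_arn):
    """Create the estimator; requires the sagemaker package and AWS credentials."""
    from sagemaker.sklearn.estimator import SKLearn

    return SKLearn(**sklearn_estimator_kwargs(role_arn))
```

Training and deployment would then follow the usual SDK pattern: `estimator.fit({...})` with the S3 input channels, then `estimator.deploy(initial_instance_count=1, instance_type="ml.m4.xlarge")`.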
When finished, run the cleanup functions to avoid unnecessary billing:
cleanup_endpoint(endpoint_name, delete_model_name=model_name)
# OR
cleanup_all_endpoints(delete_models=True)

sample = testX[features][:2].values.tolist()
prediction = predictor.predict(sample)
print(prediction)
# Example:
# [{'predictions': [2, 3]}]

This project is licensed under the MIT License (update if needed).
Gokul Krishna N V, Machine Learning Engineer, UK 🇬🇧 (GitHub • LinkedIn)