This repository contains a complete data science workflow for predicting used car prices using machine learning.
🔗 Original Notebook on Kaggle: View here
📓 Notebook on GitHub: View here
- Model: LightGBM Regressor
- Approach: Gradient Boosting with feature engineering and target transformation

Data Cleaning:
- Handling missing values
- Standardization of text features
- Extraction of structured information from raw data
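The cleaning steps above could look roughly like the following pandas sketch. The sample rows and the column names `brand`, `fuel_type`, and `engine` are hypothetical, as is the assumption that horsepower appears in the engine text as a number followed by `HP`. Adapt the pattern to the real data.

```python
import re

import pandas as pd

# Hypothetical raw listings; column names and values are assumptions for illustration.
raw = pd.DataFrame({
    "brand": ["  Ford", "toyota ", None],
    "fuel_type": ["Gasoline", None, "Hybrid"],
    "engine": ["300.0HP 3.7L V6 Cylinder Engine", "2.0L I4", "172.0HP 1.6L 4 Cylinder Engine"],
})


def clean(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    text_cols = ["brand", "fuel_type"]
    # Standardize text features: trim surrounding whitespace and lowercase.
    for col in text_cols:
        out[col] = out[col].str.strip().str.lower()
    # Handle missing categorical values with an explicit placeholder.
    out[text_cols] = out[text_cols].fillna("unknown")
    # Extract structured information: horsepower from the free-text engine string (assumed "<number>HP" format).
    out["hp"] = (
        out["engine"]
        .str.extract(r"(\d+(?:\.\d+)?)\s*HP", flags=re.IGNORECASE)[0]
        .astype(float)
    )
    # Impute rows where no horsepower was found with the median of the extracted values.
    out["hp"] = out["hp"].fillna(out["hp"].median())
    return out


cleaned = clean(raw)
print(cleaned[["brand", "fuel_type", "hp"]])
```

Filling categorical gaps with a literal `"unknown"` keeps missingness visible to the model as its own category, while the numeric median fill is just one simple default.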

Feature Engineering:
- Creation of new features such as `age` and `hp`
- Log transformation of skewed variables
- Encoding of categorical features
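A minimal sketch of these feature-engineering steps, assuming hypothetical `model_year`, `mileage`, `hp`, and `brand` columns and an arbitrary reference year for computing `age`. Using the pandas `category` dtype is one encoding option that LightGBM accepts directly; the notebook may encode categorical features differently.

```python
import numpy as np
import pandas as pd

REFERENCE_YEAR = 2024  # hypothetical reference year used to compute vehicle age

# Hypothetical cleaned data; column names are assumptions.
df = pd.DataFrame({
    "model_year": [2015, 2020, 2008],
    "mileage": [120_000, 15_000, 0],
    "hp": [300.0, 236.0, 172.0],
    "brand": ["ford", "toyota", "unknown"],
})


def engineer(df: pd.DataFrame, reference_year: int = REFERENCE_YEAR) -> pd.DataFrame:
    out = df.copy()
    # New feature: vehicle age in years.
    out["age"] = reference_year - out["model_year"]
    # Log-transform a right-skewed variable; log1p keeps zero mileage finite (log1p(0) == 0).
    out["log_mileage"] = np.log1p(out["mileage"])
    # Encode categoricals with the pandas "category" dtype, which LightGBM can consume natively.
    out["brand"] = out["brand"].astype("category")
    return out


features = engineer(df)
print(features[["age", "log_mileage", "brand"]])
```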

Target Transformation:
- Log transformation (`log1p`) applied to the target price
- Inverse transformation using `expm1`
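Training on `log1p(price)` compresses the long right tail of prices, and `expm1` maps log-scale predictions back to prices because it is the exact inverse of `log1p`. A small NumPy check with made-up prices:

```python
import numpy as np

# Made-up prices for illustration.
prices = np.array([0.0, 15_000.0, 250_000.0])

# Forward transform applied to the target before training.
y_log = np.log1p(prices)

# Inverse transform applied to model outputs to recover prices.
recovered = np.expm1(y_log)
print(np.allclose(recovered, prices))  # True
```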

Modeling:
- LightGBM implementation for tabular data
- Early stopping to prevent overfitting
- Hyperparameter tuning

Usage:
- Clone the repo: `git clone https://github.com/lucalullo/Used-car-prices.git`
- Install dependencies: `pip install pandas numpy scikit-learn lightgbm matplotlib seaborn`
- Run the `used-car-prices.ipynb` notebook

Author: Luca Lullo
Data Scientist | Applied Machine Learning