This repository contains a machine learning project that predicts housing prices in the Boston area using the Boston Housing dataset. The project demonstrates the complete workflow of a regression problem, including data preprocessing, feature scaling, model training, evaluation, and prediction.
Repository: https://github.com/codebyimran-projects/boston-housing-price-prediction
The goal of this project is to predict the median value of owner-occupied homes (medv) based on various economic, environmental, and housing-related features.
This project covers:
- Loading and cleaning the dataset
- Handling missing values
- Feature scaling
- Training a regression model
- Evaluating model performance
- Predicting house prices using custom input
The dataset contains 506 rows and 14 columns.
| Column | Description |
|---|---|
| crim | Per capita crime rate by town |
| zn | Proportion of residential land zoned for lots over 25,000 sq. ft. |
| indus | Proportion of non-retail business acres per town |
| chas | Charles River dummy variable (1 if tract bounds river, else 0) |
| nox | Nitric oxides concentration (parts per 10 million) |
| rm | Average number of rooms per dwelling |
| age | Proportion of owner-occupied units built before 1940 |
| dis | Weighted distance to five Boston employment centers |
| rad | Index of accessibility to radial highways |
| tax | Full-value property tax rate per $10,000 |
| ptratio | Pupil-teacher ratio by town |
| b | 1000(Bk − 0.63)², where Bk is the proportion of Black residents by town |
| lstat | Percentage of lower-status population |
| medv | Median value of owner-occupied homes in $1000s (target variable) |
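
Loading the data and handling missing values can be sketched as below. This is a minimal example, assuming the CSV uses the lowercase column names from the table; the `load_and_clean` helper and the median-fill strategy are illustrative choices and may differ from what `main.py` actually does.

```python
import io

import pandas as pd


def load_and_clean(source):
    """Read the housing CSV and fill missing numeric values with column medians.

    `source` may be a file path such as "BostonHousing.csv" or a file-like object.
    Median imputation is an assumption made for this sketch.
    """
    df = pd.read_csv(source)
    # Fill each numeric column's gaps with that column's median.
    df = df.fillna(df.median(numeric_only=True))
    return df


# Tiny in-memory sample standing in for BostonHousing.csv (values are made up).
sample = io.StringIO(
    "crim,rm,medv\n"
    "0.1,6.0,24.0\n"
    "0.2,,21.0\n"
    "0.3,7.0,30.0\n"
)
df = load_and_clean(sample)
print(df.shape)               # rows and columns after cleaning
print(df.isna().sum().sum())  # total number of remaining missing values
```

With the real file, `load_and_clean("BostonHousing.csv")` should return a frame with 506 rows and 14 columns if the CSV matches the description above.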
Install the required Python packages:

```bash
pip install numpy pandas matplotlib scikit-learn
```

- Clone the repository:

  ```bash
  git clone https://github.com/codebyimran-projects/boston-housing-price-prediction.git
  cd boston-housing-price-prediction
  ```

- Make sure `BostonHousing.csv` is present in the project folder.
- Run the main script:

  ```bash
  python main.py
  ```

Model used:

- Linear Regression
Linear regression suits this task because the target variable (`medv`) is continuous, and it gives an interpretable baseline.
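
The scaling and training steps can be sketched as follows. This is not the project's exact code: the synthetic data, the 80/20 split, `StandardScaler`, and the `random_state` value are all assumptions chosen to keep the example self-contained.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the housing features (not the real dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_coef = np.array([3.0, -2.0, 0.5])
y = X @ true_coef + 22.0 + rng.normal(scale=0.1, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit the scaler on the training split only, then reuse it for the test split,
# so no information from the test data leaks into preprocessing.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

model = LinearRegression()
model.fit(X_train_scaled, y_train)

print("Test R^2:", model.score(X_test_scaled, y_test))
```

With the real dataset, `X` would hold the 13 feature columns and `y` the `medv` column.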
The model is evaluated using:
- Root Mean Squared Error (RMSE)
- R² Score (Coefficient of Determination)
RMSE reports the typical prediction error in the same units as `medv`, while R² reports the share of variance in house prices that the model explains.
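
For reference, both metrics can be computed directly from their definitions. The sketch below uses NumPy and made-up values; the function names `rmse` and `r2` are illustrative, and scikit-learn's `mean_squared_error` and `r2_score` would serve the same purpose.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error: sqrt(mean((y - y_hat)^2))."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation
    return float(1.0 - ss_res / ss_tot)


# Toy values for illustration only (not real model output).
y_true = [20.0, 22.0, 24.0, 26.0]
y_pred = [21.0, 21.0, 25.0, 25.0]
print("RMSE:", rmse(y_true, y_pred))
print("R^2:", r2(y_true, y_pred))
```

In this toy case every error is ±1, so the RMSE is 1, and R² is 1 − 4/20 = 0.8.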
The project supports predicting house prices using custom input values:
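```python
import numpy as np

# X, scaler, and model are assumed to be the feature DataFrame, the fitted
# scaler, and the trained model created earlier in the script.
print("Enter house feature values:")
user_data = []
for col in X.columns:
    value = float(input(f"{col}: "))
    user_data.append(value)

# Reshape to a single row, apply the training-time scaling, then predict.
user_array = np.array(user_data).reshape(1, -1)
user_scaled = scaler.transform(user_array)
predicted_price = model.predict(user_scaled)
print("Predicted House Price (MEDV):", predicted_price[0])
```

The project includes: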
- Histogram analysis of room distribution
- Scatter plot of actual vs predicted house prices
These plots show how room counts are distributed across the data and how closely the predicted prices track the actual ones.
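
A minimal sketch of the two plots is shown below. The `save_plots` helper, the output file name, the styling, and the synthetic arrays are assumptions for illustration; the project's own plotting code may look different.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # Non-interactive backend so this runs without a display.
import matplotlib.pyplot as plt
import numpy as np


def save_plots(rooms, y_true, y_pred, path):
    """Draw a histogram of room counts and an actual-vs-predicted scatter plot."""
    fig, (ax_hist, ax_scatter) = plt.subplots(1, 2, figsize=(10, 4))

    ax_hist.hist(rooms, bins=20, edgecolor="black")
    ax_hist.set_xlabel("Average rooms per dwelling (rm)")
    ax_hist.set_ylabel("Count")
    ax_hist.set_title("Distribution of rm")

    ax_scatter.scatter(y_true, y_pred, alpha=0.6)
    # Diagonal reference line: points on it are perfect predictions.
    lo = min(np.min(y_true), np.min(y_pred))
    hi = max(np.max(y_true), np.max(y_pred))
    ax_scatter.plot([lo, hi], [lo, hi], linestyle="--", color="gray")
    ax_scatter.set_xlabel("Actual medv")
    ax_scatter.set_ylabel("Predicted medv")
    ax_scatter.set_title("Actual vs predicted")

    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)
    return path


# Synthetic values standing in for the real data and model output.
rng = np.random.default_rng(1)
rooms = rng.normal(loc=6.3, scale=0.7, size=100)
actual = rng.uniform(10, 50, size=100)
predicted = actual + rng.normal(scale=3.0, size=100)

out = save_plots(rooms, actual, predicted,
                 os.path.join(tempfile.gettempdir(), "housing_plots.png"))
print("Saved plots to", out)
```

Points that fall close to the dashed diagonal indicate predictions near the actual values.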
This project helps understand:
- Regression problems in machine learning
- Data preprocessing techniques
- Feature scaling
- Model training and evaluation
- Real-world prediction workflow