This project predicts the quality of wine using machine learning algorithms.
The dataset used in this project is available on Kaggle. It contains information about various physicochemical properties of wine and its corresponding quality rating given by experts.
- pandas
- numpy
- matplotlib
- seaborn
- scikit-learn (imported as sklearn)
- xgboost
The following machine learning algorithms are implemented in this project:
- Support Vector Machine (SVM)
- XGBoost Classifier
- Logistic Regression
The dataset is preprocessed using the MinMaxScaler from the sklearn.preprocessing module, which rescales every input feature to the same range ([0, 1] by default). The data is then split into training and testing sets using the train_test_split function from the sklearn.model_selection module.
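The scaling and splitting steps can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the DataFrame, its column names, the 80/20 split, and the random seeds are all assumptions, and random numbers stand in for the Kaggle data. Note that this sketch splits first and fits the scaler on the training portion only, a common variation that keeps test-set statistics out of the scaling; the notebook may scale before splitting.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Hypothetical stand-in for the wine dataset: random features plus a binary label.
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.uniform(0, 15, size=(200, 4)),
    columns=["feature_1", "feature_2", "feature_3", "feature_4"],
)
df["quality"] = rng.integers(0, 2, size=200)

X = df.drop(columns="quality")
y = df["quality"]

# Hold out 20% of the rows for testing (the ratio is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Fit the scaler on the training rows only, then apply it to both sets.
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```

After this step every training feature lies in [0, 1]; test features can fall slightly outside that range because the scaler only saw the training minimum and maximum.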
The models are trained on the training set and evaluated on the testing set using accuracy, precision, recall, and F1 score, all computed with the sklearn.metrics module.
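The training and evaluation loop might look like the sketch below. It is an illustration under stated assumptions rather than the project's code: the data comes from make_classification instead of the wine dataset, the models use default hyperparameters (apart from max_iter), and weighted averaging is assumed for precision, recall, and F1. Only SVM and Logistic Regression are shown to keep the example dependent on scikit-learn alone; an xgboost.XGBClassifier could be added to the dictionary in the same way.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Synthetic placeholder data (an assumption; not the wine dataset).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Each model is wrapped in a pipeline so scaling is fitted on training data only.
models = {
    "SVM": make_pipeline(MinMaxScaler(), SVC()),
    "Logistic Regression": make_pipeline(
        MinMaxScaler(), LogisticRegression(max_iter=1000)
    ),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    # Weighted averaging is an assumption; it also works for multi-class labels.
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred, average="weighted", zero_division=0),
        "recall": recall_score(y_test, y_pred, average="weighted", zero_division=0),
        "f1": f1_score(y_test, y_pred, average="weighted", zero_division=0),
    }

for name, scores in results.items():
    print(name, {metric: round(value, 3) for metric, value in scores.items()})
```

The scores printed here describe the synthetic data only and are not comparable to the results reported for the wine dataset.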
Finally, the results are visualized using the matplotlib and seaborn libraries.
- Clone the repository
- Install the dependencies using `pip install -r requirements.txt`
- Run the `wine_quality_prediction.ipynb` notebook to see the implementation and results.
The best-performing model was the XGBoost Classifier, with an accuracy of 0.73 and an F1 score of 0.74. The SVM and Logistic Regression models reached accuracies of 0.67 and 0.65, respectively.
The performance of the models could likely be improved further by tuning their hyperparameters and by using more advanced feature selection techniques.
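One possible way to approach the hyperparameter tuning mentioned above is a cross-validated grid search. The sketch below is an assumption-laden example, not part of the project: it tunes an SVM on synthetic data, and the parameter grid, the 5-fold setting, and the f1_weighted scoring choice are all illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Synthetic placeholder data (an assumption; not the wine dataset).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Scaling lives inside the pipeline so each CV fold is scaled independently.
pipeline = Pipeline([("scale", MinMaxScaler()), ("svc", SVC())])

# Illustrative grid; the values are assumptions, not recommendations.
param_grid = {
    "svc__C": [0.1, 1, 10],
    "svc__kernel": ["linear", "rbf"],
}

search = GridSearchCV(pipeline, param_grid, cv=5, scoring="f1_weighted")
search.fit(X_train, y_train)

# Score the refitted best estimator on the held-out test set.
test_score = search.score(X_test, y_test)
print("Best parameters:", search.best_params_)
print("Cross-validated F1 (weighted):", round(search.best_score_, 3))
print("Test F1 (weighted):", round(test_score, 3))
```

The same pattern applies to the other models by swapping the final pipeline step and the parameter grid.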
- Neeraj Rikhari