It contains various chemical properties of different wine samples, and each sample is labeled as either red or white wine based on its characteristics. The dataset typically includes features such as acidity levels, residual sugar, alcohol content, and other chemical attributes that influence the quality and characteristics of the wine.
- Provides analysis and modeling of the wine dataset
No data cleaning was needed on the dataset: it had no missing values or special characters. Outliers were detected and removed.
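The write-up does not say how outliers were detected, so the following is only a minimal sketch assuming the common 1.5 × IQR rule. The helper name `remove_outliers_iqr`, the column names, and the toy values are all hypothetical stand-ins, not the project's actual code or data.

```python
import pandas as pd


def remove_outliers_iqr(df, columns, k=1.5):
    """Keep rows whose values fall within [Q1 - k*IQR, Q3 + k*IQR] for every listed column."""
    mask = pd.Series(True, index=df.index)
    for col in columns:
        q1 = df[col].quantile(0.25)
        q3 = df[col].quantile(0.75)
        iqr = q3 - q1
        mask &= df[col].between(q1 - k * iqr, q3 + k * iqr)
    return df[mask]


# Toy stand-in for the wine data (values are made up for illustration).
wine = pd.DataFrame({
    "alcohol": [9.4, 9.8, 10.0, 10.5, 11.0, 25.0],
    "pH": [3.2, 3.3, 3.1, 3.4, 3.3, 3.2],
})

# Confirm there are no missing values before filtering.
missing = int(wine.isna().sum().sum())

cleaned = remove_outliers_iqr(wine, ["alcohol", "pH"])
print(f"missing values: {missing}, rows: {len(wine)} -> {len(cleaned)}")
```

In this toy frame only the row with an alcohol value of 25.0 is dropped. The multiplier `k` controls how aggressive the filter is, so it is worth checking how many rows are lost before committing to a threshold.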
This is a bar plot of alcohol by wine quality. From this graph you can see how the alcohol level compares across the quality ratings.
You can also see that the bar for quality 5 wine is only slightly higher than the bar for quality 4 wine. Furthermore, the bar for quality 6 is also close to that of quality 4.
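As a rough illustration of the numbers behind such a bar plot, here is a sketch that assumes the plot shows the mean alcohol content per quality score. The column names `quality` and `alcohol` and the toy values are assumptions for illustration only, not results from the real dataset.

```python
import pandas as pd

# Toy stand-in for the wine data (values are made up for illustration).
wine = pd.DataFrame({
    "quality": [4, 4, 5, 5, 6, 6],
    "alcohol": [10.0, 10.4, 10.2, 10.4, 10.4, 10.8],
})

# One value per quality score: the height of each bar.
mean_alcohol = wine.groupby("quality")["alcohol"].mean()
print(mean_alcohol)

# With matplotlib installed, mean_alcohol.plot.bar() would draw the chart.
```

Printing the grouped means alongside the plot makes small differences between neighbouring quality levels easier to read than judging bar heights by eye.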

This scatterplot shows the amount of citric acid and residual sugar along with the pH of each bottle. The data here tell a different story, with residual sugar and citric acid falling around 0.2 to 0.4 and most bottles having around 0.3 citric acid.
This scatterplot shows sulphates against total sulfur dioxide in the wine. The median appears to be between 0.4 and 0.5, with most wines having about 0.5 sulphates.
The data suggest that sulphates are used to "prevent oxidation".
I used 5 models to test the data.
- LinearRegression
- RandomForestRegressor
- KNeighborsClassifier
- LogisticRegression
- RandomForestClassifier
Below are the results:
| Model | Score |
|---|---|
| LinearRegression | 0.16 |
| RandomForestRegressor | 0.26 |
| KNeighborsClassifier | 0.53 |
| LogisticRegression | 0.52 |
| RandomForestClassifier | 0.67 |
As you can see, the regression models did not do well, but the classifiers did much better, with RandomForestClassifier scoring highest at 0.67.
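A comparison like the one in the table could be sketched as follows. This is not the original experiment: it assumes an 80/20 train/test split and scikit-learn's default `score` method, and it runs on synthetic stand-in data, so the printed numbers will not match the table. The helper name `compare_models` is hypothetical. Keep in mind that `score` returns R² for regressors and mean accuracy for classifiers, so if the table was produced this way the two groups are not measured on the same scale.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def compare_models(X, y, seed=0):
    """Fit each model on a train split and return its default score on the test split."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    models = {
        "LinearRegression": LinearRegression(),
        "RandomForestRegressor": RandomForestRegressor(random_state=seed),
        "KNeighborsClassifier": KNeighborsClassifier(),
        "LogisticRegression": LogisticRegression(max_iter=1000),
        "RandomForestClassifier": RandomForestClassifier(random_state=seed),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        # R^2 for regressors, mean accuracy for classifiers.
        scores[name] = model.score(X_test, y_test)
    return scores


# Synthetic stand-in: 4 features and an integer "quality" target from 3 to 8.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
noise = rng.normal(scale=0.5, size=300)
y = np.clip(np.round(5.5 + X[:, 0] + noise), 3, 8).astype(int)

scores = compare_models(X, y)
for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```

Fixing `random_state` keeps the split and the random forests reproducible between runs, which makes the scores easier to compare when models or features change.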

