This document proposes a feasibility study to find which machine learning classification model most accurately predicts the species of an iris flower: Setosa, Versicolor, or Virginica.
Fisher’s Iris dataset is one of the best-known datasets in the pattern recognition literature. It consists of three classes, one for each flower species. Each observation has the following attributes:
- Sepal length (cm)
- Sepal width (cm)
- Petal length (cm)
- Petal width (cm)
The following classification methods will be used to model the data:
- Linear Discriminant Analysis
- Logistic Regression Model
- SVM (Support Vector Machines) Classification
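Choosing among the methods above requires a common yardstick. The proposal speaks of the model that predicts "most accurately" but does not fix a metric or an evaluation protocol, so the sketch below assumes plain classification accuracy on held-out labels. It is written in Python for illustration only; the function names `accuracy` and `rank_models`, the keys `model_a` to `model_c`, and all labels and predictions are hypothetical placeholders, not results from any fitted model.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def rank_models(y_true, predictions):
    """Return (model name, accuracy) pairs sorted from best to worst."""
    scores = {name: accuracy(y_true, y_pred)
              for name, y_pred in predictions.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Placeholder held-out labels, invented only to exercise the functions.
y_true = ["Setosa", "Versicolor", "Virginica", "Virginica"]

# Placeholder predictions; in the study these would come from the
# fitted candidate models (names here are stand-ins).
predictions = {
    "model_a": ["Setosa", "Versicolor", "Virginica", "Versicolor"],
    "model_b": ["Setosa", "Versicolor", "Virginica", "Virginica"],
    "model_c": ["Setosa", "Setosa", "Versicolor", "Virginica"],
}

ranking = rank_models(y_true, predictions)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

Scoring every model against the same held-out labels keeps the comparison fair; other metrics or a cross-validated estimate could be substituted without changing the structure.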
In a technology-driven society, more data is acquired each day. With so much data within our grasp, the problem becomes how to interpret, classify, and model this never-ending pile of information. Machines can easily classify objects that hold a numeric value, but is it possible for a machine to classify an object by name?
In the field of machine learning, it is recognized that there is no exact procedure for picking a classification or prediction algorithm. This report therefore seeks the best prediction model for classifying flower types, using the Iris dataset as the specific problem on which the models are trained. The dataset uses four variables, sepal length (cm), sepal width (cm), petal length (cm), and petal width (cm), measured for 50 flowers from each of 3 species of iris, to predict the class of the flower: Setosa, Versicolor, or Virginica.
This report will answer the following questions:
- How can machines read and model values that are not numbers?
- Which classification model yields the best flower classification?
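For the first question, one common approach is label encoding: each distinct class name is mapped to an integer code that a model can work with, and the codes are mapped back to names when results are reported. The proposal does not say which encoding it will use, so this is an assumption. The sketch is in Python purely for illustration (in R, which this study will use, a `factor` stores class labels in a comparable way). The helper names `fit_label_encoder`, `encode`, and `decode` are hypothetical, and the sample labels are placeholders rather than rows from the dataset.

```python
def fit_label_encoder(labels):
    """Map each distinct label to an integer code, in sorted order."""
    classes = sorted(set(labels))
    return {name: code for code, name in enumerate(classes)}


def encode(labels, mapping):
    """Replace each label with its integer code."""
    return [mapping[name] for name in labels]


def decode(codes, mapping):
    """Turn integer codes back into the original labels."""
    inverse = {code: name for name, code in mapping.items()}
    return [inverse[code] for code in codes]


# Placeholder labels, not actual rows of the Iris dataset.
species = ["Setosa", "Virginica", "Versicolor", "Setosa"]

mapping = fit_label_encoder(species)
codes = encode(species, mapping)

print(mapping)
print(codes)
print(decode(codes, mapping))
```

The codes are identifiers only; their numeric order carries no meaning, so a classifier should treat them as categories rather than as quantities.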
As more information is digitized, the need to sort and classify data grows. This study seeks to improve the process of plant classification. It can also serve as an in-depth beginner’s guide for those interested in classification models. Beneficiaries of the study include:
- Beginner students studying Machine Learning or Data Science
- Statisticians interested in classification models
Required Tasks for Classification Model Report
As an aspiring data scientist, I hope to use my knowledge of data cleaning, extraction, and analysis to build machine learning solutions that contribute to society. I have three years of programming experience in R, a language geared toward statistical analysis and machine learning, which will be the primary language used to analyze the data. Additionally, working toward my Applied Mathematics major and Statistics minor has required me to be analytical and rigorous. I acquired my knowledge of machine learning at University of Davis, and I have a solid background in mathematics and statistics, which are among the pillars of classification and decision modeling.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole. (has iris3 as iris.)
Shaikh, Raheel. “Choosing the Best Algorithm for Your Classification Model.” Medium, Data Driven Investor, 21 Nov. 2018, medium.com/datadriveninvestor/choosing-the-best-algorithm-for-your-classification-model-7c632c78f38f.