Support Vector Machines Project on Iris Dataset

Introduction

This project delves into the fascinating world of machine learning, using the renowned Iris flower data set introduced by Sir Ronald Fisher in 1936. The dataset features measurements from three species of Iris flowers (Iris setosa, Iris virginica, and Iris versicolor), each with 50 samples. The project’s core is to apply Support Vector Machines (SVM) to classify these species based on features like sepal and petal dimensions, demonstrating my proficiency in machine learning techniques.

The Data

The Iris dataset is a classic in machine learning, offering a multifaceted challenge in discriminant analysis. It comprises measurements such as sepal length, sepal width, petal length, and petal width from 150 samples across the three Iris species.
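For readers who want to reproduce the analysis, the DataFrame used throughout this write-up can be built as follows. This is a minimal sketch assuming scikit-learn and pandas are installed; `sns.load_dataset("iris")` from seaborn yields the same columns if a network connection is available.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Load the bundled iris data and rename columns to the snake_case
# names used in the snippets below
data = load_iris(as_frame=True)
iris = data.frame.rename(columns={
    "sepal length (cm)": "sepal_length",
    "sepal width (cm)": "sepal_width",
    "petal length (cm)": "petal_length",
    "petal width (cm)": "petal_width",
})
# Map the integer target back to species names and drop the raw target
iris["species"] = data.target_names[data.target]
iris = iris.drop(columns="target")

print(iris.shape)                 # (150, 5)
print(iris["species"].unique())   # ['setosa' 'versicolor' 'virginica']
```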

Exploratory Data Analysis (EDA)

import seaborn as sns

# Pairwise scatter plots of all four features, coloured by species
sns.pairplot(data=iris, hue='species')

# 2-D kernel density of sepal measurements for setosa only
# (fill/thresh replace the deprecated shade/shade_lowest arguments)
setosa = iris[iris['species'] == 'setosa']
sns.kdeplot(x=setosa['sepal_width'], y=setosa['sepal_length'],
            cmap="plasma", fill=True, thresh=0.05)

Support Vector Machine (SVM) Model

The core of this project is training the SVM Classifier. Here, I demonstrated my skill in employing sklearn’s SVC model, tuning it to the training data, and subsequently evaluating its performance through predictions. This section highlights my hands-on experience in model training and validation.

from sklearn.model_selection import train_test_split

X = iris.drop('species', axis=1)
y = iris['species']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)

from sklearn.svm import SVC
model = SVC()
model.fit(X_train, y_train)

# Evaluate the fitted classifier on the held-out test set
from sklearn.metrics import classification_report
predictions = model.predict(X_test)
print(classification_report(y_test, predictions))
              precision  recall  f1-score  support
setosa             1.00    1.00      1.00       19
versicolor         1.00    0.95      0.97       19
virginica          0.92    1.00      0.96       12
accuracy                             0.98       50
macro avg          0.97    0.98      0.98       50
weighted avg       0.98    0.98      0.98       50

Classification Report of the SVM model

Accuracy: 0.98 – The model correctly classified 98% of all samples.

Grid Search for Hyperparameter Tuning

A standout feature of this project was the implementation of GridSearchCV to optimize the model’s parameters. This optimization effort is pivotal in demonstrating my ability to enhance model performance and showcases my understanding of hyperparameter tuning in machine learning.

from sklearn.model_selection import GridSearchCV

param_grid = {'C': [0.1, 1, 10, 100], 'gamma': [1, 0.1, 0.01, 0.001]}
grid = GridSearchCV(SVC(), param_grid, refit=True, verbose=2)
grid.fit(X_train, y_train)

# Evaluate the refit best estimator on the same test set
from sklearn.metrics import classification_report
grid_predictions = grid.predict(X_test)
print(classification_report(y_test, grid_predictions))
              precision  recall  f1-score  support
setosa             1.00    1.00      1.00       19
versicolor         1.00    0.95      0.97       19
virginica          0.92    1.00      0.96       12
accuracy                             0.98       50
macro avg          0.97    0.98      0.98       50
weighted avg       0.98    0.98      0.98       50

Classification Report of the SVM model with Grid Search

The results for the SVM model with grid search optimization are identical to those of the standard SVM model; since tuning brought no gain here, we might prefer the simpler SVM model without grid search for efficiency.

However, the grid search process is valuable in cases where the optimal parameters are not known in advance, and it can potentially yield better results, especially in more complex or larger datasets.
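After fitting, `GridSearchCV` exposes which parameter combination it selected, which is the part that transfers to datasets where the best settings are not known in advance. The sketch below re-creates the split for self-containment; the `random_state` is illustrative and not from the original run, so the exact winning parameters may differ.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)

# Same grid as above: 4 values of C x 4 values of gamma = 16 candidates,
# each evaluated with 5-fold cross-validation on the training set
param_grid = {"C": [0.1, 1, 10, 100], "gamma": [1, 0.1, 0.01, 0.001]}
grid = GridSearchCV(SVC(), param_grid, refit=True, verbose=0)
grid.fit(X_train, y_train)

print(grid.best_params_)            # winning C/gamma combination
print(round(grid.best_score_, 3))   # its mean cross-validated accuracy
```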

Both models show high accuracy and precision, particularly with the Setosa species, where they achieve perfect scores. The minor difference in the precision for Virginica (0.92) does not affect the overall high performance of both models. Thus, either model would be suitable for this dataset, with the standard SVM model being more straightforward to implement.
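The per-class precision gap described above can be traced with a confusion matrix, which shows exactly which test samples were mislabelled. A sketch under the same assumptions as before (re-created split, illustrative `random_state`):

```python
from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)

model = SVC().fit(X_train, y_train)

# Rows = true class, columns = predicted class; any versicolor/virginica
# confusion appears as off-diagonal counts in the lower-right 2x2 block
cm = confusion_matrix(y_test, model.predict(X_test))
print(cm)
```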

Conclusion

The SVM project on the Iris dataset was a comprehensive exercise that sharpened and showcased my skills in machine learning, data analysis, and model optimization. The project’s success highlights my ability to apply theoretical knowledge to practical, real-world datasets, underlining my proficiency as a machine learning practitioner.