# Support Vector Machine for Machine Learning

Overview

1. Introduction
2. Applications of Support Vector Machine
3. What is SVM?
4. Hyperparameter Tuning
5. Implement our model with scikit-learn
6. Summary

## Introduction to Support Vector Machine

If you are a beginner, you might be wondering what kind of name ‘support vector machine’ is. Don’t worry: it is one of the most powerful and useful classification techniques. It can be used for regression problems too, but it is mostly used for classification. It can be seen as a modification of logistic regression that makes it more robust for both binary and multiclass classification. If you have not studied logistic regression yet, I suggest you go through it first.

As a quick overview, why is it so good?
1. It uses the kernel trick.
2. It is based on a large-margin intuition.

These are the key points that make SVM so robust.

## Applications of SVM

SVM has a number of applications in classification.

1. Classification of Images
2. Handwritten Recognition
3. Document Classification

These are some applications, but model accuracy depends mainly on the data and on whether that data is suitable for SVM.

## What is “SVM”?

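SVM is a supervised classification technique. It starts from the same linear score z as logistic regression, but it adds some techniques which make it more powerful.

The linear score is as follows:

z = (w1*x1 + w2*x2 + ... + wn*xn + b)

Here b (also written w0) is the constant bias term, i.e. the weight of x0 where x0 = 1.

Logistic regression passes z through the sigmoid function to get a probability:

f(x) = 1/(1 + e^(-z))

SVM instead classifies a point by the sign of z, that is, by which side of the hyperplane z = 0 the point lies on.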
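
As a minimal sketch of the formulas above, here is the score z and the sigmoid computed with NumPy. The weights, bias, and input are made-up values chosen only for illustration. Note that thresholding the sigmoid at 0.5 is the same as checking the sign of z, i.e. which side of the hyperplane the point is on.

```python
import numpy as np

# Hypothetical weights, bias, and input, chosen only for illustration.
w = np.array([2.0, -1.0])
b = -0.5
x = np.array([1.0, 0.5])

# Linear score: z = w1*x1 + w2*x2 + b
z = float(np.dot(w, x) + b)

# Sigmoid of the score, as used by logistic regression
f = 1.0 / (1.0 + np.exp(-z))

# The predicted side of the hyperplane depends only on the sign of z
side = 1 if z >= 0 else -1

print(z, f, side)
```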

The algorithm tries to find the optimal (large-margin) hyperplane so that it classifies unseen data points, such as those in the test set, accurately. If the data is not linearly separable, it tries to transform it into data that is.

Suppose we have data as shown below:

Which hyperplane do you think SVM will try to choose? It will probably try to choose this one:

Now, what if we have some complex data like this:

How will we separate this type of data? SVM maps the data into a higher-dimensional space where it becomes linearly separable. Here it can use z = x^2 + y^2, and our data in the higher-dimensional space now looks like this:
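
To see why the extra feature helps, here is a small sketch on made-up data: two rings of points (radius 1 and radius 3, both assumptions for illustration) that no straight line can separate in 2-D. After adding the squared-radius feature, a single threshold splits them.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

def ring(radius, n):
    """Return n points on a circle of the given radius around the origin."""
    angles = rng.uniform(0.0, 2.0 * np.pi, size=n)
    return np.column_stack([radius * np.cos(angles), radius * np.sin(angles)])

inner = ring(1.0, 50)  # class 0 (hypothetical data)
outer = ring(3.0, 50)  # class 1 (hypothetical data)

# New feature: squared distance from the origin, x^2 + y^2
z_inner = (inner ** 2).sum(axis=1)
z_outer = (outer ** 2).sum(axis=1)

# In the new dimension, one threshold (a flat plane) separates the classes
threshold = 5.0
print(z_inner.max() < threshold < z_outer.min())  # prints True
```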

Here, SVM is using the kernel trick. You don’t need to write this function on your own: SVM libraries provide built-in kernel functions. With the kernel applied, the decision boundary for the data above, plotted back in 2-D, looks like this:
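
As a rough illustration of the built-in kernels, the sketch below compares a linear kernel with the RBF kernel in scikit-learn on synthetic concentric circles. The dataset from `make_circles` and its settings are assumptions chosen for the demo, not the data from the figures.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Synthetic data: a small circle inside a larger one (illustrative only)
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# A linear kernel can only draw a straight line through the 2-D data
linear_acc = SVC(kernel="linear").fit(X, y).score(X, y)

# The RBF kernel implicitly works in a higher-dimensional space
rbf_acc = SVC(kernel="rbf").fit(X, y).score(X, y)

print(f"linear kernel training accuracy: {linear_acc:.2f}")
print(f"rbf kernel training accuracy:    {rbf_acc:.2f}")
```

On data like this we would expect the RBF kernel to do much better, since no straight line can separate the two rings.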

Let’s dive into the large-margin intuition that makes SVM robust. A good margin is one that has the maximum distance from the nearest points of both classes; these nearest points are the support vectors. SVM uses the concepts of functional margin and geometric margin to choose the hyperplane with the largest margin.

Good margin (almost equidistant from both types of data points)
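
To make the margin concrete, here is a small sketch on four made-up points. For a linear SVM the margin width is 2 / ||w||, where w is the fitted weight vector. The toy data and the large C value (used to approximate a hard margin) are assumptions for illustration.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data: two points per class along the first axis
X = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0], [4.0, 0.0]])
y = np.array([0, 0, 1, 1])

# A large C approximates a hard margin on this separable data
clf = SVC(kernel="linear", C=1000).fit(X, y)

w = clf.coef_[0]
margin_width = 2.0 / np.linalg.norm(w)

print("support vectors:", clf.support_vectors_.tolist())
print("margin width:", round(margin_width, 3))
```

The nearest points of the two classes are [1, 0] and [3, 0], so the widest possible margin here is 2, and the hyperplane sits halfway between them.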

## Hyper-Parameter Tuning

### A. Gamma

With a high gamma value, each training point influences only its immediate neighbourhood, so the decision boundary is shaped mainly by nearby points. With a low gamma value, far-away points are also taken into account when choosing the best hyperplane.
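
A quick way to get a feel for gamma is to try a few values and compare training and test accuracy. The sketch below does this with the RBF kernel on synthetic `make_moons` data; the dataset and the three gamma values are assumptions chosen for illustration.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic two-class data (illustrative only)
X, y = make_moons(n_samples=300, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

results = {}
for gamma in [0.01, 1.0, 100.0]:
    clf = SVC(kernel="rbf", gamma=gamma, C=1.0).fit(X_train, y_train)
    # Store (training accuracy, test accuracy) for this gamma
    results[gamma] = (clf.score(X_train, y_train), clf.score(X_test, y_test))
    print(f"gamma={gamma:<6} train={results[gamma][0]:.2f} test={results[gamma][1]:.2f}")
```

Typically a very high gamma fits the training set closely but may do worse on the test set, while a very low gamma gives an almost linear boundary. The exact numbers depend on the data.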

### B. Regularization (choosing C in scikit-learn)

In scikit-learn, the strength of the regularization is inversely proportional to C: a high value of C means weak regularization, so very few data points in the training set are misclassified. To choose C, we can check the accuracy of our model over a range of values. Here is a brief overview of its effect:

If C is very large, SVM will choose a hyperplane whose margin is not that good, but the classification error on the training examples will be very low. A very small C does the opposite: a wider margin, at the cost of more training errors.
So the best value depends on the data; you can compare candidate values by checking the accuracy on held-out data such as a validation set.
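
The trade-off can be checked directly. The sketch below fits a linear SVM for a few values of C on synthetic overlapping blobs and reports the margin width (2 / ||w||) together with training and test accuracy. The dataset and the C values are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic, overlapping two-class data (illustrative only)
X, y = make_blobs(n_samples=200, centers=2, cluster_std=1.5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

margins = {}
for C in [0.01, 1.0, 100.0]:
    clf = SVC(kernel="linear", C=C).fit(X_train, y_train)
    margins[C] = 2.0 / np.linalg.norm(clf.coef_[0])  # margin width
    train_acc = clf.score(X_train, y_train)
    test_acc = clf.score(X_test, y_test)
    print(f"C={C:<6} margin={margins[C]:.2f} train={train_acc:.2f} test={test_acc:.2f}")
```

As C grows, the margin should shrink, because the model is penalised more heavily for every training point that falls on the wrong side.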

## Implementation using scikit-learn

SVC, NuSVC, and LinearSVC are classes capable of performing binary and multi-class classification on a dataset.
SVC and NuSVC are similar methods, but they accept slightly different sets of parameters and have different mathematical formulations (see the Mathematical formulation section of the scikit-learn documentation). LinearSVC, on the other hand, is another implementation of Support Vector Classification for the case of a linear kernel. Note that LinearSVC does not accept the keyword `kernel`, as the kernel is assumed to be linear.
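
As a small sketch of the three classes on a multi-class problem, the code below fits each of them on the three-class iris dataset that ships with scikit-learn. The dataset and parameter choices are assumptions for illustration, and the score shown is training accuracy only, not a proper evaluation.

```python
from sklearn.datasets import load_iris
from sklearn.svm import SVC, NuSVC, LinearSVC

X, y = load_iris(return_X_y=True)  # three flower classes: 0, 1, 2

models = {
    "SVC": SVC(kernel="linear", C=1),
    "NuSVC": NuSVC(kernel="linear"),
    # LinearSVC has no kernel argument; max_iter is raised to help convergence
    "LinearSVC": LinearSVC(C=1, max_iter=10000),
}

scores = {}
for name, model in models.items():
    model.fit(X, y)
    scores[name] = model.score(X, y)  # accuracy on the training data
    print(name, model.classes_.tolist(), round(scores[name], 2))
```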

### Training a classifier

In [6]:
```python
from sklearn import svm

# X -> training inputs
# y -> training outputs
# Here we are training a binary classifier
X = [[1, 0, 2], [0, 1, 3]]
y = [0, 1]

# SVM with kernel='linear'
# (the default is kernel='rbf')
clf = svm.SVC(kernel='linear', C=1)
clf.fit(X, y)
```
Out[6]:
```
SVC(C=1, cache_size=200, class_weight=None, coef0=0.0,
decision_function_shape=None, degree=3, gamma='auto', kernel='linear',
max_iter=-1, probability=False, random_state=None, shrinking=True,
tol=0.001, verbose=False)
```

### Predicting the output

In [7]:
```python
clf.predict([[2., 2., 2.]])
```
Out[7]:
`array([0])`
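
If we want to see why the classifier picked class 0, we can look at the signed score behind the prediction. The sketch below rebuilds the same tiny classifier so that it runs on its own, then calls `decision_function`; a negative score means the first class (0) and a positive score means the second class (1).

```python
from sklearn import svm

# Same toy data and settings as the classifier trained above
X = [[1, 0, 2], [0, 1, 3]]
y = [0, 1]
clf = svm.SVC(kernel='linear', C=1).fit(X, y)

# Signed score for the query point: its sign decides the class
score = clf.decision_function([[2., 2., 2.]])
label = clf.predict([[2., 2., 2.]])

print("score:", score)
print("label:", label)
print("support vectors:", clf.support_vectors_.tolist())
```

With only two training points, both of them end up as support vectors.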

### LinearSVC has no `kernel` parameter, as it is always linear

In [8]:
```python
lin_clf = svm.LinearSVC(kernel='linear', C=1)
```
```
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-8-08a26ddf4f5d> in <module>()
----> 1 lin_clf = svm.LinearSVC(kernel='linear', C=1)

TypeError: __init__() got an unexpected keyword argument 'kernel'
```
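
For comparison, here is a sketch of the call that does work: we simply leave out the `kernel` argument. It reuses the same toy data as the earlier cells so that it runs on its own.

```python
from sklearn import svm

# Same toy data as in the SVC example
X = [[1, 0, 2], [0, 1, 3]]
y = [0, 1]

# No kernel argument: LinearSVC is always linear
lin_clf = svm.LinearSVC(C=1)
lin_clf.fit(X, y)

train_pred = lin_clf.predict(X)
print(train_pred)
print(lin_clf.predict([[2., 2., 2.]]))
```

Keep in mind that LinearSVC is a different implementation from `SVC(kernel='linear')`, so the two may not always give identical results.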

## Summary

In conclusion, SVM is a powerful algorithm for classification. SVM uses a cost function similar to the one in logistic regression, but because of C its form changes slightly.

For logistic regression, the cost function has the form (A + lambda*B), where A is the training-error term and B is the regularization term.

For SVM, it has the form (C*A + B), where C is the tuning (regularization) parameter.
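
To make the two forms concrete, here is a rough numeric sketch. It assumes the usual log loss for logistic regression and the hinge loss for SVM as the error term A, and half the squared weights as the penalty B; constant factors differ between textbooks. The data, weights, lambda, and C are made-up values for illustration.

```python
import numpy as np

# Hypothetical toy data and parameters, for illustration only
X = np.array([[1.0, 2.0], [2.0, 0.5], [-1.0, -1.5], [-2.0, 0.0]])
y01 = np.array([1, 1, 0, 0])   # labels as 0/1 for logistic regression
ypm = 2 * y01 - 1              # the same labels as -1/+1 for SVM
w = np.array([0.5, 0.25])
b = 0.0
lam, C = 0.1, 1.0

z = X @ w + b                  # linear score for every sample
B = 0.5 * np.sum(w ** 2)       # penalty term shared by both forms

# Logistic regression: A + lambda * B, with A the average log loss
p = 1.0 / (1.0 + np.exp(-z))
A_logistic = np.mean(-y01 * np.log(p) - (1 - y01) * np.log(1 - p))
logistic_cost = A_logistic + lam * B

# SVM: C * A + B, with A the summed hinge loss
A_svm = np.sum(np.maximum(0.0, 1.0 - ypm * z))
svm_cost = C * A_svm + B

print(logistic_cost, svm_cost)
```

A large C puts more weight on the error term, much like a small lambda does in logistic regression.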