**LINEAR CLASSIFIERS: LOGISTIC REGRESSION AND NAÏVE BAYES**

**Introduction**

The goal of any classification task is to predict the class of a given object based on its characteristic features. Linear classifiers achieve this by making a classification decision based on the value of a linear combination of those characteristics. We will look at Logistic Regression and Naïve Bayes as examples of linear classifiers.

**Logistic Regression**

Logistic Regression is useful in situations where the dependent variable is dichotomous, i.e. binary or categorical. It is a predictive analysis technique used to describe data and to explain the relationship between one binary dependent variable and one or more independent variables of any type (ordinal, nominal, etc.).

Logistic Regression uses the logistic function, also called the sigmoid function. This function has an S-shaped graph that can take any real-valued number and map it into a value between 0 and 1, but never exactly at these limits.

Below is an example logistic regression equation:

y = e^(b0 + b1*x) / (1 + e^(b0 + b1*x))

Where y is the predicted output, b0 is the bias or intercept term and b1 is the coefficient for the single input value (x). Each column in your input data has an associated b coefficient (a constant real value) that must be learned from your training data.
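The equation above can be sketched as a small Python function. The coefficient values used here are illustrative, not learned from any data:

```python
import math

def predict(x, b0, b1):
    """Logistic regression prediction for a single input x.

    b0 is the bias/intercept term and b1 the coefficient for x.
    e^z / (1 + e^z) is algebraically equivalent to 1 / (1 + e^-z).
    """
    z = b0 + b1 * x
    return math.exp(z) / (1 + math.exp(z))

# With illustrative coefficients b0 = -0.4, b1 = 0.9, the output is a
# probability strictly between 0 and 1:
p = predict(2.0, b0=-0.4, b1=0.9)
```

Note that when the linear combination `b0 + b1*x` equals zero, the output is exactly 0.5, the midpoint of the sigmoid.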

The coefficients (beta values, b) of the logistic regression algorithm must be estimated from your training data. This is done using maximum-likelihood estimation. The best coefficients result in a model that predicts a value very close to 1 (e.g. male) for the default class and a value very close to 0 (e.g. female) for the other class.
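In practice the maximum-likelihood coefficients are found iteratively. Below is a minimal sketch of one such procedure, stochastic gradient ascent on the log-likelihood, for a single input variable; the toy data, learning rate, and epoch count are all illustrative choices, not part of the original text:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_sgd(xs, ys, lr=0.3, epochs=100):
    """Estimate b0 and b1 by stochastic gradient ascent on the
    log-likelihood -- a simple stand-in for full maximum-likelihood
    estimation. (y - yhat) is the gradient term for each example."""
    b0 = b1 = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            yhat = sigmoid(b0 + b1 * x)
            err = y - yhat
            b0 += lr * err
            b1 += lr * err * x
    return b0, b1

# Toy data: class 0 clusters near x = 1, class 1 near x = 4
xs = [0.5, 1.0, 1.5, 3.5, 4.0, 4.5]
ys = [0, 0, 0, 1, 1, 1]
b0, b1 = fit_sgd(xs, ys)
```

After training, points from the class-1 cluster should map above 0.5 and points from the class-0 cluster below it.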

Logistic Regression is a **discriminative model** or **conditional model**. It is a model used for modelling the dependence of target variables y on observed variables x. Within a probabilistic framework, this is done by modelling the conditional probability distribution P(y|x), which can be used for predicting y from x.

**Implementing Logistic Regression using Scikit Learn**

Here, we are using the famous Iris dataset to make a logistic regression model and to predict the class for the test dataset.

```python
# Logistic Regression
from sklearn import datasets
from sklearn import metrics
from sklearn.linear_model import LogisticRegression

# load the iris dataset
dataset = datasets.load_iris()

# fit a logistic regression model to the data
model = LogisticRegression()
model.fit(dataset.data, dataset.target)
print(model)

# make predictions
expected = dataset.target
predicted = model.predict(dataset.data)

# summarize the fit of the model
print(metrics.classification_report(expected, predicted))
print(metrics.confusion_matrix(expected, predicted))
```


**Naïve Bayes**

The Naïve Bayes classifier aggregates information using conditional probability, under an **assumption of independence among features**. It assumes that the features are independent of one another, which is rarely true of real-world data. Yet in practice this classifier often performs very well despite the assumption being violated.

Bayes’ Theorem provides a way that we can calculate the probability of a hypothesis given our prior knowledge.

Bayes’ Theorem is stated as:

P(h|d) = (P(d|h) * P(h)) / P(d)

Where

- P(h|d) is the probability of hypothesis h given the data d. This is called the posterior probability.
- P(d|h) is the probability of data d given that the hypothesis h was true.
- P(h) is the probability of hypothesis h being true (regardless of the data). This is called the prior probability of h.
- P(d) is the probability of the data (regardless of the hypothesis).
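The theorem can be illustrated with made-up numbers. Here the hypothesis h is "the email is spam" and the data d is "the email contains the word 'offer'"; all probabilities below are invented for illustration:

```python
# Prior P(h): assume 20% of email is spam (illustrative figure)
p_h = 0.2

# Likelihoods: P(d|h) and P(d|not h), also illustrative
p_d_given_h = 0.6       # 60% of spam contains "offer"
p_d_given_not_h = 0.05  # 5% of non-spam contains "offer"

# P(d) via the law of total probability
p_d = p_d_given_h * p_h + p_d_given_not_h * (1 - p_h)

# Posterior P(h|d) by Bayes' Theorem
p_h_given_d = (p_d_given_h * p_h) / p_d  # = 0.12 / 0.16 = 0.75
```

Seeing the word "offer" raises the probability of spam from the 20% prior to a 75% posterior.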

A learned Naïve Bayes model is stored as a list of probabilities. These include:

- Class Probabilities: The probabilities of each class in the training dataset.
- Conditional Probabilities: The conditional probabilities of each input value given each class value.
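Both kinds of probability can be read straight off counts in the training data. A minimal sketch with a hypothetical categorical dataset (a single "weather" feature and a "play" class label, both invented for illustration):

```python
from collections import Counter, defaultdict

# Toy training data: (feature value, class label) pairs
data = [("sunny", "no"), ("sunny", "no"), ("rainy", "yes"),
        ("overcast", "yes"), ("rainy", "yes"), ("sunny", "yes")]

n = len(data)
class_counts = Counter(label for _, label in data)

# Class probabilities: P(class) = count(class) / total examples
class_probs = {c: count / n for c, count in class_counts.items()}

# Conditional probabilities: P(value | class) = count(value, class) / count(class)
cond_counts = defaultdict(Counter)
for value, label in data:
    cond_counts[label][value] += 1
cond_probs = {c: {v: cnt / class_counts[c] for v, cnt in vals.items()}
              for c, vals in cond_counts.items()}
```

With these two tables stored, classifying a new example is just a matter of multiplying the relevant probabilities together for each class and picking the largest product.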

Naïve Bayes is a **generative model**. Given an observable variable *X* and a target variable *Y*, a generative model is a statistical model of the joint probability distribution P(X, Y) on *X* × *Y*. A generative model can also be defined as a model of the conditional probability of the observable *X* given a target *y*, written P(X|Y=y).

**Gaussian Naïve Bayes**

Naive Bayes can be extended to real-valued attributes by assuming a Gaussian distribution.

This extension of Naïve Bayes is called Gaussian Naïve Bayes. Other functions can be used to estimate the distribution of the data, but the Gaussian (or Normal distribution) is the easiest to work with because you only need to estimate the mean and the standard deviation from your training data.

With real-valued inputs, we can calculate the mean and standard deviation of input values (x) for each class to summarize the distribution.
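These per-class summaries can be sketched as follows; the feature values are invented for illustration:

```python
import math

def summarize(values):
    """Mean and (sample) standard deviation summarizing one
    real-valued feature for one class."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
    return mean, math.sqrt(var)

def gaussian_pdf(x, mean, std):
    """Gaussian probability density, as used by Gaussian Naive Bayes
    to score how likely a feature value is under a class."""
    exponent = math.exp(-((x - mean) ** 2) / (2 * std ** 2))
    return exponent / (math.sqrt(2 * math.pi) * std)

# Illustrative values of one feature observed for a single class
mean, std = summarize([4.9, 5.0, 5.1, 5.2])
density = gaussian_pdf(5.05, mean, std)
```

At prediction time, each feature value of a new example is scored with the Gaussian density for each class, and those densities take the place of the conditional probabilities in Bayes' Theorem.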

**Implementing Naïve Bayes using Scikit Learn**

```python
# Gaussian Naive Bayes
from sklearn import datasets
from sklearn import metrics
from sklearn.naive_bayes import GaussianNB

# load the iris dataset
dataset = datasets.load_iris()

# fit a Naive Bayes model to the data
model = GaussianNB()
model.fit(dataset.data, dataset.target)
print(model)

# make predictions and summarize the fit of the model
expected = dataset.target
predicted = model.predict(dataset.data)
print(metrics.classification_report(expected, predicted))
print(metrics.confusion_matrix(expected, predicted))
```


**Logistic Regression vs Naïve Bayes**

- Logistic Regression and Naïve Bayes are both classification algorithms.
- Naïve Bayes models the joint distribution of the features X and target Y, and then predicts the posterior probability P(y|x); hence it is a generative model.
- Logistic Regression directly models the posterior probability P(y|x) by learning the input-to-output mapping through error minimisation; hence it is a discriminative model.
- Naïve Bayes assumes that each feature is independent, which can lead to poor predictions when the features are actually dependent.
- Logistic Regression works well even when some of the features are correlated, because it splits the feature space linearly.