# Ridge Regression for Machine Learning

**Overview**

1. Introduction

2. Applications of Ridge Regression

3. How does the algorithm work?

4. How do we choose the best parameters?

5. Pseudo code of the algorithm

6. Implement our model with scikit-learn

7. Summary

Ridge Regression is one of the most commonly used methods for regularizing ill-posed problems, and it is especially useful when the data suffers from multicollinearity (that is, when the independent variables are highly correlated).

It is also called Tikhonov regularization. It is a method for overcoming overfitting, the condition that arises when a model fits the training data almost perfectly but performs poorly on new data. Before delving into ridge regression, it helps to understand regularization, so let us first discuss what regularization is and why it is essential.

**Regularization**

Regularization is the process of reducing generalization error by fitting the function appropriately on the given training set while avoiding overfitting. It adds a penalty on the parameters of the model in order to reduce the model's freedom. As a result, the model is less likely to fit the noise in the training data and generalizes better to new data. Three common regularization techniques are listed below:

1. Lasso Regression

2. Ridge Regression

3. Elastic Net Regression (it combines both Lasso Regression and Ridge Regression)

Consider a regression model of the following form:

$$y = b_0 + b_1 X_1 + b_2 X_2 + b_3 X_3 + \dots + b_p X_p$$
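As a small illustration of the model above, the following sketch computes predictions for a few rows of data with NumPy. The function name `predict` and the toy numbers are assumptions made for this example only.

```python
import numpy as np

def predict(X, b0, b):
    """Evaluate y = b0 + b1*X1 + ... for every row of X."""
    return b0 + X @ b

# Toy data, made up for illustration: 2 observations, 2 features
X = np.array([[1.0, 2.0],
              [3.0, 4.0]])
b0 = 0.5                     # intercept
b = np.array([2.0, -1.0])    # one coefficient per feature

y_hat = predict(X, b0, b)
print(y_hat)  # [0.5 2.5]
```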

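Fitting this regression model to the data involves minimizing a loss function known as the residual sum of squares (RSS), which is calculated as follows:

$$\mathrm{RSS} = \sum_{i=1}^{m} \left( y_i - h(X_i) \right)^2$$

where

y_i = observed value of the dependent variable for the i-th observation,

h(X_i) = predicted value for the i-th observation,

m = number of observations in the training set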
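The RSS computation can be sketched in a few lines of NumPy. The helper name `residual_sum_of_squares` and the toy values are hypothetical and serve only to show the arithmetic.

```python
import numpy as np

def residual_sum_of_squares(y, y_pred):
    """Sum of squared differences between observed and predicted values."""
    y = np.asarray(y, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sum((y - y_pred) ** 2))

# Toy values, made up for illustration only
y_observed = [3.0, 5.0, 7.0]
y_predicted = [2.5, 5.0, 8.0]

rss = residual_sum_of_squares(y_observed, y_predicted)
print(rss)  # 0.25 + 0.0 + 1.0 = 1.25
```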

**Ridge regression**

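Ridge regression is obtained by adding a shrinkage quantity to the above equation for RSS. The shrinkage quantity is given by the following formula:

$$\lambda \sum_{j=1}^{p} b_j^2$$

where b_j = coefficient of the j-th feature,

λ = tuning parameter,

j = index of the feature,

p = number of features

Finally, the expression becomes

$$\sum_{i=1}^{m} \left( y_i - h(X_i) \right)^2 + \lambda \sum_{j=1}^{p} b_j^2$$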

The above equation is the ridge regression objective: RSS plus the shrinkage quantity. The coefficients are estimated by minimizing this function. The tuning parameter λ controls how strongly we penalize the flexibility of the model. A more flexible model tends to have larger coefficients, so to keep the value of this function small, the coefficients need to stay small.
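To make the objective concrete, here is a minimal NumPy sketch that evaluates it and minimizes it. It assumes a model without an intercept and relies on the standard closed-form minimizer b = (XᵀX + λI)⁻¹Xᵀy, which the text above does not state. The function names, the variable `lam` (used because `lambda` is a reserved word in Python) and the synthetic data are all illustrative.

```python
import numpy as np

def ridge_objective(X, y, b, lam):
    """RSS plus the shrinkage quantity lam * sum(b_j ** 2)."""
    residuals = y - X @ b
    return float(residuals @ residuals + lam * (b @ b))

def ridge_closed_form(X, y, lam):
    """Minimizer of the ridge objective for a model without an intercept."""
    n_features = X.shape[1]
    A = X.T @ X + lam * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)

# Synthetic data, made up for illustration
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=50)

b_ridge = ridge_closed_form(X, y, lam=1.0)  # penalized estimate
b_ols = ridge_closed_form(X, y, lam=0.0)    # lam = 0 gives least squares

# The ridge estimate gives the lower value of the penalized objective
print(ridge_objective(X, y, b_ridge, 1.0), ridge_objective(X, y, b_ols, 1.0))
# and its coefficients are smaller in norm than the least-squares ones
print(np.linalg.norm(b_ridge), np.linalg.norm(b_ols))
```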

**Effect of λ on the regression model**

When the value of λ is 0, the penalty term has no effect and the estimates produced by ridge regression are the same as the least squares estimates.

As λ approaches ∞, the impact of the shrinkage penalty grows and the ridge regression coefficient estimates approach zero.
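Both limits can be checked numerically. The sketch below fits scikit-learn's `Ridge` for a few penalty strengths and compares the result with `LinearRegression`. Note that scikit-learn calls the tuning parameter `alpha`; the synthetic data and the particular grid of values are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Synthetic data, made up for illustration
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = X @ np.array([3.0, -2.0, 1.0, 0.5]) + rng.normal(scale=0.5, size=100)

ols = LinearRegression().fit(X, y)

# Norm of the coefficient vector for increasing penalty strength
coef_norms = {}
for alpha in [1e-8, 1.0, 100.0, 1e6]:
    model = Ridge(alpha=alpha).fit(X, y)
    coef_norms[alpha] = float(np.linalg.norm(model.coef_))
    print(f"alpha={alpha:g}  ||coef||={coef_norms[alpha]:.4f}")

# With a near-zero penalty, ridge practically coincides with least squares
ridge_tiny = Ridge(alpha=1e-8).fit(X, y)
print(np.allclose(ridge_tiny.coef_, ols.coef_, atol=1e-6))
```

The printed norms shrink as `alpha` grows, which mirrors the behaviour described above.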

Therefore, selecting a suitable value of λ is essential.
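One common way to pick the tuning parameter is cross-validation, which the text above does not describe in detail. The following is a minimal sketch using scikit-learn's `RidgeCV`; the candidate grid, the 5-fold setting and the synthetic data are assumptions for illustration, not recommended values.

```python
import numpy as np
from sklearn.linear_model import RidgeCV

# Synthetic data, made up for illustration
rng = np.random.default_rng(0)
X = rng.normal(size=(80, 5))
y = X @ np.array([1.5, 0.0, -2.0, 0.0, 0.7]) + rng.normal(scale=1.0, size=80)

# Candidate penalty strengths on a logarithmic grid (an arbitrary choice)
candidate_alphas = np.logspace(-3, 3, 13)

# 5-fold cross-validation over the candidates
search = RidgeCV(alphas=candidate_alphas, cv=5).fit(X, y)

print("selected alpha:", search.alpha_)
print("coefficients:", search.coef_)
```

The selected value depends on the data and on the grid, so it is worth widening the grid if the best candidate lands on one of its edges.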

Because the penalty is the squared L2 norm of the coefficient vector, ridge regression is also known as L2 regularization.

**When to use Ridge Regression**

Ridge regression is often used when the independent variables are collinear. The issue with collinearity is that the parameter estimates have very high variance. Ridge regression reduces this variance at the price of introducing some bias into the estimates.
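The variance reduction can be seen in a small simulation. The sketch below repeatedly draws data with two almost identical predictors and records the first coefficient estimated by least squares and by ridge regression. The sample sizes, the noise levels and `alpha=1.0` are arbitrary assumptions chosen to make the effect visible.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
ols_coefs, ridge_coefs = [], []

for _ in range(200):
    x1 = rng.normal(size=50)
    x2 = x1 + rng.normal(scale=0.01, size=50)   # almost a copy of x1
    X = np.column_stack([x1, x2])
    y = x1 + x2 + rng.normal(scale=1.0, size=50)

    # Record the estimate of the first coefficient for each method
    ols_coefs.append(LinearRegression().fit(X, y).coef_[0])
    ridge_coefs.append(Ridge(alpha=1.0).fit(X, y).coef_[0])

ols_std = float(np.std(ols_coefs))
ridge_std = float(np.std(ridge_coefs))
print(f"std of first coefficient, least squares: {ols_std:.3f}")
print(f"std of first coefficient, ridge:         {ridge_std:.3f}")
```

Across repetitions the least squares estimate swings widely, while the ridge estimate stays in a much narrower band.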

**Advantages of Ridge Regression:**

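Ridge regression has two main advantages:

1. Adding a penalty term to the model reduces overfitting.

2. Adding a penalty term guarantees that a unique solution exists: for any λ > 0 the minimization problem is strictly convex, even when the predictors are perfectly collinear or outnumber the observations.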
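The second advantage can be illustrated with a data set that has more features than observations. In that case the matrix XᵀX of the least squares normal equations is singular, whereas XᵀX + λI is invertible for any λ > 0. The sketch below assumes a model without an intercept and uses made-up random data; `lam` stands for λ.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))    # 5 observations, 8 features
y = rng.normal(size=5)

gram = X.T @ X                 # 8 x 8 matrix, but its rank is at most 5
rank_plain = np.linalg.matrix_rank(gram)

lam = 1.0
penalized = gram + lam * np.eye(8)
rank_penalized = np.linalg.matrix_rank(penalized)

print(rank_plain, rank_penalized)

# The penalized system has full rank, so it can be solved directly
b_ridge = np.linalg.solve(penalized, X.T @ y)
print(b_ridge.shape)
```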

**Disadvantages of Ridge Regression:**

The main disadvantage of ridge regression is reduced model interpretability. It shrinks the coefficients of less important predictors very close to zero but never sets them exactly to zero, so the final model always contains as many predictors as the original one.

**Difference between Lasso and Ridge Regression**

The main difference between Lasso and Ridge Regression lies in the elimination of predictor variables. Remember that Ridge Regression cannot set coefficients exactly to zero, so the fitted model keeps all the predictor variables (the coefficients vanish together only in the limit of an infinite penalty). In contrast, Lasso Regression performs both parameter shrinkage and variable selection automatically. Ridge regression is worth considering when the independent variables are highly correlated, because lasso tends to keep only one variable from a correlated group and shrinks the coefficients of the others to zero.
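The difference in variable selection is easy to observe with scikit-learn's `Ridge` and `Lasso`. In the sketch below only the first two features influence the target; the data, the penalty strengths and the variable names are assumptions for illustration. Keep in mind that `alpha` is scaled differently in the two estimators, so the values are not directly comparable.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
# Only the first two features influence y; the other four are pure noise
y = 4.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(scale=0.5, size=200)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.5).fit(X, y)

# Count coefficients that are exactly zero
n_zero_ridge = int(np.sum(ridge.coef_ == 0.0))
n_zero_lasso = int(np.sum(lasso.coef_ == 0.0))

print("ridge coefficients:", np.round(ridge.coef_, 3))
print("lasso coefficients:", np.round(lasso.coef_, 3))
print("exact zeros (ridge, lasso):", n_zero_ridge, n_zero_lasso)
```

Ridge leaves small but non-zero weights on the noise features, whereas lasso is expected to drop them from the model.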

**Ridge Regression with sklearn**

Ridge regression can be implemented very easily with the scikit-learn library.

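```python
import numpy as np
from sklearn.linear_model import Ridge

# Generate a small random data set: 10 samples, 5 features
n_samples, n_features = 10, 5
np.random.seed(0)
y = np.random.randn(n_samples)
X = np.random.randn(n_samples, n_features)

# scikit-learn names the tuning parameter alpha
# (lambda is a reserved word in Python)
clf = Ridge(alpha=1.0)
clf.fit(X, y)

# Inspect the fitted coefficients and intercept
print(clf.coef_, clf.intercept_)
```

Calling `fit` returns the fitted estimator. The other constructor parameters, such as `copy_X`, `fit_intercept`, `max_iter`, `random_state`, `solver` and `tol`, are left at their defaults here; the available parameters and their default values vary between scikit-learn versions.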

The documentation is available at the link below:

Source: http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Ridge.html