# Ridge Regression for Machine Learning

Overview

1. Introduction
2. Applications of Ridge Regression
3. How does the algorithm work?
4. How do we choose the best parameters?
5. Pseudocode of the algorithm
6. Implement our model with scikit-learn
7. Summary

Ridge regression is one of the most commonly used methods for regularizing ill-posed problems. It is particularly useful when the data suffer from multicollinearity (that is, when the independent variables are highly correlated).

It is also called Tikhonov regularization. It helps overcome overfitting, the condition in which a model fits the training data very closely but performs poorly on new data. Before delving into ridge regression, let us first discuss the concept of regularization and why it is essential.

## Regularization

Regularization is the process of reducing generalization error by fitting a function appropriately on the given training set while avoiding overfitting. It adds a penalty on the parameters of the model in order to reduce the model's freedom. As a result, the model is less likely to fit the noise in the training data, which improves its ability to generalize. Three common regularization techniques are:

1. Lasso Regression

2. Ridge Regression

3. Elastic Net Regression (it combines both Lasso Regression and Ridge Regression)

Suppose we have a regression model of the following form:

$$y = b_0 + b_1 X_1 + b_2 X_2 + b_3 X_3 + \dots + b_p X_p$$

Fitting this regression model to the data involves a loss function known as the residual sum of squares (RSS), which is calculated as follows:

$$\mathrm{RSS} = \sum_{i=1}^{m} \left( y_i - h(X_i) \right)^2$$

where,

$y_i$ = observed value of the dependent variable for the $i$-th observation,

$h(X_i)$ = value predicted by the model for the $i$-th observation,

$m$ = number of observations
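
As a minimal sketch, the RSS can be computed with NumPy as shown below. The function name `residual_sum_of_squares` and the toy numbers are made up for illustration and do not come from any library.

```python
import numpy as np

def residual_sum_of_squares(y, y_pred):
    """Sum of squared differences between observed and predicted values."""
    y = np.asarray(y, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sum((y - y_pred) ** 2))

# Toy data (hypothetical): three observations and the model's predictions.
y_observed = [3.0, 5.0, 7.0]
y_predicted = [2.5, 5.0, 8.0]

rss = residual_sum_of_squares(y_observed, y_predicted)
print(rss)  # 0.5**2 + 0.0**2 + (-1.0)**2 = 1.25
```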

## Ridge regression

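Ridge regression is obtained by adding a shrinkage quantity to the RSS defined above. The quantity to be minimized is:

$$\text{Ridge Regression} = \mathrm{RSS} + \lambda \sum_{j=1}^{p} b_j^2$$

where, $b_j$ = coefficient of the $j$-th feature

$\lambda$ = tuning parameter ($\lambda \ge 0$)

$j$ = index of the feature parameter

$p$ = number of feature parameters

Finally, the expression becomes

$$\text{Ridge Regression} = \sum_{i=1}^{m} \left( y_i - h(X_i) \right)^2 + \lambda \sum_{j=1}^{p} b_j^2$$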

The above equation shows the ridge regression objective, which is the RSS plus the shrinkage quantity. The coefficients are estimated by minimizing this function. The tuning parameter λ controls how strongly we penalize the flexibility of the model. A more flexible model tends to have larger coefficients, so to keep the penalized objective small, the coefficients must also stay small.
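
For a linear model without an intercept, minimizing this objective has a well-known closed-form solution, b = (XᵀX + λI)⁻¹Xᵀy. The sketch below assumes that simplified setting. The names `ridge_objective` and `ridge_closed_form` and the synthetic data are illustrative, and the parameter is spelled `lam` because `lambda` is a reserved word in Python.

```python
import numpy as np

def ridge_objective(b, X, y, lam):
    """RSS plus the shrinkage penalty lam * sum(b_j ** 2)."""
    residuals = y - X @ b
    return float(residuals @ residuals + lam * (b @ b))

def ridge_closed_form(X, y, lam):
    """Minimizer of ridge_objective: solve (X^T X + lam * I) b = X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

# Synthetic data (assumption): 50 samples, 3 features, small noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=50)

b_ols = ridge_closed_form(X, y, lam=0.0)     # lam = 0 reduces to least squares
b_ridge = ridge_closed_form(X, y, lam=10.0)  # lam > 0 shrinks the coefficients

print("least squares coefficients:", b_ols)
print("ridge coefficients:        ", b_ridge)
print("objective at ridge solution:", ridge_objective(b_ridge, X, y, 10.0))
print("objective at OLS solution:  ", ridge_objective(b_ols, X, y, 10.0))
```

With the penalty switched on, the ridge solution gives a lower value of the penalized objective than the least squares solution, and its coefficient vector has a smaller norm.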

## Effect of λ on the regression model

When λ is 0, the penalty term has no effect, and the estimates produced by ridge regression are the same as the least squares estimates.

As λ approaches infinity, the impact of the shrinkage penalty grows, and the ridge regression coefficient estimates approach zero.

Therefore, selecting a suitable value of λ is essential.

The penalty used by this method is the squared L2 norm of the coefficient vector, which is why ridge regression is also known as L2 regularization.
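
The two limiting cases can be checked numerically. The sketch below uses scikit-learn, where the tuning parameter is called `alpha`, on synthetic data invented for this illustration. A very small `alpha` stands in for the case of 0 and a very large one for the case of infinity.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Synthetic data (assumption): 100 samples, 4 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = X @ np.array([3.0, -2.0, 1.0, 0.5]) + rng.normal(scale=0.5, size=100)

ols = LinearRegression().fit(X, y)
print("least squares:", np.round(ols.coef_, 3))

alphas = [1e-6, 1.0, 100.0, 1e6]
models = [Ridge(alpha=alpha).fit(X, y) for alpha in alphas]
norms = [float(np.linalg.norm(model.coef_)) for model in models]

for alpha, model, norm in zip(alphas, models, norms):
    print(f"alpha={alpha:g}: coef={np.round(model.coef_, 3)}, L2 norm={norm:.4f}")
```

The coefficients for the smallest `alpha` should be practically identical to the least squares ones, and the L2 norm of the coefficient vector should fall steadily as `alpha` grows.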

## When to use Ridge Regression

Ridge regression is often used when the independent variables are collinear. The issue with collinearity is that the variance of the parameter estimates becomes very large. Ridge regression reduces this variance at the price of introducing some bias into the estimates.
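
A small simulation can make the variance argument concrete. The sketch below repeatedly draws two nearly identical features, fits both least squares and ridge with a closed-form solver (no intercept), and compares how much the estimated coefficients vary across draws. The helper `ridge_fit`, the noise levels, and the choice `lam=1.0` are assumptions made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def ridge_fit(X, y, lam):
    """Closed-form ridge fit without intercept; lam = 0 gives least squares."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

ols_coefs, ridge_coefs = [], []
for _ in range(200):
    x1 = rng.normal(size=50)
    x2 = x1 + rng.normal(scale=0.01, size=50)     # nearly a copy of x1
    X = np.column_stack([x1, x2])
    y = x1 + x2 + rng.normal(scale=1.0, size=50)  # true coefficients: (1, 1)
    ols_coefs.append(ridge_fit(X, y, lam=0.0))
    ridge_coefs.append(ridge_fit(X, y, lam=1.0))

# Variance of each coefficient estimate across the 200 simulated data sets.
ols_var = np.var(ols_coefs, axis=0)
ridge_var = np.var(ridge_coefs, axis=0)
print("variance of least squares coefficients:", ols_var)
print("variance of ridge coefficients:        ", ridge_var)
print("mean of ridge coefficients:            ", np.mean(ridge_coefs, axis=0))
```

The least squares estimates swing widely from one data set to the next, while the ridge estimates stay close together; the mean of the ridge estimates shows how much bias was accepted in exchange.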

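## Advantages of Ridge Regression

There are two main advantages of ridge regression:

1. Adding a penalty term to the model reduces overfitting.

2. Adding a penalty term guarantees that a solution can be found: for any λ > 0 the penalized problem has a unique minimizer, even when the independent variables are perfectly collinear and ordinary least squares has no unique solution.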
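
The second advantage can be seen on a tiny example. In the sketch below, which uses made-up numbers and no intercept, two identical columns make XᵀX singular, while XᵀX + λI is invertible for λ > 0.

```python
import numpy as np

# Two identical columns: least squares has no unique solution here.
x = np.array([1.0, 2.0, 3.0, 4.0])
X = np.column_stack([x, x])
y = np.array([2.0, 4.0, 6.0, 8.0])

gram = X.T @ X
print("rank of X^T X:", np.linalg.matrix_rank(gram))  # 1, so X^T X is singular

lam = 1.0  # any positive value works; 1.0 is an arbitrary choice
penalized = gram + lam * np.eye(2)
print("rank of X^T X + lam*I:", np.linalg.matrix_rank(penalized))  # 2, invertible

b = np.linalg.solve(penalized, X.T @ y)
print("ridge coefficients:", b)  # both equal 60/61, the weight is shared equally
```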

## Disadvantages of Ridge Regression

The main disadvantage of ridge regression is reduced model interpretability. The model shrinks the coefficients of less important predictors very close to zero but never reduces them to exactly zero. As a result, the final model contains just as many predictors as the original one.

## Difference between Lasso and Ridge Regression

The main difference between Lasso and Ridge Regression lies in the elimination of predictor variables. Ridge Regression cannot set coefficients exactly to zero, so we end up with either all of the predictor variables in the regression model or none of them. In contrast, Lasso Regression performs both parameter shrinkage and variable selection automatically. Ridge regression is worth considering when the independent variables are highly correlated, because lasso tends to pick only one of them and shrink the coefficients of the remaining ones to zero.
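
The difference in variable elimination can be observed with scikit-learn's `Ridge` and `Lasso` estimators. The data below are synthetic and the `alpha` values are arbitrary choices for this sketch; note that scikit-learn scales the two objectives differently, so the two `alpha` values are not directly comparable.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
# Only the first two features influence y; the other three are pure noise.
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.5).fit(X, y)

print("ridge coefficients:", np.round(ridge.coef_, 3))
print("lasso coefficients:", np.round(lasso.coef_, 3))
print("non-zero ridge coefficients:", np.count_nonzero(ridge.coef_))
print("non-zero lasso coefficients:", np.count_nonzero(lasso.coef_))
```

Ridge keeps a small non-zero weight on every feature, whereas lasso is expected to set the weights of the noise features to exactly zero.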

## Using Ridge Regression with scikit-learn

Ridge regression can be implemented very easily with the library scikit-learn.

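    from sklearn.linear_model import Ridge
    import numpy as np

    # Small synthetic data set: 10 samples, 5 features.
    n_samples, n_features = 10, 5
    np.random.seed(0)
    y = np.random.randn(n_samples)
    X = np.random.randn(n_samples, n_features)

    # scikit-learn names the tuning parameter `alpha`; `lambda` is a reserved word in Python.
    clf = Ridge(alpha=1.0)
    clf.fit(X, y)
    # In an interactive session, fit() echoes the fitted estimator, for example:
    # Ridge(alpha=1.0, copy_X=True, fit_intercept=True, max_iter=None, normalize=False,
    #       random_state=None, solver='auto', tol=0.001)
    # The exact output depends on the scikit-learn version.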
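
Because the choice of the tuning parameter matters, one possible next step is to let cross-validation pick it. The sketch below uses scikit-learn's `RidgeCV` with a hand-picked grid of candidate values; the grid and the synthetic data are assumptions for illustration, not recommended defaults.

```python
import numpy as np
from sklearn.linear_model import RidgeCV

# Synthetic data (assumption): 60 samples, 5 features, two of them irrelevant.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))
y = X @ np.array([1.5, 0.0, -2.0, 0.0, 0.7]) + rng.normal(scale=0.3, size=60)

# Candidate values of the tuning parameter to compare by cross-validation.
candidate_alphas = [0.01, 0.1, 1.0, 10.0, 100.0]
model = RidgeCV(alphas=candidate_alphas).fit(X, y)

print("selected alpha:", model.alpha_)
print("coefficients:", np.round(model.coef_, 3))
print("prediction for the first row:", model.predict(X[:1]))
```

The fitted `alpha_` attribute holds the selected value, and the model can then be used for prediction like any other scikit-learn estimator.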

The documentation is in the link below: