Elastic Net Regression for Machine Learning
- Applications of Elastic Net Regression
- How does the algorithm work?
- How do we choose the best parameters?
- Pseudocode of the algorithm
- Implementing the model with scikit-learn
Elastic Net regularization is a combination of L1 and L2 regularization. If you already understand those two techniques, you can implement Elastic Net by simply tuning the parameters that weight each of them.
It uses both the L1 and L2 penalty terms, so the penalty takes the form λ1||β||_1 + λ2||β||^2, where λ1 controls the strength of the L1 component and λ2 controls the strength of the L2 component.
What is a Regularization Technique?
A regularization technique is a form of regression that shrinks the coefficient estimates towards zero. By discouraging overly complex and flexible models, it reduces the risk of over-fitting.
There are many ways of regularizing a model and the common ways are as follows:
L1 Regularization i.e. Lasso Regularization – This technique adds a regularization term to the model that is a function of the absolute value of the coefficients. Coefficients can be shrunk exactly to zero during the regularization process, so this technique can also be used for feature selection and for producing a parsimonious model.
L2 Regularization i.e. Ridge Regularization – This technique adds a regularization term to the model that is a function of the square of the coefficients. Coefficients can be pushed close to zero but are never made exactly zero.
Elastic Net Regularization – This is the combination of both of the above techniques, adding both regularization terms to the model.
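The different behavior of the three penalties can be observed directly in scikit-learn. The following is a minimal sketch on synthetic data; the alpha values and data set sizes are illustrative assumptions, not values from the article:

```python
# Compare how Lasso, Ridge, and Elastic Net shrink coefficients.
# The data set and alpha values below are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge, ElasticNet

# Synthetic regression problem: 20 features, only 5 carry signal
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)
enet = ElasticNet(alpha=1.0, l1_ratio=0.5).fit(X, y)

# Lasso drives many coefficients exactly to zero (feature selection);
# Ridge only shrinks them towards zero; Elastic Net sits in between.
print("zero coefficients, lasso:", np.sum(lasso.coef_ == 0))
print("zero coefficients, ridge:", np.sum(ridge.coef_ == 0))
print("zero coefficients, enet: ", np.sum(enet.coef_ == 0))
```

Running this shows Lasso zeroing out most of the uninformative features, while Ridge keeps every coefficient non-zero.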
Why are Regularization Techniques used?
Regularization techniques are used when modeling with Generalized Linear Models for a number of reasons. A regularization technique helps in the following ways:
- It addresses the bias-variance trade-off and lowers the variance of the model
- It handles sparse data in a better way
- It improves predictive accuracy by minimizing over-fitting on the training data
Why is Elastic Net Regularization technique preferred?
Elastic Net regularization is preferred over L1 and L2 regularization alone because it addresses the limitations of both methods.
If a data set contains many correlated independent variables, the elastic net groups those variables together. If any one variable in the group has a strong relationship with the dependent variable, the entire group is included in the model, because omitting the other variables could discard valuable information and lead to poor model performance.
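This grouping effect can be demonstrated with two nearly identical features. The sketch below uses synthetic data; the alpha and l1_ratio values are assumptions chosen to make the effect visible:

```python
# Sketch of the "grouping effect" on two highly correlated features.
# The data is synthetic and the penalty strengths are assumptions.
import numpy as np
from sklearn.linear_model import Lasso, ElasticNet

rng = np.random.RandomState(0)
x1 = rng.randn(500)
x2 = x1 + 0.01 * rng.randn(500)       # x2 is almost identical to x1
X = np.column_stack([x1, x2])
y = 3.0 * x1 + 3.0 * x2 + 0.1 * rng.randn(500)

lasso = Lasso(alpha=0.1).fit(X, y)
enet = ElasticNet(alpha=0.1, l1_ratio=0.2).fit(X, y)

# Lasso tends to pick one of the twins and zero out the other;
# Elastic Net shares the weight across the correlated pair.
print("lasso coefficients:", lasso.coef_)
print("enet coefficients: ", enet.coef_)
```

The L2 component of the elastic net penalty is what forces the weight to be shared between the correlated variables instead of being assigned arbitrarily to one of them.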
What’s the math behind the elastic net?
The elastic net method overcomes the limitations of the Lasso regression method, which uses the penalty function λ1||β||_1.
This penalty function has a number of limitations. In particular, when the number of predictors p is large and the number of observations n is small (p > n), the lasso selects at most n variables before it saturates. The elastic net overcomes this limitation by adding a quadratic part to the penalty (λ2||β||^2), which, when used alone, gives the Ridge regression method.
The estimates from the elastic net are defined by β̂ = argmin_β (||y − Xβ||^2 + λ2||β||^2 + λ1||β||_1).
The quadratic penalty makes the loss function strictly convex, so it has a unique minimum. The elastic net includes both Lasso and Ridge regression as special cases: setting λ2 = 0 recovers the Lasso, and setting λ1 = 0 recovers Ridge regression.
The elastic net also admits a two-step estimator: for a fixed λ2, first find the ridge regression coefficients, then perform a Lasso-style shrinkage on them. This naive estimate applies a double amount of shrinkage, which increases the bias and worsens prediction. To improve predictive performance, the elastic net re-scales the coefficients of the naive estimate, multiplying them by (1 + λ2).
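The naive estimate and the re-scaling step can be written out explicitly, following the standard elastic net derivation:

```latex
% Naive elastic net: the doubly shrunk estimate for fixed penalties
\hat{\beta}^{\,\text{naive}}
  = \arg\min_{\beta}\;
    \lVert y - X\beta \rVert^{2}
    + \lambda_{2}\lVert \beta \rVert^{2}
    + \lambda_{1}\lVert \beta \rVert_{1}

% Re-scaling step that corrects for the double shrinkage
\hat{\beta}^{\,\text{enet}} = (1 + \lambda_{2})\,\hat{\beta}^{\,\text{naive}}
```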
An example in Python Code
Now we will use an example to see how this works on the Boston Housing data. Before fitting a regularized GLM, you should scale the variables. Follow the example below, where each algorithm is explained.
The data set consists of housing prices in the Boston, Massachusetts area. You will split it into training and testing subsets, then choose a suitable metric for the problem. Next, you will analyze the data set, which helps in selecting a model that generalizes well to unseen data. Finally, you will fit the different regression techniques to see which one performs best.
Then explore the data set further so that you get insight into what it contains.
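A sketch of the exploration and preprocessing steps is shown below. Note that the Boston housing loader was removed from scikit-learn in version 1.2, so this sketch uses the bundled diabetes data set as a stand-in; the workflow (explore, split, scale) is identical for any tabular regression data:

```python
# The Boston housing loader was removed in scikit-learn 1.2, so this
# sketch substitutes the bundled diabetes data set; the steps are the
# same for any tabular regression problem.
import pandas as pd
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

data = load_diabetes()
df = pd.DataFrame(data.data, columns=data.feature_names)
df["target"] = data.target

print(df.shape)           # number of rows and columns
print(df.describe())      # summary statistics for each feature

X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.3, random_state=42)

# Scale the variables before regularized regression, fitting the
# scaler on the training split only to avoid data leakage.
scaler = StandardScaler().fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
```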
(Visualization of the data set omitted.)
Simple Linear Regression
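A minimal baseline fit, following the split-and-scale steps described above (the diabetes data set again stands in for Boston, which recent scikit-learn versions no longer ship):

```python
# Baseline: ordinary least squares on the scaled stand-in data set.
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# R^2 is a reasonable metric for this regression problem
lr = LinearRegression().fit(X_train, y_train)
print("train R^2:", lr.score(X_train, y_train))
print("test  R^2:", lr.score(X_test, y_test))
```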
Elastic Net Regression
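For the elastic net, the penalty strengths can be chosen by cross-validation. The sketch below uses `ElasticNetCV` on the same stand-in data set and split; the l1_ratio grid is an assumption:

```python
# Elastic Net with alpha and l1_ratio tuned by cross-validation.
# The l1_ratio grid below is an illustrative assumption.
from sklearn.datasets import load_diabetes
from sklearn.linear_model import ElasticNetCV
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# ElasticNetCV searches an alpha path for each candidate l1_ratio,
# answering "how do we choose the best parameters?" from the outline.
enet = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.7, 0.9, 0.95, 1.0],
                    cv=5, random_state=42).fit(X_train, y_train)

print("best alpha:   ", enet.alpha_)
print("best l1_ratio:", enet.l1_ratio_)
print("test R^2:     ", enet.score(X_test, y_test))
```

Comparing the test R^2 of this model against the plain linear regression baseline shows how much the regularization helps on unseen data.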
To conclude, it is clear why Elastic Net regularization is often the preferred method for model prediction, and the code above shows how to apply this regularization technique to a data set in practice.