Polynomial Regression for Machine Learning

Polynomial Regression


  1. Introduction
  2. Applications of Polynomial Regression
  3. How does the algorithm work?
  4. How do we choose the best parameters?
  5. Pseudo code of the algorithm
  6. Implement our model with scikit-learn
  7. Summary

Polynomial Regression is defined as: “In statistics, polynomial regression is a form of regression analysis in which the relationship between the independent variable (x) and the dependent variable (y) is modeled as an nth degree polynomial in x.”

This definition can be represented in the mathematical formula as follows:

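y = b0 + b1x + b2x^2 + … + bnx^n

where y is the dependent variable, x is the independent variable, n is the degree of the polynomial and b0, …, bn are the coefficients to be estimated.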

Here, the model is a non-linear function of the feature variable, because x enters through its powers (x^2, x^3 and so on). It is still linear in the coefficients, which is why Polynomial Regression is a type of Linear Regression model. The different cases of Linear Regression are briefly stated below:

Linear Regression with one variable

h(x) = b0 + b1x

In this type of Linear Regression, the predicted value h(x) depends on only one variable, x.

Linear Regression with multiple variables

h(x) = b0 + b1x1 + b2x2 + b3x3 + … + bnxn

In this type of Linear Regression, the predicted value h(x) depends on multiple variables, x1, x2, …, xn.

Polynomial (Linear) Regression

h(x) = b0 + b1x1 + b2x1^2 + b3x1^3 + … + bnx1^n

In this type of Linear Regression, the predicted value h(x) depends on powers of a single variable: x1, x1^2, x1^3 and so on. Polynomial Regression is considered a special case of Linear Regression. Although polynomial regression fits a non-linear model to the data, as a statistical estimation problem it is linear, in the sense that the regression function is linear in the unknown parameters that are estimated from the data. Because of this, the coefficients can be estimated with the same methods used for Linear Regression, such as ordinary least squares or Stochastic Gradient Descent (SGD).
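To make the "linear in the unknown parameters" point concrete, here is a minimal sketch that fits a degree-2 polynomial with plain least squares in NumPy. The data (generated from y = 1 + 2x + 3x^2 without noise) and the variable names are assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical noise-free data generated from y = 1 + 2x + 3x^2
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * x + 3.0 * x**2

# Design matrix with columns [1, x, x^2].
# The model is non-linear in x but linear in the coefficients b0, b1, b2.
A = np.vander(x, N=3, increasing=True)

# Ordinary least squares recovers the coefficients directly
coeffs, _, _, _ = np.linalg.lstsq(A, y, rcond=None)
print(coeffs)  # approximately [1. 2. 3.]
```

Once the powers of x are treated as separate columns, the problem is an ordinary linear regression on those columns.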

Why use polynomial regression?

Suppose we have a data set that, when plotted, looks like the figure below:

Fig: Data set that takes a bow shape

The figure above represents a quantity that increases exponentially over some period of time. The horizontal axis could be time and the vertical axis could be the number of people affected by an epidemic. The graph of number of people versus time then takes the shape shown above.

If we used the Linear Regression model with one variable on the data in the figure above, the fit would look something like this.

Fig: Using uni-variate regression on curved data

The regression model is not well fitted: many data points lie far above or below the regression line. This produces a large squared error, so it would be considered a poor regression model. Polynomial regression can be the solution to this kind of problem.

Fig: Polynomial regression on curved data

As shown in the figure, the regression model now fits the data much better. This gives a lower value of the cost function (the squared error) than the straight-line fit.
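The comparison between the two fits can be reproduced numerically. The sketch below uses synthetic curved data (a quadratic trend plus noise, which is an assumption and not the data behind the figures) and compares the training mean squared error of a straight-line fit with that of a degree-2 polynomial fit.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical curved data: quadratic trend plus a little noise
rng = np.random.default_rng(0)
x = np.linspace(0, 5, 50).reshape(-1, 1)
y = 2.0 + 0.5 * x.ravel() ** 2 + rng.normal(scale=0.3, size=50)

# Straight-line fit
linear_model = LinearRegression().fit(x, y)
mse_linear = mean_squared_error(y, linear_model.predict(x))

# Degree-2 polynomial fit: expand the feature, then fit a linear model
x_poly = PolynomialFeatures(degree=2).fit_transform(x)
poly_model = LinearRegression().fit(x_poly, y)
mse_poly = mean_squared_error(y, poly_model.predict(x_poly))

print("linear MSE:    ", mse_linear)
print("polynomial MSE:", mse_poly)
```

On data like this the polynomial fit reports the smaller error, which is the numerical counterpart of the visual difference between the two figures.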

How do you know you need Polynomial Regression?

First of all, a quadratic term in the Polynomial Regression equation creates a U-shaped curve (U or inverted U). If the scatter plot of the data shows a curvilinear shape, that is a sign we should consider Polynomial Regression. Similarly, if we fit a linear model to curved data, a scatter plot of the residuals (Y-axis) against the predictor (X-axis) will show patches of mostly positive residuals in the middle and patches of negative residuals at either end (or vice versa). This indicates that the linear model is not a good fit and that a polynomial may give a better result.
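The residual pattern described above can be checked without a plot. This is a minimal sketch on synthetic, noise-free U-shaped data (an assumption made for illustration): it fits a straight line and then looks at the average residual in the left, middle and right parts of the predictor range.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical noise-free U-shaped data: y = x^2 on [-3, 3]
x = np.linspace(-3, 3, 31).reshape(-1, 1)
y = x.ravel() ** 2

# Fit a straight line and compute residuals (observed minus predicted)
model = LinearRegression().fit(x, y)
residuals = y - model.predict(x)

# Average residual in the left, middle and right parts of the x range
left, middle, right = residuals[:10], residuals[10:21], residuals[21:]
print("left:  ", left.mean())
print("middle:", middle.mean())
print("right: ", right.mean())
```

For this U-shaped data the residuals are positive at both ends and negative in the middle, which is the "vice versa" case. An inverted U would give the opposite signs.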

Pros and Cons of using Polynomial Regression

One advantage of Polynomial Regression is that the model works on a data set of any size. It is also fast to fit, and it is particularly useful when the relationship to be modeled is not too complex and we don’t have a lot of data. The model also works well on many non-linear problems, where it can reach a lower squared error than a straight-line fit.

The reason we should not use Polynomial Regression on every data set is that we may end up choosing the wrong polynomial degree, which results in a bad bias/variance trade-off. Too high a degree may result in over-fitting. The model is also not considered good for highly complex data.

Fig: Different kinds of fitting on scattered data

The first figure illustrates under-fitting. This condition arises when we use Linear Regression with one variable on data scattered in that form. The model has a large value of the cost function.

The second figure uses Polynomial Regression with an appropriate degree. The regression model fits the data well and can be used for better prediction.

The third figure uses Polynomial Regression with a higher degree than required. This is the main drawback of Polynomial Regression: the model fits the training data well, but it does not generalize and cannot be relied on for prediction on new data.
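One common way to pick the degree is to compare the error on data that was not used for fitting. The sketch below is an illustration under assumptions: the data is synthetic (a quadratic trend plus noise), the candidate degrees 1, 2 and 15 are arbitrary, and a single train/validation split is used for brevity.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical data: quadratic trend plus noise
rng = np.random.default_rng(42)
x = rng.uniform(-3, 3, size=(60, 1))
y = 1.0 + 0.5 * x.ravel() ** 2 + rng.normal(scale=0.5, size=60)

# Hold out half of the data for validation
x_train, x_val, y_train, y_val = train_test_split(
    x, y, test_size=0.5, random_state=0
)

# Fit one model per candidate degree and record the validation error
val_errors = {}
for degree in (1, 2, 15):
    model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())
    model.fit(x_train, y_train)
    val_errors[degree] = mean_squared_error(y_val, model.predict(x_val))

for degree, error in val_errors.items():
    print(f"degree {degree:2d}: validation MSE = {error:.3f}")

# Degree with the lowest validation error
best_degree = min(val_errors, key=val_errors.get)
print("selected degree:", best_degree)
```

Degree 1 under-fits this data, so its validation error is clearly higher than that of degree 2. A very high degree usually fits the training points closely but tends to do worse on held-out data, although the exact numbers depend on the split. In practice, cross-validation over several splits gives a more stable choice than one split.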

Polynomial Regression Using scikit-learn

If you know the concept of Linear Regression, Polynomial Regression is similar, except that you need to specify the degree and convert the input features into polynomial features, which are then passed to Linear Regression.

#Import required modules

from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

#Some random values for input to a model

X_train = [[1,4],[3,5]]
Y_train = [1,2]
X_test = [[1,5]]

#Create a Linear Regressor

lin_regressor = LinearRegression()

#Pass the order of your polynomial here

poly = PolynomialFeatures(degree = 2)

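#Convert the inputs into polynomial features to be used by linear regression.
#If we have two variables a and b, then degree 2 and using fit_transform would give us 1, a, b, a^2, ab and b^2.
#The test data only needs transform, because poly has already been fitted on the training data.

X_transform = poly.fit_transform(X_train)
X_test_ = poly.transform(X_test)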

#This finds the coefficients of the polynomial regression. This is the training part of the algorithm.

lin_regressor.fit(X_transform, Y_train)

#Predict the value on the value passed

y_preds = lin_regressor.predict(X_test_)

#Print the predicted value

print(y_preds)
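The two steps above (expand the features, then fit a linear model) can also be chained into a single object. The sketch below is one way to do that with scikit-learn's make_pipeline; the data (generated from y = 1 + x^2) and the variable names are assumptions made for illustration and are not part of the example above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Feature expansion for one sample with a = 2 and b = 3.
# The columns are 1, a, b, a^2, ab, b^2.
expanded = PolynomialFeatures(degree=2).fit_transform([[2, 3]])
print(expanded)  # [[1. 2. 3. 4. 6. 9.]]

# Hypothetical noise-free data generated from y = 1 + x^2
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = 1.0 + X.ravel() ** 2

# The pipeline applies the feature expansion and then the linear model,
# both when fitting and when predicting
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X, y)

prediction = model.predict([[4.0]])
print(prediction)  # approximately [17.]
```

With a pipeline, the same fitted transformer is applied automatically at prediction time, so there is no separate transform call on the test data to forget or get wrong.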
