Overview of LAD Regression
- Applications of LAD Regression.
- Mathematics for LAD Regression.
- Algorithm for LAD Regression.
- Implement a LAD Regression model with scikit-learn.
Least Absolute Deviation (LAD) regression, also known as Least Absolute Error (LAE), Least Absolute Value (LAV), or Least Absolute Residual (LAR) regression, is an alternative to Ordinary Least Squares (OLS) regression. Like OLS, LAD regression attempts to find a function that closely approximates a set of data. The difference lies in the criterion: OLS minimizes the sum of squared deviations, whereas LAD minimizes the sum of absolute deviations.
In OLS regression, we calculate the difference between each observation and the regression line, square it, and minimize the sum of these squares. In LAD regression, we instead take the absolute value of each difference and minimize the sum of these absolute values.
LAD regression has the benefit of being far less sensitive to outliers than OLS regression. The reason is the squaring used in OLS: if there is an outlier, its difference from the regression line is large, and adding the square of a large number gives the outlier a high weight in the total.
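A quick way to see this effect is with the simplest possible model, a single constant. It is a standard result that the constant minimizing the sum of squared deviations is the mean, while a constant minimizing the sum of absolute deviations is the median. The sketch below uses made-up numbers with one outlier; the data values are assumptions chosen for illustration.

```python
from statistics import mean, median

# Illustrative data (an assumption for this sketch): five ordinary values plus one outlier.
clean = [10, 11, 12, 13, 14]
with_outlier = clean + [100]

# Best constant under squared error is the mean; under absolute error, the median.
ols_clean, ols_dirty = mean(clean), mean(with_outlier)
lad_clean, lad_dirty = median(clean), median(with_outlier)

print("OLS-style fit (mean):  ", ols_clean, "->", ols_dirty)
print("LAD-style fit (median):", lad_clean, "->", lad_dirty)
```

The single outlier drags the mean a long way, while the median barely moves.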
Working of the Algorithm
The quantity that LAD regression minimizes is the sum of absolute deviations:
$$S = \sum_{i=1}^{n} \lvert y_i - f(x_i) \rvert$$
Here, (xi, yi) are the points in our dataset, where i = 1, 2, 3, …, n, and our aim is to find a function f such that f(xi) ≈ yi.
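As a rough sketch, the sum of absolute deviations for a candidate function f can be computed as follows, next to the sum of squares that OLS would use. The function names, the data, and the candidate line are hypothetical and chosen only for illustration.

```python
def sum_abs_dev(xs, ys, f):
    """Sum of absolute deviations: the quantity LAD regression minimizes."""
    return sum(abs(y - f(x)) for x, y in zip(xs, ys))


def sum_sq_dev(xs, ys, f):
    """Sum of squared deviations: the quantity OLS regression minimizes."""
    return sum((y - f(x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data: three points on y = 2x + 1 and one outlier.
xs = [0, 1, 2, 3]
ys = [1, 3, 5, 17]


def candidate(x):
    return 2 * x + 1  # residuals are 0, 0, 0, 10


lad_cost = sum_abs_dev(xs, ys, candidate)
ols_cost = sum_sq_dev(xs, ys, candidate)
print(lad_cost, ols_cost)
```

The outlier contributes 10 to the absolute-deviation total but 100 to the squared total, which is the weighting effect described earlier.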
The graph below plots a Least Absolute Deviations fit.
Implementing LAD Regression with scikit-learn
Ordinary Least Squares Regression versus Least Absolute Deviation Regression
| Ordinary Least Squares Regression | Least Absolute Deviation Regression |
|---|---|
| Not very robust | Robust |
| Stable solution | Unstable solution |
| Always one solution | Possibly multiple solutions |
LAD regression is robust in the sense that it is largely unaffected by outliers in the data. It gives equal weight to all the data points, unlike OLS regression, where the squaring lets outliers make a huge difference to the result.
Now, let us see why LAD Regression is unstable when compared to OLS Regression:
Consider the plot given below.
Here, there are several points, and in order to find a function that fits them, we measure the vertical distance from each point to the regression line. The distance could also be measured horizontally or perpendicularly, but the vertical distance is the usual choice. If we simply add these signed errors to find the total error, the negative and positive values cancel each other, which gives a misleading result. To avoid this, we take the absolute values and add them to get the total error.
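The cancellation problem is easy to reproduce. In this small sketch the points and the deliberately poor candidate line are hypothetical, chosen so that the arithmetic can be checked by hand.

```python
# Hypothetical points lying on y = x, and a deliberately poor fit: the flat line y = 1.
points = [(0, 0), (1, 1), (2, 2)]


def flat_line(x):
    return 1


# Vertical distance from each point to the line, keeping the sign.
residuals = [y - flat_line(x) for x, y in points]  # [-1, 0, 1]

signed_total = sum(residuals)                    # positive and negative errors cancel
absolute_total = sum(abs(r) for r in residuals)  # absolute errors accumulate

print(signed_total, absolute_total)
```

The signed total is zero even though the line misses two of the three points, whereas the absolute total reports an error of 2.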
Now, our aim is to minimize the error. This can be done through calculus (differentiation), but the absolute value function is not differentiable at zero, so minimizing a sum of absolute values analytically is a difficult and time-consuming task.
To avoid this difficulty, we often use the OLS regression method instead, where we square the errors, which simplifies the mathematics of minimizing them.
At the same time, there are situations which require the use of absolute values, and hence LAD regression rather than OLS regression.
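To make the contrast concrete, here is a sketch of the two derivatives for a straight-line model, assuming f(x) = a + b x. The symbols a and b are introduced here for illustration and do not appear in the text above.

```latex
% OLS: the derivative is linear in a and b, so setting it to zero gives linear equations.
\frac{\partial}{\partial b} \sum_{i=1}^{n} (y_i - a - b x_i)^2
  = -2 \sum_{i=1}^{n} x_i \, (y_i - a - b x_i)

% LAD: the derivative involves the sign function and is undefined when a residual is zero.
\frac{\partial}{\partial b} \sum_{i=1}^{n} \lvert y_i - a - b x_i \rvert
  = -\sum_{i=1}^{n} x_i \, \operatorname{sgn}(y_i - a - b x_i),
  \qquad y_i - a - b x_i \neq 0 \text{ for all } i
```

Setting the OLS derivative to zero gives an equation that can be solved directly for a and b. The LAD derivative is piecewise constant in a and b, so there is no comparable closed-form solution, and LAD fits are usually found numerically.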
For a data set of (x, y) points with a single solution, the least absolute deviations line always passes through at least two data points. If there are multiple solutions, the region of valid least absolute deviations solutions is bounded by at least two lines, and each of these lines passes through at least two data points.
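This property suggests a simple, if slow, way to find a LAD line for small data sets: try the line through every pair of points and keep the one with the smallest sum of absolute deviations. The sketch below is a brute-force illustration under that assumption, not an efficient or standard implementation; the function names and data are hypothetical.

```python
from itertools import combinations


def sum_abs_dev(points, slope, intercept):
    """Sum of absolute vertical distances from the points to the line."""
    return sum(abs(y - (slope * x + intercept)) for x, y in points)


def lad_line_bruteforce(points):
    """Try the line through every pair of points; keep the one with the lowest cost."""
    best = None
    for (x1, y1), (x2, y2) in combinations(points, 2):
        if x1 == x2:
            continue  # vertical line, cannot be written as y = slope * x + intercept
        slope = (y2 - y1) / (x2 - x1)
        intercept = y1 - slope * x1
        cost = sum_abs_dev(points, slope, intercept)
        if best is None or cost < best[0]:
            best = (cost, slope, intercept)
    return best  # None if every pair of points shares the same x


# Hypothetical data: nine points on y = 2x + 1 and one outlier at x = 9.
points = [(x, 2 * x + 1) for x in range(9)] + [(9, 100)]
cost, slope, intercept = lad_line_bruteforce(points)
print(cost, slope, intercept)
```

The search checks every pair against every point, so its cost grows roughly with the cube of the number of points; it is only meant to illustrate the two-point property.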
To understand why LAD regression can have multiple solutions, observe the pink line in the green area in the figure given above. If we tilt the line slightly upward (still keeping it in the green area), its distance from one set of data points (those below it) increases, but its distance from another set of data points (those above it) decreases by an equal amount. Because we take absolute values, the two changes offset each other and the sum of absolute errors stays the same. Also, since the line can be tilted in infinitely small increments, this shows that if there is more than one solution, there are infinitely many solutions.
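The same behaviour can be reproduced with a tiny made-up data set. In the sketch below, the four points are the corners of a square (an assumption chosen so the numbers are easy to check), and several different lines passing between the lower and upper points give exactly the same sum of absolute errors.

```python
def sum_abs_dev(points, slope, intercept):
    """Sum of absolute vertical distances from the points to the line."""
    return sum(abs(y - (slope * x + intercept)) for x, y in points)


# Hypothetical data: the four corners of a square.
points = [(0, 0), (0, 2), (2, 0), (2, 2)]

# Three different lines, as (slope, intercept), that pass between the lower and upper points.
candidates = [(0.0, 1.0), (1.0, 0.0), (0.25, 0.5)]
costs = [sum_abs_dev(points, m, b) for m, b in candidates]

# A line lying above all the points, for comparison.
outside_cost = sum_abs_dev(points, 0.0, 3.0)

print(costs, outside_cost)
```

At each x value, the two points are 2 apart, so no line can do better than 2 + 2 = 4 in total. All three candidate lines reach that minimum, which makes each of them a valid LAD solution.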
The Least Absolute Deviation model minimizes the sum of the absolute values of the residuals, i.e.
$$\min_{f} \sum_{i=1}^{n} \lvert y_i - f(x_i) \rvert$$
This provides a more robust solution when outliers are present, but it does have some undesirable properties, most notably that in some situations there is no unique solution, and infinitely many different regression lines are possible.