Boosting for Machine Learning

Overview

In this blog post on boosting, we will cover the following topics:

  1. Introduction to ensemble classifiers
  2. Boosting
  3. Algorithm of AdaBoost
  4. Implementation with scikit-learn
  5. Summary

Introduction to ensemble classifiers

We have already discussed ensembles in the “Random Forest” post, but let’s take a deeper look; for an overview of ensembles, see Random Forest. An ensemble combines a group of predictors into a single final prediction, which helps control variance and bias, the main factors behind the difference between actual and predicted values. In boosting, the goal is to predict either +1 or -1 by combining the outputs of several classifiers.
Learning an ensemble model:
Classifiers: f1(x), f2(x), …, fK(x)
Weight of each classifier: w1, w2, …, wK
The ensemble classifier can then be defined as:
F(x) = sign(w1*f1(x) + w2*f2(x) + … + wK*fK(x))
or, written more compactly as a weighted sum,
F(x) = sign( Σ (k = 1 to K) wk * fk(x) )

Here 1 <= k <= K, where k indexes the classifiers and K is the total number of classifiers.
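As a small illustration, here is a minimal sketch in Python of how such a weighted vote produces the final prediction. The toy classifiers f1, f2, f3 and the weights are assumed values, purely for illustration:

# Hypothetical toy classifiers: each returns +1 or -1 for an input x
f1 = lambda x: 1 if x > 0 else -1
f2 = lambda x: 1 if x > 2 else -1
f3 = lambda x: 1 if x > -1 else -1

# Assumed weights for each classifier
w = [0.8, 0.4, 0.6]
classifiers = [f1, f2, f3]

def ensemble_predict(x):
    # Weighted vote: F(x) = sign(w1*f1(x) + w2*f2(x) + w3*f3(x))
    score = sum(wk * fk(x) for wk, fk in zip(w, classifiers))
    return 1 if score >= 0 else -1

print(ensemble_predict(1.5))   # +1: f1 and f3 (total weight 1.4) outvote f2 (0.4)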

Boosting

Boosting is an approach to learning from data by combining many weak learners. We start by applying some base method, such as a tree classifier, to the training data, giving every observation equal weight. We then compute the predicted classifications and re-weight the observations in the training sample: observations that were hardest to classify, i.e. those misclassified most often, receive higher weights, and easy ones receive lower weights. In other words, the weights are inversely related to the accuracy of the prediction for each observation. The classifier is then applied again to the weighted data, and we continue with the next iteration.

In boosting, and more specifically in AdaBoost, we learn from weighted data, meaning we assign a weight to each data point. Suppose αi is the weight assigned to the ith data point (xi, yi). There are many flavours of boosting, such as gradient boosting, AdaBoost, etc. A small sketch of what "learning from weighted data" looks like in practice follows below.
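Here is a minimal sketch of learning from weighted data with scikit-learn: most classifiers accept a sample_weight argument in fit(), which is exactly where the αi values go. The data and weights below are assumed, purely for illustration:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Toy data (assumed): 6 points with labels +1 / -1
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

# alpha_i: per-data-point weights; the middle points are "hard", so they get more weight
alpha = np.array([0.1, 0.1, 0.3, 0.3, 0.1, 0.1])

# A weak learner (decision stump) trained on the weighted data
stump = DecisionTreeClassifier(max_depth=1)
stump.fit(X, y, sample_weight=alpha)
print(stump.predict(X))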

Algorithm of AdaBoost

Steps involved in AdaBoost:
Step 1: Assign the same weight to every data point, αi = 1/N, where N is the total number of data points.
Step 2: For k = 1 to K, repeat steps 3 to 5.
Step 3: Learn the classifier fk(x) on the data weighted by αi.
Step 4: Compute the classifier coefficient wk.
Step 5: Recompute the data-point weights αi.
Don’t confuse α and w: they serve different purposes here. α assigns a weight to each data point, while w is the weight of each classifier fk(x).
Steps 4 and 5 are where the real work happens, since α and w depend on each other for their updated values.
If wk has a high value, it means fk(x) is predicting well.
To calculate wk, we first need the weighted classification error. Since every data point carries its own weight αi, a misclassified point contributes its weight to the error, and it is exactly this weighted error that drives the updates. The weighted classification error of classifier fk is:

weighted error(k) = ( Σi αi * 1[ fk(xi) ≠ yi ] ) / ( Σi αi )

i.e. the total weight of the misclassified points divided by the total weight of all points.
Then the weights of the classifiers, for all 1 <= k <= K, are computed as:

wk = (1/2) * ln( (1 - weighted error(k)) / weighted error(k) )

so a classifier with low weighted error gets a large positive wk, while one that is no better than random (error ≈ 0.5) gets wk ≈ 0.
Now it's time for Step 5, i.e. updating the data-point weights:

αi ← αi * e^(−wk)  if fk(xi) = yi  (correctly classified)
αi ← αi * e^(+wk)  if fk(xi) ≠ yi  (misclassified)

and then the αi are normalised so that they sum to 1.
Let’s look at the significance of this update.
If fk(xi) = yi, then αi decreases, i.e. this data point gets less importance in the next round; if fk(xi) ≠ yi, then αi increases, so the next classifier focuses harder on it.
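Putting steps 1 to 5 together, here is a minimal from-scratch sketch of AdaBoost using decision stumps as the weak classifiers. The toy data is assumed, and this is only an illustration of the update rules above, not a production implementation:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, K=10):
    N = len(y)
    alpha = np.full(N, 1.0 / N)                      # Step 1: equal weights, alpha_i = 1/N
    classifiers, w = [], []
    for k in range(K):                               # Step 2: repeat for k = 1..K
        f_k = DecisionTreeClassifier(max_depth=1)
        f_k.fit(X, y, sample_weight=alpha)           # Step 3: learn f_k on the weighted data
        pred = f_k.predict(X)
        miss = (pred != y)
        error = np.sum(alpha[miss]) / np.sum(alpha)  # weighted classification error
        error = np.clip(error, 1e-10, 1 - 1e-10)     # guard against log(0) / division by zero
        w_k = 0.5 * np.log((1 - error) / error)      # Step 4: classifier coefficient w_k
        alpha = alpha * np.exp(-w_k * y * pred)      # Step 5: decrease alpha_i if correct,
        alpha = alpha / np.sum(alpha)                #         increase if misclassified, then normalise
        classifiers.append(f_k)
        w.append(w_k)
    return classifiers, w

def adaboost_predict(X, classifiers, w):
    # F(x) = sign( sum_k wk * fk(x) )
    score = sum(w_k * f_k.predict(X) for f_k, w_k in zip(classifiers, w))
    return np.sign(score)

# Toy data (assumed): labels must be +1 / -1 for this formulation
X = np.array([[0], [1], [2], [3], [4], [5], [6], [7]])
y = np.array([-1, -1, -1, 1, -1, 1, 1, 1])
classifiers, w = adaboost_fit(X, y, K=5)
print(adaboost_predict(X, classifiers, w))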

Implementation with scikit-learn

Here we use AdaBoostClassifier from scikit-learn on the iris dataset and evaluate it with cross-validation; gradient boosting can be used from sklearn in a very similar way (see the sketch after the example).

In [1]:
from sklearn.model_selection import cross_val_score
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier

# Load the iris dataset and build an AdaBoost classifier with 100 weak learners
iris = load_iris()
clf = AdaBoostClassifier(n_estimators=100)
In [2]:
# Evaluate with cross-validation and report the mean accuracy
scores = cross_val_score(clf, iris.data, iris.target)
scores.mean()
Out[2]:
0.9599673202614379
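As mentioned above, gradient boosting can be used in much the same way. Here is a minimal sketch that reuses the iris data and cross_val_score from the cells above; the n_estimators value is just an assumption:

from sklearn.ensemble import GradientBoostingClassifier

# Gradient boosting on the same iris data, scored with the same cross-validation helper
gb_clf = GradientBoostingClassifier(n_estimators=100)
gb_scores = cross_val_score(gb_clf, iris.data, iris.target)
print(gb_scores.mean())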

Summary

In this blog, we looked at boosting (boosted trees), which is basically an ensemble technique, walked through the AdaBoost algorithm step by step, and saw an implementation with scikit-learn.
