This is part of my answer to interview question 9 which is to explain your favorite machine learning algorithm in five minutes.
Bagging & Boosting Made Simple
Bagging and boosting are two different types of ensemble learners. Ensemble learning is a method of combining many weak learners to build a single, more powerful learner. This is also called a ‘meta-learner’ because an ensemble combines the outputs of other learners to produce a final output. A weak learner is simply any learner that does better than random chance.
The basic idea behind ensembling is that finding many rough rules (weak learners) and combining them is easier and more robust than finding one highly accurate rule to model the data. Bagging and boosting follow the same outline: learn a rule over a subset of the data, repeat that step many times, and at the end combine all the rules. They differ in how they sample the subsets and how they combine the rules.
Bagging stands for bootstrap aggregating. With bagging, you uniformly sample with replacement from the data to make a bunch of different subsets (bootstrap samples), train a learner on each subset, and combine all the learners by simply averaging their outputs (or taking a majority vote for classification).
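That procedure can be sketched in a few lines. This is a minimal, illustrative version using scikit-learn's `DecisionTreeRegressor` as the weak learner and a toy noisy-sine dataset (both my choice, not part of the original answer):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Toy regression data: a noisy sine wave (illustrative only)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=200)

learners = []
for _ in range(25):
    # Bootstrap sample: draw n indices uniformly, with replacement
    idx = rng.integers(0, len(X), size=len(X))
    tree = DecisionTreeRegressor(max_depth=3).fit(X[idx], y[idx])
    learners.append(tree)

def bagged_predict(X_new):
    # Combine the learners by averaging their individual outputs
    return np.mean([t.predict(X_new) for t in learners], axis=0)

print(bagged_predict(np.array([[0.0]])))
```

Each tree sees a slightly different resampling of the data, so their individual errors partly cancel when averaged, which is exactly the variance reduction described below.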
Bagging is considered to be parallel because each model is built independently (subsets are sampled uniformly and with replacement). Bagging is used to decrease variance, that is, to tame an overfit model. It cannot improve the overall predictive force of the model because it is using the same data, but it can be used on complex models to smooth the output and lead to better out-of-sample prediction. Random forest is an example of bagging.
Boosting starts out similarly to bagging by sampling subsets with replacement and training a learner on each subset of the data. However, after each iteration, boosting tests the learner it just created on the current subset and gives higher weight to the training examples it classified incorrectly than to the ones it got right.
These weights then influence the sampling: a training example the current learner got wrong is more likely to be picked in the next iteration. Boosting essentially weights the “harder” training examples more and more until it can classify those points correctly. At the end, all the learners are combined through a weighted vote, where more accurate learners count more toward the single final output.
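The reweighting step can be shown concretely. Here is a toy, single-round sketch in the style of AdaBoost's update rule (the tiny label arrays are made up for illustration):

```python
import numpy as np

y_true = np.array([1, 0, 1, 1, 0])
y_pred = np.array([1, 1, 1, 0, 0])  # the weak learner's guesses this round

# Start with uniform weights over the training examples
weights = np.full(len(y_true), 1 / len(y_true))
miss = y_pred != y_true

# Weighted error of this learner, and its vote weight (alpha)
err = weights[miss].sum()
alpha = 0.5 * np.log((1 - err) / err)

# Upweight the misclassified examples, downweight the correct ones,
# then renormalize so the weights form a distribution again
weights *= np.exp(alpha * np.where(miss, 1.0, -1.0))
weights /= weights.sum()
print(weights)  # the two misclassified points now carry more weight
```

The same `alpha` values are reused at the end as the per-learner weights in the final combination, which is what “weighting each learner” means in practice.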
Boosting is considered to be sequential since each new model tries to do better on the training examples the previous model got wrong. Boosting is used to decrease bias, in other words, to make an underfit model better. AdaBoost and gradient boosting are common examples of boosting algorithms.
- Bagging can decrease variance in an overfit model
- Boosting can decrease bias in an underfit model
- Boosting can be more likely to overfit the data
- Ensemble methods are more complex and may not be as easily interpreted or explained to non-technical stakeholders