A Review of Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow by Aurélien Géron
Summary
Ensemble Learning
Ensemble Learning is a method of Machine Learning where you combine multiple effective models into one even more effective model. There are many ways to do this, but all of them let the creator harness the strengths of several different methods while hedging against the weaknesses of each individual model through aggregation. To see why this works, let's assume we are using 3 models that are each 99% accurate, meaning each one is wrong approximately 1% of the time. If their errors are independent, the likelihood that 2 out of 3 of them are wrong on the same instance is far less than 1%; the benefit only breaks down when the models tend to make the same or similar errors.
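To make that arithmetic concrete, here is a quick sketch (assuming the three models' errors are completely independent, which real models never quite are):

```python
# Probability that a majority vote of 3 independent models is wrong,
# assuming each individual model is wrong 1% of the time.
p_wrong = 0.01

# The majority is wrong only if at least 2 of the 3 models are wrong at once.
p_majority_wrong = 3 * p_wrong**2 * (1 - p_wrong) + p_wrong**3
print(f"Single model error rate:  {p_wrong:.4%}")           # 1.0000%
print(f"Majority vote error rate: {p_majority_wrong:.4%}")  # ~0.0298%
```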
Voting is probably the most straightforward method of ensemble learning, and it operates exactly like the example in the last paragraph. We train a number of models that vary in their methods and then use a simple majority vote to decide the winning prediction. Again, this leans into the strengths of diverse models and helps hedge against the weaknesses of any individual one.
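Here is a minimal sketch of hard voting with scikit-learn's VotingClassifier; the moons dataset and the three base models are placeholders I picked, not the book's exact example:

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Toy dataset; any classification data would work here.
X, y = make_moons(n_samples=500, noise=0.3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Three diverse models; the ensemble's answer is their majority vote.
voting_clf = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression()),
        ("svc", SVC()),
        ("tree", DecisionTreeClassifier()),
    ],
    voting="hard",  # simple majority rule
)
voting_clf.fit(X_train, y_train)
print(voting_clf.score(X_test, y_test))
```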
Random Forests
Random Forests are simply the combination of Decision Trees and Ensemble Learning. Rather than using a single tree, which is inherently prone to overfitting, you use a whole forest of trees that are each trained to be unique in their approach. This prevents the overfitting of any individual tree from affecting how the model generalizes to new data.
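A minimal sketch with scikit-learn's RandomForestClassifier (the hyperparameters here are illustrative, not tuned):

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_moons(n_samples=500, noise=0.3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# 500 trees, each trained on a bootstrap sample of the data and limited to
# a random subset of features at each split, so no single overfit tree
# dominates the forest's prediction.
rnd_clf = RandomForestClassifier(
    n_estimators=500, max_leaf_nodes=16, random_state=42)
rnd_clf.fit(X_train, y_train)
print(rnd_clf.score(X_test, y_test))
```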
Boosting
Boosting refers to any method that combines multiple weak learners into a strong learner. It does this by training predictors sequentially, each one correcting its predecessors, until they add up to an effective model. One way of doing this is AdaBoost. AdaBoost runs through multiple iterations of a model where, each time, the data points that were misclassified in the previous iteration are given more weight, making them more important to the model overall and more likely to be predicted correctly the next time. Gradient Boosting is one of the most popular boosting methods due to its accuracy. It works by training each new predictor on the residual errors of the previous one; as it iterates, the residuals shrink, resulting in a very accurate model.
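Here is a rough sketch of both flavors using scikit-learn; the datasets and hyperparameters are placeholders of my own, not the book's:

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# AdaBoost: a sequence of shallow trees, each one giving extra weight to
# the instances the previous trees misclassified.
ada_clf = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1),
    n_estimators=200, learning_rate=0.5, random_state=42)
ada_clf.fit(X_train, y_train)
print("AdaBoost accuracy:", ada_clf.score(X_test, y_test))

# Gradient Boosting (regression flavor): each new tree is fit to the
# residual errors left over by the ensemble built so far.
X_reg = np.random.rand(200, 1) - 0.5
y_reg = 3 * X_reg[:, 0] ** 2 + 0.05 * np.random.randn(200)
gbrt = GradientBoostingRegressor(
    max_depth=2, n_estimators=100, learning_rate=0.1, random_state=42)
gbrt.fit(X_reg, y_reg)
```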
My Thoughts
This was probably the most interesting chapter of HOML that I have read so far. I was honestly blown away by how Blenders are used in stacked Machine Learning models. I had never thought of using Machine Learning to interpret the outputs of other Machine Learning models, but when I read it I was shocked by how much sense it made. We see an example of this in the main image of this article. At the bottom is our original training data, with 3 prediction models on top of it, each making its own interpretation of the data. From there, those 3 predictions are combined into the blending training set, and we then train the Blender on that data set, effectively optimizing the outputs of multiple other models that were all trained on the original data. In my mind it takes the simple voting of ensemble learning and makes it even more intelligent by applying even more machine learning.
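For anyone curious what that looks like in code, here is a small sketch using scikit-learn's StackingClassifier, where final_estimator plays the role of the Blender; the base models and dataset are placeholders I chose:

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=500, noise=0.3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# The three base predictors generate out-of-fold predictions on the
# training data, and the blender (final_estimator) is trained to combine
# those predictions into the final answer.
stacking_clf = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(random_state=42)),
        ("svc", SVC(random_state=42)),
        ("lr", LogisticRegression()),
    ],
    final_estimator=LogisticRegression(),  # the Blender
    cv=5,
)
stacking_clf.fit(X_train, y_train)
print(stacking_clf.score(X_test, y_test))
```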
Thanks for reading!
If you have any questions or feedback, please reach out to me on Twitter @wtothdev or leave a comment!
Additionally, I wanted to give a huge thanks to Aurélien Géron for writing such an excellent book. You can purchase said book here (non-affiliate).
Disclaimer: I don’t make any money from any of the services referenced and chose to read and review this book of my own free will.