Random Forest

Overview

Random forest is a bagging method: deep decision trees, each fitted on its own bootstrap sample, are combined (by averaging for regression or by majority vote for classification) to produce an output with lower variance than a single tree.

In addition, random forest (RF) uses another trick to make the fitted trees less correlated with each other: when growing each tree, we sample not only over the observations in the dataset (to generate a bootstrap sample) but also over the features, and keep only a random subset of them to build the tree.
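The two sampling steps described above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not a full random forest: the helper name `sample_for_tree` and its parameters (`n_rows`, `n_features`, `max_features`) are hypothetical, and the feature subset is drawn once per tree, as in the description above.

```python
import random

def sample_for_tree(n_rows, n_features, max_features, rng):
    """Return the row indices and feature indices one tree would be fitted on.

    Hypothetical helper for illustration; it only performs the sampling,
    it does not fit a tree.
    """
    # Bootstrap sample: draw n_rows row indices WITH replacement,
    # so some rows repeat and others are left out.
    rows = [rng.randrange(n_rows) for _ in range(n_rows)]
    # Feature sampling: draw max_features distinct column indices
    # WITHOUT replacement.
    features = sorted(rng.sample(range(n_features), max_features))
    return rows, features

rng = random.Random(0)  # seeded so the run is reproducible
for tree_id in range(3):
    rows, features = sample_for_tree(n_rows=8, n_features=5, max_features=2, rng=rng)
    print(f"tree {tree_id}: rows={rows} features={features}")
```

Each tree would then be fitted only on the selected rows and columns. Note that some implementations, such as scikit-learn's `RandomForestClassifier`, redraw the feature subset at every split (controlled by `max_features`) rather than once per tree.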

Bagging + Feature sampling = Lower Variance Error
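
One standard way to make this statement precise, under the simplifying assumption that the B tree outputs T_1, ..., T_B are identically distributed with variance sigma^2 and share the same positive pairwise correlation rho (these symbols are introduced here for illustration), is the variance of their average:

```latex
\operatorname{Var}\!\left(\frac{1}{B}\sum_{b=1}^{B} T_b\right)
  = \rho\,\sigma^{2} + \frac{1-\rho}{B}\,\sigma^{2}
```

Bagging shrinks the second term by increasing B, while feature sampling targets the first term by lowering rho.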

This way, not all trees look at exactly the same information to make their decisions, which reduces the correlation between their outputs. It also makes the model more robust to missing data, because trees built without a given feature do not rely on it.
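
The effect of correlation on the variance of the combined output can be checked with a small simulation. The sketch below does not train any trees: as an assumption made for illustration, each "tree output" is modelled as a unit-variance Gaussian built from a shared component and an independent component, so that any two outputs have correlation `rho`. The function name `variance_of_average` is hypothetical.

```python
import random
import statistics

def variance_of_average(rho, n_trees, n_trials, rng):
    """Empirical variance of the mean of n_trees unit-variance outputs
    whose pairwise correlation is rho (0 <= rho <= 1)."""
    averages = []
    for _ in range(n_trials):
        shared = rng.gauss(0.0, 1.0)  # component common to every tree
        outputs = [
            # sqrt(rho) * shared + sqrt(1 - rho) * independent noise
            # gives variance 1 and pairwise correlation rho.
            (rho ** 0.5) * shared + ((1.0 - rho) ** 0.5) * rng.gauss(0.0, 1.0)
            for _ in range(n_trees)
        ]
        averages.append(sum(outputs) / n_trees)
    return statistics.pvariance(averages)

rng = random.Random(0)  # seeded so the run is reproducible
for rho in (0.8, 0.2):
    empirical = variance_of_average(rho, n_trees=10, n_trials=20000, rng=rng)
    theoretical = rho + (1.0 - rho) / 10  # value for equally correlated outputs
    print(f"rho={rho}: empirical={empirical:.3f}, theoretical={theoretical:.3f}")
```

With the same number of trees, the less correlated ensemble should show a clearly smaller variance for its averaged output, which is the benefit feature sampling aims for.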
