What is the Random Forest Algorithm?
Random Forest is an ensemble machine-learning algorithm based on the bagging (bootstrap aggregating) technique.
The base estimators in a random forest are decision trees. When splitting each node of a tree, the random forest considers only a randomly selected subset of the features to find the best split.
Step by step, a random forest model works as follows:
Random subsets of rows are drawn from the original dataset with replacement (bootstrapping).
A decision tree is fitted on each of these subsets.
At each node of each tree, only a random subset of the features is considered when choosing the best split.
The predictions of all trees are combined: averaged for regression, or decided by majority vote for classification.
To summarize, a random forest randomly samples both data points and features and builds many decision trees (a forest).
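The four steps above can be sketched from scratch for classification. This is a minimal illustration, not how any library implements it internally: it assumes scikit-learn's DecisionTreeClassifier as the base tree (its max_features="sqrt" option handles the per-node feature sampling), NumPy for bootstrapping, and the Iris dataset as sample data. The helper names fit_forest and predict_forest are hypothetical. It uses hard majority voting; scikit-learn's own RandomForestClassifier instead averages the trees' predicted class probabilities, and a regression version would average DecisionTreeRegressor outputs.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier


def fit_forest(X, y, n_trees=25, seed=0):
    """Hypothetical helper: fit n_trees trees, each on a bootstrap sample of (X, y)."""
    rng = np.random.default_rng(seed)
    n_samples = X.shape[0]
    trees = []
    for _ in range(n_trees):
        # Bootstrapping: draw n_samples row indices with replacement.
        idx = rng.integers(0, n_samples, size=n_samples)
        # max_features="sqrt" makes every node consider only a random subset
        # of sqrt(n_features) features when searching for the best split.
        tree = DecisionTreeClassifier(
            max_features="sqrt",
            random_state=int(rng.integers(0, 2**31 - 1)),
        )
        # Fit this tree on its own bootstrap sample.
        tree.fit(X[idx], y[idx])
        trees.append(tree)
    return trees


def predict_forest(trees, X):
    """Hypothetical helper: combine tree predictions by majority vote."""
    # Shape (n_trees, n_samples): one row of predicted labels per tree.
    all_preds = np.array([tree.predict(X) for tree in trees])
    # For each sample (column), pick the most frequent label.
    # np.bincount assumes non-negative integer labels, as in Iris.
    return np.array([np.bincount(col).argmax() for col in all_preds.T])


# Usage on the Iris dataset (integer labels 0, 1, 2).
X, y = load_iris(return_X_y=True)
trees = fit_forest(X, y)
preds = predict_forest(trees, X)
print("training accuracy:", (preds == y).mean())
```

Because each tree sees different rows and different candidate features at each split, the trees make partly independent errors, and the vote smooths them out.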
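Random Forest can also estimate how important each feature is, which is useful for feature selection.
In scikit-learn, the fitted model's feature_importances_ attribute returns an impurity-based importance score for each feature.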
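A short sketch of reading feature importances, assuming scikit-learn's RandomForestClassifier and the Iris dataset as sample data. The "keep features above the mean importance" rule at the end is only an illustrative selection threshold, not a recommendation.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

data = load_iris()
X, y = data.data, data.target

model = RandomForestClassifier(n_estimators=200, random_state=0, n_jobs=-1)
model.fit(X, y)

# Impurity-based importances: one non-negative value per feature, summing to 1.
importances = model.feature_importances_

# Print features from most to least important.
for i in np.argsort(importances)[::-1]:
    print(f"{data.feature_names[i]}: {importances[i]:.3f}")

# Illustrative selection rule: keep features whose importance exceeds the mean.
selected = [
    name
    for name, score in zip(data.feature_names, importances)
    if score > importances.mean()
]
print("selected:", selected)
```

Impurity-based importances are computed from the training data, so they can favor features with many distinct values; checking them against a held-out evaluation is a sensible precaution.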
Some Important Parameters (scikit-learn):
n_estimators:
- It defines the number of decision trees created in a random forest.
criterion:
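- The function used to measure the quality of a split: "gini" (Gini impurity, the default) or "entropy" (information gain).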
min_samples_split:
- It defines the minimum number of samples an internal node must contain before a split is attempted (the default is 2).
max_features:
- It defines the number of features considered when looking for the best split at each node of each tree (for example, "sqrt" uses the square root of the total number of features).
n_jobs:
- The number of jobs to run in parallel for both fit and predict; the work is parallelized over the trees. Setting it to -1 uses all available CPU cores.
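The parameters above can be passed together to scikit-learn's RandomForestClassifier. This sketch assumes the Iris dataset and a 75/25 train/test split; the specific values (100 trees, "entropy", min_samples_split=4) are arbitrary illustrations, not tuned settings.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = RandomForestClassifier(
    n_estimators=100,     # number of trees in the forest
    criterion="entropy",  # split-quality measure ("gini" is the default)
    min_samples_split=4,  # a node needs at least 4 samples to be split
    max_features="sqrt",  # features considered at each split: sqrt(n_features)
    n_jobs=-1,            # use all available CPU cores
    random_state=0,       # reproducible bootstrap samples and feature draws
)
model.fit(X_train, y_train)

print("number of trees:", len(model.estimators_))
print("test accuracy:", model.score(X_test, y_test))
```

Fixing random_state makes runs repeatable, which helps when comparing different parameter settings.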