This is part of my answer to interview question 9, which asks you to explain your favorite machine learning algorithm in five minutes.
Decision Trees Made Simple
A decision tree learns from training data by creating a tree structure. This tree structure is made up of nodes, edges, and leaves.
- Each node asks a question about a particular attribute in the data.
- Each edge represents one of the values that attribute can take and leads to the next node or leaf.
- Every leaf is the end of that path in the decision tree and represents the final output from the decision tree model along that path.
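The node/edge/leaf structure above can be sketched with nested dictionaries. This is just an illustrative toy; the attribute names and class labels ("outlook", "play", etc.) are made up for the example, not from any real dataset.

```python
# A hand-built toy decision tree. Inner dicts are nodes (the key names the
# attribute being asked about), their keys are edges (attribute values),
# and plain strings are leaves (the model's final output along that path).
tree = {
    "outlook": {                   # node: asks a question about "outlook"
        "sunny": {                 # edge "sunny" leads to another node
            "humidity": {
                "high": "stay in", # leaf: final prediction for this path
                "normal": "play",
            }
        },
        "overcast": "play",        # edge leading straight to a leaf
        "rainy": "stay in",
    }
}

def predict(node, example):
    """Follow edges from the root until a leaf is reached."""
    while isinstance(node, dict):
        attribute = next(iter(node))                # the question this node asks
        node = node[attribute][example[attribute]]  # follow the matching edge
    return node

print(predict(tree, {"outlook": "sunny", "humidity": "high"}))  # stay in
```

Reading a prediction is just walking one root-to-leaf path, which is also why the final model is so easy to explain to non-technical stakeholders.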
Decision trees can be implemented with a variety of different algorithms, but for the most part, each algorithm selects the “best” attribute to split on at each node. The top-most nodes in a decision tree ask about the most significant attributes, the ones that produce the cleanest splits between the classes you are trying to predict.
Decision trees help determine feature importance because of this need to find the “best” attribute to split the data on at each node of the tree. They are considered a “greedy” algorithm because they only split by what is best at the current step without necessarily thinking steps ahead.
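The greedy "best split" step can be sketched with Gini impurity, one common split criterion (others, like information gain, work similarly). The toy rows and labels here are invented for illustration.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: how mixed the class labels are (0 = perfectly pure)."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_quality(rows, labels, attribute_index):
    """Weighted impurity after splitting on one attribute (lower is better)."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute_index], []).append(label)
    n = len(labels)
    return sum(len(g) / n * gini(g) for g in groups.values())

# Toy data: each row is (outlook, windy); labels are the class to predict.
rows = [("sunny", "yes"), ("sunny", "no"), ("rainy", "yes"), ("rainy", "no")]
labels = ["stay in", "play", "stay in", "play"]

# The greedy step: score every attribute and split on the best one,
# without looking ahead to how later splits will turn out.
best = min(range(2), key=lambda i: split_quality(rows, labels, i))
print(best)  # attribute 1 ("windy") yields perfectly pure child nodes
```

Because the attribute chosen at the root is the single most informative one on the full dataset, the split order itself doubles as a rough feature-importance ranking.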
To prevent overfitting, the maximum depth of the tree or the minimum sample size required to allow another split can be specified. However, these must be set beforehand and may take some trial and error to tune for a given problem. Another way to limit overfitting is to prune the final tree after training completes. Pruning removes lower branches of the decision tree until the out-of-sample error starts to increase.
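A minimal sketch of both overfitting controls, assuming scikit-learn: `max_depth` and `min_samples_split` are the pre-set limits, and `ccp_alpha` enables cost-complexity pruning after the tree is grown (the specific values here are arbitrary, not tuned).

```python
# Assumes scikit-learn is installed; the iris dataset stands in for real data.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Pre-set limits: cap the depth and require a minimum sample size per split.
limited = DecisionTreeClassifier(max_depth=3, min_samples_split=10)
limited.fit(X_train, y_train)

# Post-hoc pruning: cost-complexity pruning trims lower branches after the
# tree is grown; a larger ccp_alpha prunes more aggressively. In practice
# you would pick ccp_alpha by watching where held-out error starts to rise.
pruned = DecisionTreeClassifier(ccp_alpha=0.02, random_state=0)
pruned.fit(X_train, y_train)

print(limited.get_depth(), pruned.score(X_test, y_test))
```

Trying a few values for these parameters with cross-validation is the usual form that "trial and error" takes.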
Decision Tree Advantages
- Easy to understand model
- Feature selection performed by the algorithm
- Little data preparation is required
Decision Tree Disadvantages
- Easy to overfit – especially as the tree gets bigger
- “Greedy” algorithm – may result in local optima