The algorithm for building a regression tree consists of two basic steps:

1. Divide the predictor space (the set of possible values of $X_1, X_2, \ldots, X_p$) into $J$ distinct, non-overlapping regions $R_1, R_2, \ldots, R_J$.
2. For every test observation that falls into region $R_j$, make the same prediction: the mean of the response values of the training observations in $R_j$.
With the above definition, the loss becomes the total RSS over all regions:

$$\text{RSS} = \sum_{j=1}^{J} \sum_{i \in R_j} \left( y_i - \hat{y}_{R_j} \right)^2,$$

where $\hat{y}_{R_j}$ is the mean response of the training observations in region $R_j$.
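As a quick sanity check, here is a minimal numpy sketch of this loss; `region_rss` is a hypothetical helper name, and the region labels are assumed to be given:

```python
import numpy as np

def region_rss(y, regions):
    """Total RSS of a partition: every observation is predicted by the
    mean response of its region (hypothetical helper, for illustration)."""
    rss = 0.0
    for r in np.unique(regions):
        y_r = y[regions == r]
        rss += ((y_r - y_r.mean()) ** 2).sum()  # inner sum over i in R_j
    return rss

# Toy check: two regions with means 1.0 and 5.0.
y = np.array([0.9, 1.1, 4.8, 5.2])
regions = np.array([0, 0, 1, 1])
print(region_rss(y, regions))  # 0.02 + 0.08 ≈ 0.1
```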
Unfortunately, considering every possible partition of the predictor space into $J$ regions is computationally infeasible. To overcome this, we build the tree with a greedy, top-down approach called recursive binary splitting, which takes the best available split at each level without looking ahead.
To start off, the algorithm tries to split the data into two half-planes, $R_1(j, s) = \{X \mid X_j < s\}$ and $R_2(j, s) = \{X \mid X_j \geq s\}$, for some predictor $X_j$ and cutpoint $s$.
The same algorithm is then run recursively; the only change after the first split is that we work independently on the two resulting regions. Within each region we no longer consider all possible values of $s$, only the cutpoints that actually separate the observations falling inside that region.
The process stops when a pre-determined stopping criterion is reached, such as a minimum node size or a maximum depth (typically the tree is grown deep and then pruned). To make a prediction, we first determine the region into which the test observation falls and then predict the mean of the training samples in that region (already computed while building the tree).
Concretely, at the first split the values of $j$ and $s$ are chosen to minimize

$$\sum_{i:\, x_i \in R_1(j,s)} \left( y_i - \hat{y}_{R_1} \right)^2 + \sum_{i:\, x_i \in R_2(j,s)} \left( y_i - \hat{y}_{R_2} \right)^2,$$

and the same criterion is applied within each region at every subsequent split.
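A minimal sketch of the whole greedy procedure, assuming numpy and representing nodes as plain dicts; `best_split`, `grow_tree`, and `predict` are hypothetical names used only for illustration:

```python
import numpy as np

def best_split(X, y):
    """Exhaustively search for the split (j, s) minimizing the two-region RSS."""
    best = None  # (rss, j, s)
    for j in range(X.shape[1]):
        for s in np.unique(X[:, j]):
            left, right = y[X[:, j] < s], y[X[:, j] >= s]
            if len(left) == 0 or len(right) == 0:
                continue  # a valid split must leave both regions non-empty
            rss = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
            if best is None or rss < best[0]:
                best = (rss, j, s)
    return best

def grow_tree(X, y, min_size=5):
    """Recursive binary splitting, stopping at a minimum node size."""
    split = best_split(X, y) if len(y) > min_size else None
    if split is None:
        return {"prediction": y.mean()}  # leaf: mean response of the region
    _, j, s = split
    mask = X[:, j] < s
    return {"feature": j, "cutpoint": s,
            "left": grow_tree(X[mask], y[mask], min_size),
            "right": grow_tree(X[~mask], y[~mask], min_size)}

def predict(tree, x):
    """Route a test observation to its region and return the stored mean."""
    while "prediction" not in tree:
        branch = "left" if x[tree["feature"]] < tree["cutpoint"] else "right"
        tree = tree[branch]
    return tree["prediction"]

# Toy usage: a step function in the first predictor.
rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 2))
y = np.where(X[:, 0] < 0.5, 1.0, 5.0) + rng.normal(scale=0.1, size=200)
tree = grow_tree(X, y)
print(predict(tree, np.array([0.2, 0.7])))  # close to 1.0
```

Note that the exhaustive search over $(j, s)$ costs roughly $O(p n^2)$ per split in this sketch; production implementations sort each predictor once and maintain running sums to do much better.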
The tree built using the above strategy can score well on the training set but will generalize poorly to the test set because of its high complexity. A better strategy is to build shorter trees, which are less complex: they trade a little bias for lower variance and are more interpretable.
A simple strategy is to split a node only when the split decreases the RSS by more than some threshold. This approach can be short-sighted, though: a split that looks worthless now may enable very good splits further down the tree.
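A tiny numpy example of that short-sightedness, using an XOR-style response of my own construction: no single split improves the RSS at all, yet a depth-2 tree fits the data exactly.

```python
import numpy as np

# Four points with an XOR-style response: y depends on both predictors.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([0., 1., 1., 0.])

root_rss = ((y - y.mean()) ** 2).sum()  # RSS with no split: 1.0

# Any first split (say X1 < 0.5) leaves both children with mean 0.5,
# so the decrease in RSS is exactly zero and a threshold rule stops here.
left, right = y[X[:, 0] < 0.5], y[X[:, 0] >= 0.5]
split_rss = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
print(root_rss, split_rss)  # 1.0 1.0

# Splitting each child once more on X2 would drive the RSS to zero.
```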
Tree pruning instead grows a large tree and then shrinks it back to obtain a subtree. The test error of a subtree can be estimated via cross-validation, but doing so for every possible subtree is prohibitively expensive.
Cost complexity pruning, or weakest link pruning, is an approach to overcome this problem. For a non-negative tuning parameter $\alpha$, we look for the subtree $T$ of the full tree $T_0$ that minimizes

$$\sum_{m=1}^{|T|} \sum_{i:\, x_i \in R_m} \left( y_i - \hat{y}_{R_m} \right)^2 + \alpha |T|,$$

where $|T|$ is the number of terminal nodes of $T$ and $R_m$ is the region corresponding to the $m$-th terminal node. The parameter $\alpha$ controls the trade-off between the subtree's fit to the training data and its complexity.
The same formula is valid for any internal node and the subtree rooted there as well. Hence, for a given $\alpha$ there is a smallest subtree $T_\alpha$ minimizing the penalized cost, and it can be found by weakest link pruning: repeatedly collapse the internal node that yields the smallest per-node increase in RSS, producing a nested sequence of subtrees that contains $T_\alpha$ for every $\alpha$.
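The criterion itself is easy to evaluate. A small sketch, with `cost_complexity` as a hypothetical helper and hand-picked numbers chosen so the effect of $\alpha$ is visible:

```python
import numpy as np

def cost_complexity(region_ys, alpha):
    """Penalized cost of a subtree: RSS summed over its |T| terminal
    regions plus alpha * |T| (hypothetical helper, for illustration).

    region_ys : list of arrays, one per terminal node, holding the
                training responses that fall in that region
    alpha     : non-negative complexity parameter
    """
    rss = sum(((ys - ys.mean()) ** 2).sum() for ys in region_ys)
    return rss + alpha * len(region_ys)

# Collapsing two leaves into one trades a higher RSS for a smaller |T|.
leaves = [np.array([0.9, 1.1]), np.array([4.8, 5.2])]
merged = [np.concatenate(leaves)]
for alpha in (0.0, 20.0):
    print(alpha, cost_complexity(leaves, alpha), cost_complexity(merged, alpha))
```

At $\alpha = 0$ the two-leaf tree wins (cost 0.1 vs. 16.1); at $\alpha = 20$ the penalty outweighs the 16.0 increase in RSS caused by the collapse, so the one-leaf tree wins (36.1 vs. 40.1).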
The full algorithm can then be summarized as follows:

1. Use recursive binary splitting to grow a large tree on the training data, stopping only when each terminal node has fewer than some minimum number of observations.
2. Apply cost complexity pruning to the large tree to obtain a sequence of best subtrees, as a function of $\alpha$.
3. Use K-fold cross-validation to choose $\alpha$: divide the training observations into $K$ folds, and for each $k = 1, \ldots, K$:
   1. Repeat steps 1 and 2 excluding the data from the $k$-th fold.
   2. Evaluate the mean squared prediction error on the held-out $k$-th fold, as a function of $\alpha$.
4. Average the errors over the folds and choose the value of $\alpha$ that minimizes the average error.
5. Select the subtree from step 2 that corresponds to the chosen value of $\alpha$.
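A sketch of these steps using scikit-learn, whose `DecisionTreeRegressor` supports minimal cost complexity pruning through the `ccp_alpha` parameter; the dataset is synthetic and the hyperparameters are placeholders:

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)

# Steps 1-2: grow a large tree and compute the pruning path, which
# yields one effective alpha per subtree in the nested sequence.
path = DecisionTreeRegressor(random_state=0).cost_complexity_pruning_path(X, y)

# Steps 3-4: K-fold cross-validation over the candidate alphas.
search = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid={"ccp_alpha": path.ccp_alphas},
    cv=5,
    scoring="neg_mean_squared_error",
)
search.fit(X, y)  # step 5: best_estimator_ is refit with the chosen alpha
print(search.best_params_["ccp_alpha"])
```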