additive learners: sneakier split

the information the cluster gave us about the features X. With regression trees, what we want to do is maximize I[C;Y], where Y is now the dependent variable and C is now the variable saying which leaf of the tree we end up at. Once again, we can’t do a direct maximization, so we again do a greedy search. We

In boosting, each additive learner is fit to the residual errors left by the ensemble at the previous step. Intuitively, every new learner exploits whatever pattern still remains in those residuals. Once boosting has reached its maximum accuracy, the residuals appear randomly distributed, with no pattern left to fit.
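The idea of fitting each learner to the current residuals can be sketched in a few lines. This is only an illustrative sketch under stated assumptions, not a reference implementation: it assumes squared-error loss, a single feature, one-split regression stumps as the weak learners, and a synthetic sine-plus-noise dataset. The names `fit_stump`, `predict_stump`, `boost`, `n_rounds`, and `lr` are hypothetical and chosen for this example.

```python
import numpy as np

def fit_stump(x, r):
    """Greedy search for the one threshold on x that minimizes squared error on r."""
    order = np.argsort(x)
    xs, rs = x[order], r[order]
    best = None
    for i in range(1, len(xs)):
        if xs[i] == xs[i - 1]:
            continue  # no valid split between identical feature values
        left, right = rs[:i], rs[i:]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            # Split rule: points with x <= xs[i - 1] go left, the rest go right.
            best = (sse, xs[i - 1], left.mean(), right.mean())
    _, threshold, left_value, right_value = best
    return threshold, left_value, right_value

def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return np.where(x <= threshold, left_value, right_value)

def boost(x, y, n_rounds=200, lr=0.1):
    """Additive model: start from the mean, then repeatedly fit a stump to the residuals."""
    pred = np.full(len(y), y.mean(), dtype=float)
    history = []
    for _ in range(n_rounds):
        residuals = y - pred                 # what the ensemble has not explained yet
        stump = fit_stump(x, residuals)      # the new learner models the residuals
        pred = pred + lr * predict_stump(stump, x)
        history.append(float(np.mean((y - pred) ** 2)))
    return pred, history

# Synthetic data, assumed purely for illustration.
rng = np.random.default_rng(0)
x = rng.uniform(0, 6, size=200)
y = np.sin(x) + rng.normal(scale=0.1, size=200)

pred, history = boost(x, y)
initial_mse = float(np.mean((y - y.mean()) ** 2))
final_mse = history[-1]
print(f"training MSE: {initial_mse:.4f} -> {final_mse:.4f}")
```

Plotting `y - pred` against `x` before and after boosting is a simple way to see the claim above: the early residuals trace the sine shape, while the late ones look like unstructured noise. Note that this tracks training error only; a held-out set would be needed to judge when further rounds stop helping.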

