Tuesday, August 19, 2014

GBMs are awesome: Part I

GBMs (gradient boosting machines) have become my favorite type of model over the last two years. In this tutorial, I demonstrate how to use a GBM in R for binary classification (predicting whether or not an event occurs). I also discuss basic model tuning and model inference with GBMs. Stay tuned for a second part focused on tuning parameters, variable selection, and cross-validation with GBMs!
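To make the workflow concrete, here is a minimal sketch of fitting a GBM for binary classification with the gbm package. It is not the code from the video. It assumes gbm is installed, and the simulated data, column names (x1, x2, x3, y), and tuning values (n.trees, interaction.depth, shrinkage, bag.fraction) are hypothetical choices for illustration, not recommendations. The relative-influence table from summary() is shown as one simple form of model inference.

```r
# Minimal sketch: binary classification with a GBM in R.
# Assumes the 'gbm' package is installed; data and settings are hypothetical.
library(gbm)

set.seed(42)
n <- 1000
df <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = runif(n))

# Simulated 0/1 outcome: the event is more likely when x1 and x2 are large;
# x3 is pure noise.
p <- plogis(1.5 * df$x1 + df$x2 - 0.5)
df$y <- rbinom(n, size = 1, prob = p)

train <- df[1:800, ]
test  <- df[801:1000, ]

n_trees <- 500

# distribution = "bernoulli" models a 0/1 response on the log-odds scale.
fit <- gbm(y ~ x1 + x2 + x3,
           data = train,
           distribution = "bernoulli",
           n.trees = n_trees,
           interaction.depth = 2,   # depth of each tree
           shrinkage = 0.05,        # learning rate
           bag.fraction = 0.5)      # fraction of training rows sampled per tree

# type = "response" returns predicted probabilities instead of log-odds.
prob <- predict(fit, newdata = test, n.trees = n_trees, type = "response")
pred_class <- as.integer(prob > 0.5)
accuracy <- mean(pred_class == test$y)

# Relative influence of each predictor, normalized to sum to 100.
influence <- summary(fit, n.trees = n_trees, plotit = FALSE)
print(influence)
print(accuracy)
```

With plotit = TRUE (the default), summary() also draws the relative-influence bar chart instead of only returning the table.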

Using a GBM for Classification in R from Wallace Campbell on Vimeo.


  1. Very nice video! As I'm more involved with financial markets, I've often seen the dependent variable (i.e., returns) transformed into categories such as "up", "down", and "sideways" (not binary) before being fed into the decision tree algorithm.
    Is there any strong reason for not using the untransformed continuous variable, other than avoiding overfitting?
    What would be the proper "distribution" parameter in the gbm.fit function for such categorical variables? Bernoulli no longer seems suitable, since we are not dealing with binary outcomes...

    1. Interesting transformation! Another option would be to model the continuous dependent variable (returns) directly and then group the predictions into categories: predictions between -0.1% and 0.1% are grouped as sideways, those above 0.1% as up, and those below -0.1% as down.

      There is a "multinomial" option for the distribution argument in gbm. Check out the description of the distribution argument in ?gbm. Let me know how it works out for you!

  2. Super post! Waiting for Part 2!
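The reply above suggests modeling the continuous returns and binning the predictions afterwards. A minimal base-R sketch of that binning step uses cut() with the -0.1% and 0.1% thresholds from the reply. It assumes the predictions are expressed in percent, and the values in pred_returns are hypothetical stand-ins for model output.

```r
# Hypothetical predicted returns (in percent), e.g. from predict() on a
# regression model of returns.
pred_returns <- c(-0.35, -0.05, 0.00, 0.08, 0.10, 0.42)

# Bin the predictions: (-Inf, -0.1] is "down", (-0.1, 0.1] is "sideways",
# and (0.1, Inf) is "up". cut() returns a factor with these labels.
categories <- cut(pred_returns,
                  breaks = c(-Inf, -0.1, 0.1, Inf),
                  labels = c("down", "sideways", "up"))

print(table(categories))
```

cut() uses right-closed intervals by default, so a prediction of exactly 0.1 lands in "sideways"; passing right = FALSE makes the intervals left-closed, which moves 0.1 into "up" and -0.1 into "sideways".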