Tuesday, August 19, 2014

dplyr: A gamechanger for data manipulation in R

I demonstrate how to use dplyr for data manipulation in R (R code and data on GitHub). I had heard of the package before and finally gave it a try after attending Hadley Wickham's presentation at useR! in LA a couple of months ago. dplyr will change the way you manipulate data!
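
To give a flavor of what a dplyr pipeline looks like, here is a minimal sketch. It is not the code from the video: the built-in mtcars data set, the hp > 100 cutoff, and the derived wt_kg column are all assumptions chosen purely for illustration.

```r
library(dplyr)

# mtcars ships with R and stands in for the data used in the video (an assumption).
summary_by_cyl <- mtcars %>%
  filter(hp > 100) %>%                    # keep only rows that meet a condition
  mutate(wt_kg = wt * 453.592) %>%        # derived column: wt is recorded in 1000 lbs
  group_by(cyl) %>%                       # one group per number of cylinders
  summarise(n = n(),                      # rows per group
            mean_mpg = mean(mpg),         # average fuel economy per group
            mean_wt_kg = mean(wt_kg)) %>%
  arrange(desc(mean_mpg))                 # most fuel-efficient group first

print(summary_by_cyl)
```

Each verb takes a data frame and returns a data frame, which is why the steps chain together so cleanly with %>%.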

GBMs are awesome: Part I

GBMs have become my favorite type of model over the last two years. In this tutorial, I demonstrate how to use a GBM for binary classification in R (predicting whether an event occurs or not). I also discuss basic model tuning and model inference with GBMs. Stay tuned for a second part focused on tuning parameters, variable selection, and cross validation with GBM!
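
For readers who want something to run before watching, the following is a rough sketch of a GBM for a binary outcome using the gbm package. The simulated data, the variable names x1 and x2, and every tuning value (number of trees, depth, shrinkage, folds) are assumptions for illustration, not the settings used in the video.

```r
library(gbm)

set.seed(1)
n <- 500
# Simulated data (an assumption): two predictors and a 0/1 outcome.
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- rbinom(n, 1, plogis(1.5 * dat$x1 - dat$x2))

# distribution = "bernoulli" requests a model for a 0/1 outcome.
fit <- gbm(y ~ x1 + x2, data = dat, distribution = "bernoulli",
           n.trees = 500, interaction.depth = 2, shrinkage = 0.05,
           cv.folds = 5, n.cores = 1)

# Basic tuning: choose the number of trees that minimizes cross-validated error.
best <- gbm.perf(fit, method = "cv", plot.it = FALSE)

# Predicted probabilities that the event occurs.
p <- predict(fit, newdata = dat, n.trees = best, type = "response")

# Basic inference: relative influence of each predictor.
infl <- summary(fit, n.trees = best, plotit = FALSE)
print(infl)
```

The probabilities here are computed on the training data only to keep the sketch short; an honest accuracy estimate needs held-out data.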



Using a GBM for Classification in R from Wallace Campbell on Vimeo.

Saturday, December 21, 2013

Analysis of experimental data with R

It's been a while since my last video, with good reason: I started a full-time job as a senior statistician about 2 months ago! Not that I spend my entire day coding in R, but I wouldn't be nearly as useful if I didn't know how to use it.

Anyway, while in grad school, I helped my friends with data analysis for thesis and dissertation projects. In return, they brought me cookies and beer. This video explains some of the most common tasks that are necessary in the analysis of experimental data.

R code and data are available on GitHub.

Analysis of experimental data with R from Wallace Campbell on Vimeo.

Wednesday, September 25, 2013

K-Fold Cross validation: Random Forest vs GBM

K-Fold Cross validation: Random Forest vs GBM from Wallace Campbell on Vimeo.

In this video, I demonstrate how to use k-fold cross validation to obtain a reliable estimate of a model's out-of-sample predictive accuracy and to compare two different types of models (a Random Forest and a GBM). I use data from Kaggle's Amazon competition as an example.
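
The mechanics of k-fold cross validation can be sketched in base R. To keep the example free of extra packages, two logistic regressions stand in for the Random Forest and the GBM, and the data are simulated; both are assumptions, and cv_accuracy is a hypothetical helper written for this sketch.

```r
set.seed(42)
n <- 400
# Simulated data (an assumption) with an interaction between the predictors.
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- rbinom(n, 1, plogis(dat$x1 - dat$x1 * dat$x2))

k <- 5
fold <- sample(rep(1:k, length.out = n))  # random, equally sized folds

# Hypothetical helper: fit on k - 1 folds, score accuracy on the held-out fold.
cv_accuracy <- function(formula) {
  sapply(1:k, function(i) {
    train <- dat[fold != i, ]
    test  <- dat[fold == i, ]
    fit   <- glm(formula, data = train, family = binomial)
    prob  <- predict(fit, newdata = test, type = "response")
    mean(as.integer(prob > 0.5) == test$y)  # share of correct classifications
  })
}

acc_a <- cv_accuracy(y ~ x1 + x2)  # stand-in for model A
acc_b <- cv_accuracy(y ~ x1 * x2)  # stand-in for model B

# Average the fold-level accuracies to compare the two models.
print(c(model_a = mean(acc_a), model_b = mean(acc_b)))
```

Because both models are scored on the same folds, the comparison is not driven by one model drawing an easier split. Swapping in a Random Forest or a GBM only changes the fit and predict lines inside the helper.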

Tuesday, August 27, 2013

Multicore (parallel) processing in R

Multicore (parallel) processing in R from Wallace Campbell on Vimeo.

If you're not programming in parallel, you're only using a fraction of your computer's power! I demonstrate how to run "for" loops in parallel using the mclapply function from the multicore library. The code can be scaled to any number of available cores.
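
Here is a minimal sketch of the idea. In current versions of R, mclapply is provided by the parallel package that ships with R, so that is what the sketch loads. The slow_square function is a hypothetical stand-in for whatever work the body of the loop does.

```r
library(parallel)  # mclapply is provided by parallel in current R

# Hypothetical stand-in for an expensive loop body.
slow_square <- function(i) {
  Sys.sleep(0.1)
  i^2
}

# mclapply relies on forking, which Windows does not support, so fall back to 1 core there.
cores <- detectCores()
n_cores <- if (.Platform$OS.type == "windows" || is.na(cores)) 1L else max(1L, cores - 1L)

# Sequential version: an ordinary "for" loop.
results_loop <- numeric(8)
for (i in 1:8) results_loop[i] <- slow_square(i)

# Parallel version: the loop body becomes a function applied to each index.
results_par <- unlist(mclapply(1:8, slow_square, mc.cores = n_cores))

print(results_par)
```

The iterations must be independent of one another for this to work, since each one may run in a separate process.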

Monday, August 26, 2013

Survival analysis in R: Weibull and Cox proportional hazards models

I describe how to estimate the Weibull accelerated failure time model and the Cox proportional hazards model, test the assumptions, make predictions, and plot survival functions using each model.
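
The workflow can be sketched with the survival package. The lung data set that ships with the package, the choice of covariates (age and sex), and the hypothetical patient used for prediction are all assumptions for illustration and are not taken from the video.

```r
library(survival)

# lung ships with the survival package; status is coded 1 = censored, 2 = dead.
fit_wb  <- survreg(Surv(time, status) ~ age + sex, data = lung, dist = "weibull")
fit_cox <- coxph(Surv(time, status) ~ age + sex, data = lung)

# Test the proportional hazards assumption of the Cox model.
ph_test <- cox.zph(fit_cox)
print(ph_test)

# Predictions for a hypothetical 60-year-old patient with sex = 1.
new <- data.frame(age = 60, sex = 1)
median_wb <- predict(fit_wb, newdata = new, type = "quantile", p = 0.5)
lp <- predict(fit_wb, newdata = new, type = "lp")

# Cox survival function for that patient.
sf <- survfit(fit_cox, newdata = new)
plot(sf, xlab = "Days", ylab = "Survival probability")

# Weibull survival function: S(t) = exp(-(t / exp(lp))^(1 / sigma)),
# where sigma is the scale parameter reported by survreg.
t_grid  <- seq(1, 1000, by = 10)
surv_wb <- exp(-(t_grid / exp(lp))^(1 / fit_wb$scale))
lines(t_grid, surv_wb, lty = 2)
```

Plotting the two curves together gives a quick visual check of whether the parametric Weibull shape agrees with the semiparametric Cox estimate.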



Survival analysis in R: Weibull and Cox proportional hazards models from Wallace Campbell on Vimeo.

Nonparametric (local polynomial) regression in R

Local polynomial regression models can be used as a more flexible alternative to linear regression. However, nonparametric regression models are slightly more difficult to estimate and interpret than linear regression. This video explains almost everything you need to know about local polynomial models in R, including choosing the bandwidth, estimating the model, plotting the regression, and estimating marginal effects. I use Wand and Ripley's KernSmooth package.
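
The steps above can be sketched with KernSmooth as follows. The simulated sine-curve data are an assumption, as is reusing the same bandwidth for the derivative estimate, which is a simplification rather than a recommendation.

```r
library(KernSmooth)

set.seed(7)
# Simulated data (an assumption): a nonlinear signal plus noise.
x <- sort(runif(300, 0, 10))
y <- sin(x) + rnorm(300, sd = 0.3)

# Choose the bandwidth with the direct plug-in selector.
h <- dpill(x, y)

# Estimate the regression function with a local linear fit.
fit <- locpoly(x, y, degree = 1, bandwidth = h)

# Estimate the marginal effect dy/dx as the first derivative of a local quadratic fit.
# Reusing h here is a simplification; derivatives usually call for a larger bandwidth.
slope <- locpoly(x, y, drv = 1, degree = 2, bandwidth = h)

# Plot the data with the fitted regression curve.
plot(x, y, col = "grey", xlab = "x", ylab = "y")
lines(fit, lwd = 2)
```

Unlike a linear regression, there is no single coefficient to report: the marginal effect in slope$y varies along the grid of x values in slope$x.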