Saturday, December 21, 2013

Analysis of experimental data with R

It's been a while since my last video, with good reason: I started a full-time job as a senior statistician about two months ago! Not that I spend my entire day coding in R, but I wouldn't be nearly as useful if I didn't know how to use it.

Anyway, while in grad school, I helped my friends with data analysis for thesis and dissertation projects. In return, they brought me cookies and beer. This video explains some of the most common tasks that are necessary in the analysis of experimental data.

R code and data are available on GitHub.

Analysis of experimental data with R from Wallace Campbell on Vimeo.

Wednesday, September 25, 2013

K-Fold Cross validation: Random Forest vs GBM

K-Fold Cross validation: Random Forest vs GBM from Wallace Campbell on Vimeo.

In this video, I demonstrate how to use k-fold cross validation to obtain a reliable estimate of a model's out-of-sample predictive accuracy and to compare two different types of models (a Random Forest and a GBM). I use data from Kaggle's Amazon competition as an example.
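
For readers who want the mechanics without watching the video, here is a minimal sketch of k-fold cross validation in base R. It is not the code from the video: to keep it dependency-free, two logistic regressions on the built-in mtcars data stand in for the Random Forest and the GBM, and cv_accuracy is a hypothetical helper name. The fold logic should carry over if you swap in other model fits.

```r
# Sketch only: two glm() fits stand in for the Random Forest and GBM,
# and mtcars stands in for the Kaggle data used in the video.
set.seed(42)
dat <- mtcars
k <- 5

# Randomly assign each row to one of k folds of (nearly) equal size.
folds <- sample(rep(seq_len(k), length.out = nrow(dat)))

# Hypothetical helper: for each fold, fit on the other k - 1 folds and
# compute classification accuracy on the held-out fold.
cv_accuracy <- function(formula, data, folds) {
  outcome <- all.vars(formula)[1]
  sapply(sort(unique(folds)), function(i) {
    train <- data[folds != i, ]
    test  <- data[folds == i, ]
    # Small folds can trigger separation warnings; they are harmless here.
    fit  <- suppressWarnings(glm(formula, data = train, family = binomial))
    prob <- predict(fit, newdata = test, type = "response")
    mean((prob > 0.5) == (test[[outcome]] == 1))
  })
}

acc_small <- cv_accuracy(am ~ wt, dat, folds)
acc_large <- cv_accuracy(am ~ wt + hp, dat, folds)

# Averaging the k held-out accuracies gives one out-of-sample estimate per model.
cv_summary <- c(small = mean(acc_small), large = mean(acc_large))
print(cv_summary)
```

Both models are scored on the same folds, so the comparison is not driven by one model drawing an easier split.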

Tuesday, August 27, 2013

Multicore (parallel) processing in R

Multicore (parallel) processing in R from Wallace Campbell on Vimeo.

If you're not programming in parallel, you're only using a fraction of your computer's power! I demonstrate how to run "for" loops in parallel using the mclapply function from the multicore library. The code can be scaled to any number of available cores.
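
A minimal sketch of the idea, assuming a current R installation where mclapply is provided by the base parallel package (the functionality of multicore was folded into it). The slow_square function is a hypothetical stand-in for whatever expensive work sits inside your loop, and the core count of 2 is an arbitrary choice for illustration.

```r
library(parallel)  # mclapply() is provided here in current versions of R

# Hypothetical stand-in for an expensive loop body.
slow_square <- function(i) {
  Sys.sleep(0.01)
  i^2
}

# mclapply() relies on forking, which is not available on Windows;
# there mc.cores must be 1 and the call simply runs serially.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# The ordinary serial "for" loop.
serial <- numeric(8)
for (i in 1:8) serial[i] <- slow_square(i)

# The same work spread across cores. mclapply() returns a list in input order.
par_res <- unlist(mclapply(1:8, slow_square, mc.cores = n_cores))

# Both approaches should give the same answer.
print(identical(serial, par_res))
```

To scale up, raise mc.cores; parallel::detectCores() reports how many cores the machine has, though it can return NA on some systems, so check its value before passing it along.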

Monday, August 26, 2013

Survival analysis in R: Weibull and Cox proportional hazards models

I describe how to estimate the Weibull accelerated failure time model and the Cox proportional hazards model, test the assumptions, make predictions, and plot survival functions using each model.
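
As a rough companion to the video, the sketch below runs the same four steps with the survival package. It is an illustration under assumptions, not the video's code: it uses the lung data set that ships with survival, and the covariates (age and sex) and the patient in new_pt are arbitrary choices.

```r
library(survival)

# Assumption: the built-in `lung` data stands in for the data in the video.
# Weibull accelerated failure time model.
fit_wb <- survreg(Surv(time, status) ~ age + sex, data = lung, dist = "weibull")

# Cox proportional hazards model with the same covariates.
fit_cox <- coxph(Surv(time, status) ~ age + sex, data = lung)

# Test the proportional hazards assumption (scaled Schoenfeld residuals).
ph_test <- cox.zph(fit_cox)
print(ph_test)

# Prediction for a hypothetical patient: median survival time under the Weibull.
new_pt <- data.frame(age = 60, sex = 1)
median_wb <- predict(fit_wb, newdata = new_pt, type = "quantile", p = 0.5)

# Survival function for the same patient from the Cox model.
sf_cox <- survfit(fit_cox, newdata = new_pt)
plot(sf_cox, xlab = "Days", ylab = "Survival probability")

# Weibull survival function in survreg's parameterization:
# S(t) = exp(-(t / exp(lp))^(1 / scale)).
lp     <- predict(fit_wb, newdata = new_pt, type = "lp")
shape  <- 1 / fit_wb$scale
t_grid <- seq(1, 1000, by = 10)
s_wb   <- exp(-(t_grid / exp(lp))^shape)
lines(t_grid, s_wb, lty = 2)
```

The solid line is the Cox estimate (with its confidence band) and the dashed line is the Weibull curve, so any large gap between them is a quick visual check on the parametric assumption.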



Survival analysis in R: Weibull and Cox proportional hazards models from Wallace Campbell on Vimeo.

Using a GBM for Classification in R

I discuss some advantages of Generalized Boosted Models over logistic regression and discriminant analysis and demonstrate how to use a GBM for binary classification (predicting whether an event occurs or not).
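
Here is a small sketch of binary classification with the gbm package, assuming it is installed. The data are simulated purely for illustration (the signal is a deliberately nonlinear interaction, the kind of pattern a plain logistic regression would miss), and the tuning values are arbitrary starting points rather than recommendations from the video.

```r
library(gbm)  # assumes the gbm package is installed

# Simulated data for illustration only: the event is likely only when
# both x1 and x2 are large, a nonlinear interaction.
set.seed(1)
n  <- 500
x1 <- runif(n)
x2 <- runif(n)
y  <- rbinom(n, 1, plogis(4 * (x1 > 0.5) * (x2 > 0.5) - 2))
dat   <- data.frame(y, x1, x2)
train <- dat[1:400, ]
test  <- dat[401:500, ]

# distribution = "bernoulli" is the setting for a 0/1 outcome.
fit <- gbm(y ~ x1 + x2, data = train, distribution = "bernoulli",
           n.trees = 500, interaction.depth = 2, shrinkage = 0.05,
           n.minobsinnode = 10, cv.folds = 5, n.cores = 1, verbose = FALSE)

# Choose the number of trees by cross validation, then predict probabilities.
best <- gbm.perf(fit, method = "cv", plot.it = FALSE)
prob <- predict(fit, newdata = test, n.trees = best, type = "response")

# Share of test cases classified correctly at a 0.5 cutoff.
acc <- mean((prob > 0.5) == test$y)
print(acc)
```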



Using a GBM for Classification in R from Wallace Campbell on Vimeo.

Nonparametric (local polynomial) regression in R

Local polynomial regression models can be used as a more flexible alternative to linear regression. However, nonparametric regression models are slightly more difficult to estimate and interpret than linear regression. This video explains almost everything you need to know about local polynomial models in R, including choosing the bandwidth, estimating the model, plotting the regression, and estimating marginal effects. I use Wand and Ripley's KernSmooth package.
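
The steps listed above can be sketched with KernSmooth as follows. The data are simulated for illustration, and reusing the dpill bandwidth for the derivative estimate is a simplifying assumption on my part: dpill targets the local linear fit, and derivative estimates often benefit from a somewhat larger bandwidth.

```r
library(KernSmooth)

# Simulated data for illustration: a sine curve plus noise.
set.seed(1)
x <- sort(runif(200, 0, 10))
y <- sin(x) + rnorm(200, sd = 0.3)

# Choose the bandwidth with the direct plug-in selector.
h <- dpill(x, y)

# Estimate the regression function with a local linear fit.
fit <- locpoly(x, y, degree = 1, bandwidth = h)

# Marginal effect: the first derivative of the regression function.
# drv = 1 asks for the derivative; a local quadratic (degree = 2) is used.
# Assumption: the same bandwidth h is reused here for simplicity.
deriv <- locpoly(x, y, drv = 1, degree = 2, bandwidth = h)

# Plot the data with the fitted curve, then the estimated marginal effect.
plot(x, y, col = "grey", main = "Local linear fit")
lines(fit, lwd = 2)
plot(deriv, type = "l", xlab = "x", ylab = "dy/dx", main = "Marginal effect")
```

Both locpoly calls return a list with x (an evenly spaced grid) and y (the estimate at each grid point), which is why they can be passed directly to lines and plot.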



Estimation, prediction, and evaluation of logistic regression models

I provide an introduction to using logistic regression for prediction (binary classification), using data from the Titanic competition on www.Kaggle.com as an example. I use models to fill in missing data, estimate a logistic regression model on a training data set, and use the estimated model to predict survival on a test data set. The video covers just about everything you need to know to estimate, predict, and evaluate logistic regression models in R.
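
The workflow described above might look something like this in base R. This is a sketch on simulated data, not the Kaggle Titanic file and not the code from the video: the column names, the coefficients used to generate the data, and the choice of a linear model to fill in age are all assumptions made for illustration.

```r
# Simulated stand-in for the Titanic data (illustration only).
set.seed(7)
n <- 600
dat <- data.frame(
  sex    = factor(sample(c("female", "male"), n, replace = TRUE)),
  pclass = sample(1:3, n, replace = TRUE)
)
dat$age <- round(pmax(1, rnorm(n, mean = 40 - 5 * dat$pclass, sd = 12)))
dat$survived <- rbinom(n, 1, plogis(2.5 - 2.5 * (dat$sex == "male") -
                                      0.8 * dat$pclass - 0.02 * dat$age))
dat$age[sample(n, 100)] <- NA  # knock out some ages to mimic missing data

# Step 1: use a model to fill in the missing ages.
miss    <- is.na(dat$age)
age_fit <- lm(age ~ sex + pclass, data = dat[!miss, ])
dat$age[miss] <- predict(age_fit, newdata = dat[miss, ])

# Step 2: estimate the logistic regression on the training set.
train <- dat[1:450, ]
test  <- dat[451:600, ]
fit   <- glm(survived ~ sex + pclass + age, data = train, family = binomial)

# Step 3: predict survival on the test set and evaluate.
prob     <- predict(fit, newdata = test, type = "response")
pred     <- as.integer(prob > 0.5)
conf_mat <- table(predicted = pred, actual = test$survived)
accuracy <- mean(pred == test$survived)

print(conf_mat)
print(accuracy)
```

The 0.5 cutoff is only a default; the confusion matrix shows how the errors split between false positives and false negatives, which is what you would look at before moving the cutoff.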