Tuesday, August 19, 2014

dplyr: A gamechanger for data manipulation in R

I demonstrate how to use dplyr for data manipulation in R (R code and data on GitHub). I had heard of the package before and finally gave it a try after attending Hadley Wickham's presentation at useR! in LA a couple of months ago. dplyr will change the way you do data manipulation!

13 comments:

  1. Nice job! I didn't catch whether you used the minus sign on a numeric variable, but it might only work on numeric variables. The desc() function arranges a variable in descending order regardless of its type (ascending is the default).

    Your microphone is picking up thumping sounds from your typing. A cheap solution to that is to place a thin layer of packing foam underneath your keyboard. If your microphone is on a stand, you can put it under that too.

    Keep up the good work!

  2. Great video! I have been using sqldf to do this type of data manipulation but maybe I should switch to dplyr.

    Replies
    1. Thanks, Paul! Yeah, this is way better than sqldf. How could I forget to mention the joins? Check out ?left_join and you'll never need to use merge() or the sqldf package again.

  3. Great overview of a fantastic package. Love dplyr. And your succinct and super-short explanation belies just what a game changer this is. Thanks. Well done.

  4. That was great. Appreciate the effort, nice and concise.

  5. Thanks, very nice video.

    I didn't see how, in your example, using dplyr helped with ggplot. However, you could have used the "pipe" operator %>% instead of + (but I'm not sure why it would be "better").

    Replies
    1. I was trying to make the ggplot work without dplyr, but I wasn't able to do so. One of the main reasons I prefer dplyr is the flexibility of storing the summary data in a data frame. After the object has been created, you can make a table, a bar graph, or a line graph to present the data with minimal extra work.

  6. Thanks for creating this video and posting it, Wallace. I saw this and Kevin Markham's post on dplyr: http://www.dataschool.io/dplyr-tutorial-for-faster-data-manipulation-in-r/

    Replies
    1. Yeah, dplyr has been super hot in the R blogosphere lately! I haven't watched that yet but I'll check it out - looks very in depth.

  7. This comment has been removed by the author.
