more wrangling tips

Getting your data ready for analysis takes much longer than actually analysing it. By some estimates, up to 80% of data analysis time is spent wrangling data (and cursing and swearing).

Did you know up to 80% of data analysis is spent on the process of cleaning and preparing data? - cf. Wickham, 2014 and Dasu and Johnson, 2003
So here is an excellent approach to data wrangling in #rstats https://t.co/tqogwNSSjN
— Miguel Á. Armengol (@miguearmengol) September 10, 2018

Here is another great wrangling resource, this time by Bradley Boehmke.

And if you need a rationale for why it is a good idea to acquire some wrangling skills, here is a quote from Jenny Bryan:

“Classroom data are like teddy bears and real data are like a grizzly bear with salmon blood dripping out its mouth”

A few things I didn’t already know about `tidyr` and `dplyr`

In addition to `gather()` and `spread()`, the `tidyr` package has `separate()`, which pulls parts of a single variable apart into separate columns, and `unite()`, which combines several columns into one.
When using `filter()` from `dplyr`, specify group membership using `%in%`. Also, `distinct()` will remove duplicate rows and `slice(3:5)` will subset rows by position (here, rows 3 to 5).
When using `dplyr` `summarise()`, sometimes you want to count the number of participants, but `n()` will give you the number of observations (rows). `n_distinct()` counts unique values instead, so applying it to the participant ID column gives the number of participants.
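
To make these tips concrete for future me, here is a minimal sketch on a made-up data frame. The `scores` data and its column names (`id`, `group`, `date`, `score`) are hypothetical, invented purely for illustration. `separate()` still works in current `tidyr`, although newer releases mark it as superseded.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: two participants, repeated observations,
# and one exact duplicate row (rows 2 and 3 are identical)
scores <- tibble(
  id    = c("p1", "p1", "p1", "p2", "p2"),
  group = c("control", "control", "control", "treat", "treat"),
  date  = c("2018-09-01", "2018-09-02", "2018-09-02", "2018-09-01", "2018-09-03"),
  score = c(10, 12, 12, 9, 11)
)

# separate(): pull one variable apart into several columns
parts <- scores %>%
  separate(date, into = c("year", "month", "day"), sep = "-")

# unite(): combine several columns back into one
rejoined <- parts %>%
  unite("date", year, month, day, sep = "-")

# filter() with %in%: keep rows whose group is in a set of values
controls <- scores %>%
  filter(group %in% c("control"))

# distinct(): drop exact duplicate rows
deduped <- scores %>%
  distinct()

# slice(): subset rows by position (rows 3 to 5)
rows_3_to_5 <- scores %>%
  slice(3:5)

# n() counts observations (rows); n_distinct() counts unique participants
counts <- scores %>%
  summarise(
    n_obs          = n(),
    n_participants = n_distinct(id)
  )
```

In this toy example `counts` has `n_obs` equal to 5 but `n_participants` equal to 2, which is exactly the distinction that caught me out.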

R notes to myself
