I don’t like cats much

I don’t love cats. I am not a member of @RCatLadies. So the fact that Tidyverse packages for dealing with factors and functional programming have cat-related names (forcats and purrr) does not endear them to me.

I knew there would come a time when I would hit an R problem that needed the power of for loops. So when I asked the Twittersphere whether there was an alternative (hopefully a tidyverse one) that would let me avoid loops a little longer, I was a bit disappointed to hear that the answer is the purrr package.

I think the time has come, I might not be able to avoid #forloops anymore. I want to run the same set of #dplyr operations (rename, select, filter etc) on a set of dataframes I have datapastaed into #rstats. Is there a way to say "do this" to files P001-P032 without using a loop?
— Dr Jenny Richmond (@JenRichmondPhD) September 4, 2018

Apparently the map() function in purrr is going to be my friend, but at the moment it is completely bewildering. I thought it might be helpful to paste all the links that the lovely rstats people on Twitter sent me so that I can keep them in one place.
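Here is a minimal sketch of the basic idea, as far as I understand it so far: put the dplyr steps in one function, then let map() apply that function to every data frame in a list. The data frames, column names (id, RT, junk) and the clean_one() helper are all made up for illustration; they are not my real data.

```r
library(purrr)
library(dplyr)

# Hypothetical stand-ins for the participant data frames (P001, P002, ...)
participants <- list(
  P001 = tibble(id = 1:3, RT = c(350, 420, 1200), junk = "x"),
  P002 = tibble(id = 1:3, RT = c(380, 2500, 410), junk = "y")
)

# One function that holds the dplyr steps to repeat (hypothetical helper)
clean_one <- function(df) {
  df %>%
    rename(rt = RT) %>%
    select(id, rt) %>%
    filter(rt < 1000)
}

# map() calls clean_one() on each element and returns a list of the results
cleaned <- map(participants, clean_one)
```

The result is a list with the same names as the input, so cleaned$P001 is the tidied version of P001.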

If you want to learn purrr, this is what the Twitter experts suggest.

Tips from Alison Hill

Alison’s fave purrr learning resources are all from #rladies. I need to check out resources from

Jennifer Thompson https://github.com/jenniferthompson/RLadiesIntroToPurrr

Jenny Bryan https://jennybc.github.io/purrr-tutorial/

Charlotte Wickham https://github.com/cwickham/purrr-tutorial

also see: http://rstd.io/row-work

Tips from Jared Wilber

Jared Wilber suggests that Amber Thomas’s resources are good: https://amber.rbind.io/blog/2018/03/26/purrr_iterations/

Tips from Tom Kelly

Tom Kelly pointed me towards the @swcarpentry resources

https://swcarpentry.github.io/r-novice-inflammation/03-loops-R/index.html

https://swcarpentry.github.io/r-novice-inflammation/15-supp-loops-in-depth/

Miles McBain says…

You can use dplyr::bind_rows() instead of reduce(rbind()). BUT if you want them all in one frame at the end you probably just want purrr::map_dfr(), which is a map and bind combo function. So many options! that’s actually half the problem with going #noloops. My most commonly used fns in purrr are map(), pmap(), walk(), iwalk() and every() maybe that helps narrow it down a bit.
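To make the three options Miles mentions concrete, here is a small sketch that stacks a list of data frames each way. The two toy data frames and their columns are assumptions for illustration only.

```r
library(purrr)
library(dplyr)

# Two small made-up data frames with the same columns
dfs <- list(
  data.frame(id = 1:2, score = c(10, 20)),
  data.frame(id = 3:4, score = c(30, 40))
)

# Option 1: fold the list together pairwise with rbind()
via_reduce <- reduce(dfs, rbind)

# Option 2: let dplyr stack the whole list in one call
via_bind <- bind_rows(dfs)

# Option 3: map_dfr() applies a function to each element,
# then row-binds the results into a single data frame
via_map_dfr <- map_dfr(dfs, function(df) mutate(df, score = score * 2))
```

All three give one data frame with four rows. Note that purrr 1.0.0 and later mark map_dfr() as superseded in favour of map() followed by list_rbind(), although map_dfr() still works.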

Hendrik vanB says…

This feels like an ideal purrr::map() use case. E.g., assuming .csv files:

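# filepaths is a character vector of .csv paths;
# each `...` is a placeholder for your own arguments
purrr::map(filepaths, function(x) {
  readr::read_csv(x) %>%
    rename(...) %>%
    select(...) %>%
    filter(...)  # etc.: add any further dplyr steps here
})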
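A runnable version of Hendrik's pattern might look like the sketch below. Since the real files aren't shown here, it first writes two small hypothetical .csv files to a temporary folder; the file names, the ID and RT columns, and the 1000 cut-off are all assumptions.

```r
library(purrr)
library(dplyr)
library(readr)

# Write two small made-up .csv files to a temporary folder
tmp <- tempfile("purrr_demo_")
dir.create(tmp)
write_csv(tibble(ID = 1:3, RT = c(350, 1200, 420)), file.path(tmp, "P001.csv"))
write_csv(tibble(ID = 1:3, RT = c(380, 410, 2500)), file.path(tmp, "P002.csv"))

# Collect the full paths of every .csv file in that folder
filepaths <- list.files(tmp, pattern = "\\.csv$", full.names = TRUE)

# Read each file and run the same dplyr steps on it
cleaned <- map(filepaths, function(x) {
  read_csv(x, show_col_types = FALSE) %>%
    rename(id = ID, rt = RT) %>%
    select(id, rt) %>%
    filter(rt < 1000)
})

# Name each list element after its file, minus the extension
names(cleaned) <- tools::file_path_sans_ext(basename(filepaths))
```

Naming the list elements after the files makes it easy to check that cleaned$P001 came from P001.csv.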

James Goldie says…

I use similar patterns a lot! You can:

map over the filename vector, immediately joining the dfs into 1 df using map_dfr and then doing your operations on the result

map_dfr(filepaths, read_csv) %>%
  select(...)

map over the filename vector, doing your operations on each one inside map and then joining the result

map_dfr(filepaths, function(x) {
  read_csv(x) %>%
    select(...)
})

split a dataframe up into a list of dfs and map over that

bigdataframe %>%
  split(list(bigdataframe$group1)) %>%
  map(function(df) {
    # pass each piece to select(); `...` is a placeholder for your columns
    df %>% select(...)
  })
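The split-and-map pattern can be sketched on a toy data frame as follows. The contents of bigdataframe (group1, rt, junk) and the filter threshold are invented for illustration; the last step uses map_dfr() to join the pieces back into one data frame, as in the first two patterns.

```r
library(purrr)
library(dplyr)

# A made-up data frame with a grouping column
bigdataframe <- tibble(
  group1 = c("a", "a", "b", "b", "b"),
  rt     = c(350, 420, 380, 2500, 410),
  junk   = 1:5
)

# Split into a named list of data frames (one per group),
# then run the same dplyr steps on each piece
by_group <- bigdataframe %>%
  split(bigdataframe$group1) %>%
  map(function(df) {
    df %>%
      select(group1, rt) %>%
      filter(rt < 1000)
  })

# Join the cleaned pieces back into a single data frame
recombined <- map_dfr(by_group, identity)
```

Here by_group is a list with elements named "a" and "b", and recombined is one data frame holding the rows that survived the filter.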

my own googling

A little Google searching turned up this post from Claus Wilke that outlines how to use map() to run read_csv() on many files:

https://serialmentor.com/blog/2016/6/13/reading-and-combining-many-tidy-data-files-in-R

Stay tuned for more purrr related posts…

R notes to myself
