I don’t love cats. I am not a member of @RCatLadies. So the fact that the Tidyverse packages for dealing with factors and functional programming have cat-related names (forcats and purrr) does not endear them to me.
I knew there would come a time when I would encounter an R problem that needed the power of for loops, so I asked the Twittersphere whether there was an alternative (hopefully a tidyverse one) that would let me avoid loops a little longer. I was a bit disappointed to hear that the answer is the purrr package.
I think the time has come, I might not be able to avoid #forloops anymore. I want to run the same set of #dplyr operations (rename, select, filter etc) on a set of dataframes I have datapastaed into #rstats. Is there a way to say "do this" to files P001-P032 without using a loop?
— Dr Jenny Richmond (@JenRichmondPhD) September 4, 2018
Apparently the map() function in purrr is going to be my friend, but at the moment it is completely bewildering. I thought it might be helpful to paste all the links that the lovely rstats people on Twitter sent me so that I can keep them in one place.
If you want to learn purrr, this is what the Twitter experts suggest.
Alison’s fave purrr learning resources are all from #rladies. I need to check out resources from:
Jennifer Thompson https://github.com/jenniferthompson/RLadiesIntroToPurrr
Jenny Bryan https://jennybc.github.io/purrr-tutorial/
Charlotte Wickham https://github.com/cwickham/purrr-tutorial
also see: http://rstd.io/row-work
Jared Wilber suggests that Amber Thomas's resources are good: https://amber.rbind.io/blog/2018/03/26/purrr_iterations/
Tom Kelly pointed me towards the @swcarpentry resources
https://swcarpentry.github.io/r-novice-inflammation/03-loops-R/index.html https://swcarpentry.github.io/r-novice-inflammation/15-supp-loops-in-depth/
You can use dplyr::bind_rows() instead of reduce(rbind()). BUT if you want them all in one frame at the end you probably just want purrr::map_dfr(), which is a map and bind combo function. So many options! That’s actually half the problem with going #noloops. My most commonly used fns in purrr are map(), pmap(), walk(), iwalk() and every(); maybe that helps narrow it down a bit.
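To see that the two options in that tip land in the same place, here is a minimal sketch. The data frames, their column names, and the clean_one() helper are all made up for illustration; they are not from my real files. One thing worth knowing: since purrr 1.0.0, map_dfr() is marked as superseded in favour of map() followed by list_rbind(), although it still works.

```r
library(purrr)
library(dplyr)

# Hypothetical stand-ins for the participant data frames (P001, P002, ...)
participants <- list(
  P001 = data.frame(trial = 1:3, rt = c(512, 480, 530)),
  P002 = data.frame(trial = 1:3, rt = c(610, 590, 575))
)

# Hypothetical cleaning step: the same dplyr verbs applied to one data frame
clean_one <- function(df) {
  df %>%
    rename(reaction_time = rt) %>%
    filter(reaction_time < 600)
}

# Option 1: map() returns a list of data frames; bind_rows() stacks them
stacked_a <- participants %>%
  map(clean_one) %>%
  bind_rows(.id = "participant")

# Option 2: map_dfr() does the map and the row-bind in one step
stacked_b <- map_dfr(participants, clean_one, .id = "participant")
```

The .id argument uses the names of the list (P001, P002) to fill a new column, so each row still records which participant it came from after stacking.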
This feels like an ideal purrr::map() use case. E.g., assuming .csv files:
purrr::map(filepaths, function(x) {
  readr::read_csv(x) %>%
    rename(...) %>%
    select(...) %>%
    filter(...)  # etc.
})
I use similar patterns a lot! You can:
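# read every file, bind the rows, then select
map_dfr(filepaths, read_csv) %>%
  select(...)

# or select within each file before binding
map_dfr(filepaths, function(x) {
  read_csv(x) %>%
    select(...)
})

# or split one big data frame by group and work on each piece
bigdataframe %>%
  split(list(bigdataframe$group1)) %>%
  map(function(df) {
    df %>% select(...)
  })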
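The split-then-map pattern is the one I find hardest to picture, so here is a small sketch of it with invented data. The data frame contents and the columns score and notes are assumptions for the example; only the bigdataframe and group1 names come from the tip above.

```r
library(purrr)
library(dplyr)

# Hypothetical combined data frame with a grouping column
bigdataframe <- data.frame(
  group1 = c("a", "a", "b", "b", "b"),
  score  = c(1, 2, 3, 4, 5),
  notes  = c("x", "y", "z", "x", "y")
)

# split() cuts the data frame into a named list, one element per group;
# map() then applies the same dplyr step to each piece
per_group <- bigdataframe %>%
  split(bigdataframe$group1) %>%
  map(function(df) {
    df %>% select(group1, score)
  })

# Count the rows in each piece; map_int() returns an integer vector
rows_per_group <- map_int(per_group, nrow)
```

The result is still a list, so if one data frame is wanted at the end, the pieces can be stacked again with bind_rows().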
A little Google searching turned up this post from Claus Wilke that outlines how to use map() to run read_csv() for many files:
https://serialmentor.com/blog/2016/6/13/reading-and-combining-many-tidy-data-files-in-R
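Here is a rough sketch of what reading many files could look like from start to finish. It is not the code from that post. It writes two tiny .csv files to a temporary folder first so that it runs on its own, and the folder name, file contents, and the source_file column are all invented for the example.

```r
library(purrr)
library(readr)
library(dplyr)

# Write two small CSV files to a temporary folder (hypothetical data)
data_dir <- file.path(tempdir(), "purrr_demo")
dir.create(data_dir, showWarnings = FALSE)
write_csv(data.frame(id = 1:2, score = c(10, 20)), file.path(data_dir, "P001.csv"))
write_csv(data.frame(id = 3:4, score = c(30, 40)), file.path(data_dir, "P002.csv"))

# Collect the file paths and name each one after its file,
# so that .id can record which file every row came from
filepaths <- list.files(data_dir, pattern = "^P\\d+\\.csv$", full.names = TRUE)
names(filepaths) <- basename(filepaths)

# Read every file, stack the rows, then apply the dplyr steps once
# (col_types = cols() lets readr guess column types without printing a message)
all_files <- map_dfr(filepaths, read_csv, col_types = cols(), .id = "source_file") %>%
  filter(score > 10)
```

With real data, list.files() would point at the folder that holds the participant files, and the filter() line would be replaced by whatever rename, select, and filter steps are needed.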
Stay tuned for more purrr-related posts…