Sometimes things that are really easy to do in Excel are not so intuitive in R. Like counting things. Because most of the time I am working with data in long format, each participant can take up many rows and you can end up with hundreds of observations, so functions like length() aren’t useful for counting participants. Today I just wanted to check how many participants were in this dataset and it took me some significant googling.
library(tidyverse)
library(ggbeeswarm)
library(janitor)
df <- data.frame(
  "pp_no" = 1:16,
  "delay" = c("short", "long"),
  "condition" = c("easy", "easy", "difficult", "difficult"),
  "score" = c(82, 75, 76, 72, 86, 89, 85, 87, 87, 76, 78, 85, 97, 87, 94, 87)
)
My intuition is to use the distinct() function from dplyr, but it only SELECTS the distinct rows; it doesn’t count them.
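To see the difference, here is a small sketch. The long_df data frame and its trial column are made up for illustration (they are not the dataset in this post); each participant appears on three rows.

```r
library(dplyr)

# Hypothetical long-format data: 4 participants, 3 rows each
long_df <- data.frame(
  pp_no = rep(1:4, each = 3),
  trial = rep(1:3, times = 4)
)

# distinct() returns a data frame with one row per unique pp_no, not a number
unique_pps <- distinct(long_df, pp_no)
unique_pps

# You still need nrow() to turn those rows into a count
nrow(unique_pps)
```

So distinct() gets you halfway there, but you need a second step to get an actual number.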
It is the n_distinct() function that gives you a count of the distinct values in a variable.
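A minimal sketch of that call, assuming the participant ID column pp_no is the variable being counted (the data frame is rebuilt here so the snippet runs on its own):

```r
library(dplyr)

# Same toy data as at the top of the post
df <- data.frame(
  pp_no = 1:16,
  delay = c("short", "long"),
  condition = c("easy", "easy", "difficult", "difficult"),
  score = c(82, 75, 76, 72, 86, 89, 85, 87, 87, 76, 78, 85, 97, 87, 94, 87)
)

# Count the unique participant IDs, however many rows each one has
n_pp <- n_distinct(df$pp_no)
n_pp
```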
## [1] 16
The other counting thing I do a lot is count by group (or other categorical variable). Although it takes a few lines of code, combining group_by() and summarise() is useful because you create a df that combines the count with other summary stats.
df %>%
  group_by(delay) %>%
  summarise(count = n(), mean_score = mean(score))
## # A tibble: 2 x 3
##   delay count mean_score
##   <fct> <int>      <dbl>
## 1 long      8       82.2
## 2 short     8       85.6
If you just want a fast count, calling table() on a categorical variable will count the observations at each level.
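A sketch of that call, assuming delay is the column being tabulated (the data frame is rebuilt here so the snippet runs on its own):

```r
# Same toy data as at the top of the post
df <- data.frame(
  pp_no = 1:16,
  delay = c("short", "long"),
  condition = c("easy", "easy", "difficult", "difficult"),
  score = c(82, 75, 76, 72, 86, 89, 85, 87, 87, 76, 78, 85, 97, 87, 94, 87)
)

# table() is base R, so no packages are needed
delay_counts <- table(df$delay)
delay_counts
```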
##
##  long short
##     8     8
When things are less evenly distributed, janitor::tabyl() is useful because it gives the proportion in each group (the percent column) as well as n.
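A sketch of the call, with delay as the variable (the data frame is rebuilt here so the snippet runs on its own):

```r
library(janitor)

# Same toy data as at the top of the post
df <- data.frame(
  pp_no = 1:16,
  delay = c("short", "long"),
  condition = c("easy", "easy", "difficult", "difficult"),
  score = c(82, 75, 76, 72, 86, 89, 85, 87, 87, 76, 78, 85, 97, 87, 94, 87)
)

# tabyl() returns a data frame with a count (n) and a proportion (percent) per level
delay_tab <- tabyl(df$delay)
delay_tab
```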
##  df$delay n percent
##      long 8     0.5
##     short 8     0.5