# counting things

Sometimes things that are really easy to do in Excel are not so intuitive in R. Like counting things. Because most of the time I am working with data in long format, each participant can have many rows and you can end up with hundreds of observations, so functions like `length()` aren’t useful: they count rows, not participants. Today I just wanted to check how many participants were in this dataset and it took me some significant googling.

``````library(tidyverse)
library(ggbeeswarm)
library(janitor)``````
##### create a little df
``````# delay and condition are shorter than pp_no, so data.frame() recycles them to 16 rows
df <- data.frame("pp_no" = 1:16,
                 "delay" = c("short", "long"),
                 "condition" = c("easy", "easy", "difficult", "difficult"),
                 "score" = c(82, 75, 76, 72, 86, 89, 85, 87, 87, 76, 78, 85, 97, 87, 94, 87))``````

### count distinct values

My intuition is to use the `distinct()` function from dplyr, but it SELECTS distinct rows; it doesn’t count them.

It is the `n_distinct()` function that will give you a count of the distinct values in a variable.

``n_distinct(df$pp_no)``
``## [1] 16``
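The little df above only has one row per participant, so it doesn't really show why `length()` lets you down. Here is a minimal sketch with a made-up long-format data frame (`long_df`, with hypothetical `trial` values, is my own invention for illustration) where each participant has several rows:

```r
library(dplyr)

# Hypothetical long-format data: 4 participants, 3 trials each (12 rows)
long_df <- data.frame(
  pp_no = rep(1:4, each = 3),
  trial = rep(1:3, times = 4),
  score = c(82, 75, 76, 72, 86, 89, 85, 87, 87, 76, 78, 85)
)

# length() counts every row, so it overcounts participants
n_rows <- length(long_df$pp_no)

# n_distinct() counts unique values only
n_pp <- n_distinct(long_df$pp_no)

# base R equivalent, if dplyr isn't loaded
n_pp_base <- length(unique(long_df$pp_no))

# distinct() keeps one row per participant; nrow() then counts those rows
n_pp_rows <- nrow(distinct(long_df, pp_no))
```

Here `n_rows` is the number of observations, while the other three should all agree on the number of participants.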

### counting by levels

The other counting thing I do a lot is count by group (or other categorical variable). Although it takes a few lines of code, combining `group_by()` and `summarise()` is useful because you create a df that combines both the count and other summary stats.

#### option 1: group_by x summarise

``````df %>%
group_by(delay) %>%
summarise(count = n(), mean_score = mean(score))``````
``````## # A tibble: 2 x 3
##   delay count mean_score
##   <fct> <int>      <dbl>
## 1 long      8       82.2
## 2 short     8       85.6``````
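If you only need the count and no other summary stats, dplyr's `count()` does the `group_by()` plus `summarise(n = n())` step in one go. A small sketch, rebuilding the same df so it runs on its own (the names `by_delay` and `by_cell` are just mine):

```r
library(dplyr)

# Same data as above, with the recycling written out explicitly
df <- data.frame(
  pp_no = 1:16,
  delay = rep(c("short", "long"), times = 8),
  condition = rep(c("easy", "easy", "difficult", "difficult"), times = 4),
  score = c(82, 75, 76, 72, 86, 89, 85, 87, 87, 76, 78, 85, 97, 87, 94, 87)
)

# One row per level of delay, with the count in a column called n
by_delay <- count(df, delay)

# Give count() two variables to count each delay x condition combination
by_cell <- count(df, delay, condition)
```

The trade-off is that `count()` only counts, so I would still reach for `group_by()` and `summarise()` when I want means alongside.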

#### option 2: table()

If you just want a fast count, `table()` on a categorical variable will count the observations at each level.

``table(df$delay)``
``````##
##  long short
##     8     8``````
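`table()` also takes more than one variable, which gives a cross-tab, and base R's `prop.table()` turns counts into proportions. A quick sketch on the same data (rebuilt here so the block runs by itself; `xt` and `props` are my own names):

```r
# Same data as above, with the recycling written out explicitly
df <- data.frame(
  pp_no = 1:16,
  delay = rep(c("short", "long"), times = 8),
  condition = rep(c("easy", "easy", "difficult", "difficult"), times = 4),
  score = c(82, 75, 76, 72, 86, 89, 85, 87, 87, 76, 78, 85, 97, 87, 94, 87)
)

# Two-way table: rows are delay, columns are condition
xt <- table(df$delay, df$condition)

# Proportion of observations at each level of delay
props <- prop.table(table(df$delay))
```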

#### option 3: janitor::tabyl

When things are less evenly distributed, `janitor::tabyl()` is useful because it gives the proportion of each level (in a column called `percent`) as well as n.

``janitor::tabyl(df$delay)``
``````##  df$delay n percent
##      long 8     0.5
##     short 8     0.5``````
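`tabyl()` can also take the data frame and the column name separately, which gives a tidier column header than `df$delay`, and janitor's `adorn_pct_formatting()` will turn the proportions into formatted percentages. A rough sketch, again rebuilding the df so it stands alone (`delay_tab` and `delay_fmt` are just names I picked):

```r
library(janitor)

# Same data as above, with the recycling written out explicitly
df <- data.frame(
  pp_no = 1:16,
  delay = rep(c("short", "long"), times = 8),
  condition = rep(c("easy", "easy", "difficult", "difficult"), times = 4),
  score = c(82, 75, 76, 72, 86, 89, 85, 87, 87, 76, 78, 85, 97, 87, 94, 87)
)

# Data frame first, then the bare column name
delay_tab <- tabyl(df, delay)

# Convert the proportion column to formatted percentage strings
delay_fmt <- adorn_pct_formatting(delay_tab, digits = 1)
```

Note that `delay_tab$percent` stays numeric (handy for further calculations), whereas the formatted version is text and is really only for display.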