Thanks to Tom Mock for saving me from this blogpost on twitter. I was monster number 6… beginning to convince myself that there were gremlins in the `across()`

function.

In fact, I had forgotten the golden rule:

don’t make objects that have the same names a functions

That is for another post…

I have been playing with the new `across()`

function in `dplyr`

which is a magical solution when you are looking to get several summary statistics across a range of variables. BUT…. working out how to get those summary stats nicely formatted in a table is a bit of a challenge.

```
library(tidyverse)
library(palmerpenguins)
library(skimr)
library(tableone)
library(gt)
```

Here I am using data from the palmer penguins package.

`penguins <- penguins`

```
df <- tibble(penguin_bits = c("bill", "flipper"),
mean = c(43.92, 200.92),
sd = c(5.46, 14.06),
min = c(32.1, 172),
max = c(59.6, 231))
df %>%
gt()
```

penguin_bits | mean | sd | min | max |
---|---|---|---|---|

bill | 43.92 | 5.46 | 32.1 | 59.6 |

flipper | 200.92 | 14.06 | 172.0 | 231.0 |

I want to get summary stats for bill and flipper length (mean, sd, min, and max) and have them display in a nice table like this. But this manual dataframe construction is a bit silly and not at all reproducable.

Is there an easy way to get summary stats in this format so they are compatible with the gt package?

```
penguins %>%
summarise(bill_mean = mean(bill_length_mm, na.rm = TRUE),
bill_sd = sd(bill_length_mm, na.rm = TRUE),
bill_min = min(bill_length_mm, na.rm = TRUE),
bill_max = max(bill_length_mm, na.rm = TRUE),
flipper_mean = mean(flipper_length_mm, na.rm = TRUE),
flipper_sd = sd(flipper_length_mm, na.rm = TRUE),
flipper_min = min(flipper_length_mm, na.rm = TRUE),
flipper_max = max(flipper_length_mm, na.rm = TRUE)) %>%
gt()
```

bill_mean | bill_sd | bill_min | bill_max | flipper_mean | flipper_sd | flipper_min | flipper_max |
---|---|---|---|---|---|---|---|

43.92193 | 5.459584 | 32.1 | 59.6 | 200.9152 | 14.06171 | 172 | 231 |

The summarise function spits out summary stats in a SUPER wide format.

`across()`

The new `across()`

function gets you the same thing in many fewer lines of code, but still, it is VERY wide.

```
penguins %>%
summarise(across(contains("length"),
list(mean = mean, sd = sd, min = min, max = max), na.rm = TRUE)) %>%
gt()
```

bill_length_mm_mean | bill_length_mm_sd | bill_length_mm_min | bill_length_mm_max | flipper_length_mm_mean | flipper_length_mm_sd | flipper_length_mm_min | flipper_length_mm_max |
---|---|---|---|---|---|---|---|

43.92193 | 5.459584 | 32.1 | 59.6 | 200.9152 | 14.06171 | 172 | 231 |

The skimr package is nice because it is tidyverse compatible (i.e. you can select with contains) AND the output can be a dataframe, which you can then edit.

```
skimtable <- penguins %>%
select(contains("length")) %>%
skim()
skimtable
```

Name | Piped data |

Number of rows | 344 |

Number of columns | 2 |

_______________________ | |

Column type frequency: | |

numeric | 2 |

________________________ | |

Group variables | None |

**Variable type: numeric**

skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|

bill_length_mm | 2 | 0.99 | 43.92 | 5.46 | 32.1 | 39.23 | 44.45 | 48.5 | 59.6 | ▃▇▇▆▁ |

flipper_length_mm | 2 | 0.99 | 200.92 | 14.06 | 172.0 | 190.00 | 197.00 | 213.0 | 231.0 | ▂▇▃▅▂ |

For my purpose it is more than I need though. There is some selecting and renaming to do after the fact.

```
skimtable_renamed <- skimtable %>%
select(skim_variable, numeric.mean, numeric.sd, numeric.p0, numeric.p100) %>%
rename(penguin_bits = skim_variable, mean = numeric.mean, sd = numeric.sd, min = numeric.p0, max = numeric.p100)
```

You do end up with a gt compatible dataframe.

```
skimtable_renamed %>%
gt()
```

penguin_bits | mean | sd | min | max |
---|---|---|---|---|

bill_length_mm | 43.92193 | 5.459584 | 32.1 | 59.6 |

flipper_length_mm | 200.91520 | 14.061714 | 172.0 | 231.0 |

The TableOne package gives you mean and SD, is there a way to add other summary stats (like min & max to TableOne?) And make the formatted nicer?

```
variables <- c("bill_length_mm", "flipper_length_mm")
CreateTableOne(vars = variables, data = penguins)
```

```
##
## Overall
## n 344
## bill_length_mm (mean (SD)) 43.92 (5.46)
## flipper_length_mm (mean (SD)) 200.92 (14.06)
```

Ideally, I want to be able to use `summarise()`

and `across()`

and somehow make the wide output long. The problem is that `pivot_longer()`

will take more than 1 “names_to” argument, but not more than a single “values_to” argument.

I would like to be able to pivot wide summary stats long like this…

```
penguins %>%
summarise(across(contains("length"),
list(mean = mean, sd = sd, min = min, max = max), na.rm = TRUE)) %>%
pivot_longer(names_to = "penguin_bits", values_to = c("mean", "sd", "max", "min"), 1:8, values_sep = "_")
```

… but not sure if that is possible.