I use read_csv to read data into R and it most conservatively assumes that when you have text in a variable you are dealing characters (not factors). Of course these things are often factors so you need to explicitly convert them if you want to use the factor in an analysis or have it appear the way you want in a ggplot.
The forcats package will do this with as_factor
library(tidyverse)
library(ggbeeswarm)
library(janitor)
df <- data.frame("pp_no" = 1:16,
"delay" = c("short","long"), "condition" = c("easy", "easy", "difficult", "difficult"),
"score" = c(82, 75, 76, 72, 86, 89, 85, 87, 87, 76, 78, 85, 97, 87, 94, 87))
df$delay <- as_factor(df$delay)
df$condition <- as_factor(df$condition)
#check variable types with glimpse
glimpse(df)
## Rows: 16
## Columns: 4
## $ pp_no <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16
## $ delay <fct> short, long, short, long, short, long, short, long, short, l…
## $ condition <fct> easy, easy, difficult, difficult, easy, easy, difficult, dif…
## $ score <dbl> 82, 75, 76, 72, 86, 89, 85, 87, 87, 76, 78, 85, 97, 87, 94, …
df %>%
ggplot(aes(x = delay, y = score)) +
geom_quasirandom() +
facet_wrap(~ condition) +
ylim(50,100) +
theme_classic()
Ideally I would like ggplot to order group (short, long) and condition (easy, difficult), but at the moment this is the opposite of what I want because the default factor ordering is alphbetical. Check factor levels with levels()
levels(df$delay)
## [1] "short" "long"
levels(df$condition)
## [1] "easy" "difficult"
You can reorder factors by other factors and all kinds of other fancy things using the forcats
package, vignette here, but most of the time I want to do it manually.
The fct_relevel()
function is useful.
REMEMBER: to manually reorder factors the function is called fct_relevel(), NOT fct_reorder()– gets me everytime
df$delay <- fct_relevel(df$delay, c("short", "long"))
df$condition <- fct_relevel(df$condition, c("easy", "difficult"))
Check levels again to make sure it has done what you want.
levels(df$delay)
## [1] "short" "long"
levels(df$condition)
## [1] "easy" "difficult"
df %>%
ggplot(aes(x = delay, y = score)) +
geom_quasirandom() +
facet_wrap(~ condition) +
ylim(50,100) +
theme_classic()