How to use R to analyse your bank transactions

There is no doubt that COVID19 has massively disrupted how we spend our time and money. In this post, I update code inspired by this R-bloggers blog to look at how much less money we have been spending on eating in the last month, compared to April 2019.

Download a copy of your bank transactions in csv format.

Load packages


Read in data

Clean and rename, select just date, description, debit amount, add new column for class with default value = “Other”

items <- read_csv("data_new.csv") %>%
  clean_names() %>%
  rename(desc = narrative, 
         debit = debit_amount, 
         credit = credit_amount) %>%
  select(date, desc, debit) %>%
  mutate(class = "Other")

Build regular expression strings for categories

Mostly just interested in food changes between April 2019 and April 2020. Scan transaction desc for phrases that relate to food and transport.

# Build simple regexp strings




Use grepl to assign class value to each transaction

grepl is a weird thing. I think this checks if the value in items$desc matches something in the regexp defined for groceries (for example) and if it does puts Groceries in the items$class column

#assign values to class column based on regexp
items$class <-  
  ifelse(grepl(coffeelunch, items$desc),"CoffeeLunch",
      ifelse(grepl(dinnerout, items$desc),"Dinnerout",
                   ifelse(grepl(groceries, items$desc),"Groceries",

Remove NAs and fix dates

list_items <- na.omit(items) 

list_items$date <- dmy(list_items$date)

list_items <- list_items %>%
  mutate(year = year(date), month = month(date), day = day(date))

Filter just April

Filter data for just April 2019 and 2020, make class and year factor.

april_list_items <- list_items %>%
  filter(month == 4)

april_list_items$class <- as.factor(april_list_items$class)
april_list_items$year <- as.factor(april_list_items$year)

Plot food spending

april_list_items %>%
  filter(class %in% c("Groceries", "Dinnerout", "CoffeeLunch")) %>%
  group_by(year, class) %>%
  summarise(monthlytotal = sum(debit)) %>%
  ggplot(aes(x = year, y = monthlytotal, fill = year)) +
  geom_col() +
  scale_fill_manual(values = c("#0072B2","#CC79A7")) +
  facet_wrap(~class) +
  labs(title = "April eating") +
  ylab("Monthly Spend") +
  axis.text.y = element_blank(),
  axis.ticks = element_blank()) # removes y axis tick labels

We spend WAY too much on lunch/coffee at work every day!