calculating age

I have been playing with a new (not actually new, but new to R) dataset this week.

Successes:

  1. I loaded the data in using read_csv. *** see ERRATUM below
  2. I fixed some weird data type assumptions.
  • The interview group is listed as 1s and 2s in the file so R thought those were integers. So I used as.factor to convert.
dataframe$variable <- as.factor(dataframe$variable) 


read_csv did weird things with dates, importing them as factors. People say that read_csv makes fewer weird assumptions than read.csv but I am not yet convinced. I used lubridate package to specify that DOB and Test_date were dates with Day Month Year (dmy) format. It was pretty clever and dealt nicely with some inconsistencies in date format too. **** see ERRATUM


mutate(DOB = dmy(DOB), Test_Date= dmy(Test_Date))
 

ERRATUM

My bad… the dates as factors issue was caused by read.csv NOT read_csv. When I posted this to twitter, Alison Hill helpfully pointed out that there is no way to read in a factor by default in readr. So this is an example of how read_csv does a “better” job than read.csv. Better in the sense that is automatically imports everything that isn’t a number as a character rather than assuming things are factors. I still need to use lubridate to convert character to dates (DOB Test_Date) and integers to factors (InterviewGroup) though.


Challenges

Now that I have dates that are dates, I’d like to calculate how old these kids are. Just subtracting DOB from Test_Date does OK, except that it calculates age in Days.

mutate(agedays = (Test_Date-DOB))    

I mostly work with infants and preschoolers so I really want age in Weeks or Months, not Days or Years. Turns out this is a bit challenging. I could just estimate age in months by dividing the Days by 30, which works, kind of. The result is in time format but still in days.

Help google!!

This function on github has potential.

https://gist.github.com/mmparker/7254445

It does a really nice job of calculating age even when I use my date variables.

mutate(ageyear = calc_age(DOB, Test_Date))

BUT…. the output is years.

Trying to work out how to adapt this function for my purpose sent me down a little rabbit hole re what is a function and how you make one.

This one is pretty simple, a good introduction, me thinks…

Function to calculate age:


calc_age <- function(birthDate, refDate = Sys.Date()) {

    require(lubridate)

    period <- as.period(interval(birthDate, refDate),
                        unit = "year")

    period$year

} 


My understanding is that this piece of code creates a function called calc_age that take inputs of birthDate and refDate, although if you skip the refDate input it will default to Sys.Date, which would be useful if you wanted to know what age was today. Using my birthday as input…

calc_age("1978/06/30")

…gives answer = 40

Yup, I’m still 40.

The nuts and bolts are between the curly brackets. This function requires the lubridate package, and defines the “period” as an interval between birthDate and refDate, specifying that it is a period (rather than an interval or duration)


NOTE

lubridate object classes
1. durations, which represent an exact number of seconds.
2. periods, which represent human units like weeks and months.
3. intervals, which represent a starting and ending point.
(from http://r4ds.had.co.nz/dates-and-times.html#time-spans)


lubridate is smart with dates right? I should be able to just change the unit = months and it would work? Sadly, no.

I need help…

UPDATE

Twitter saves the day again! BIG thanks to Alison Hill for pointing me to this extra useful lubridate resource

So what I really want to do is calculate an interval, which you do using %–%. Once you have an interval, you divide by months(1), using / if you want decimal places or modulus [%/% -I always wondered what that was] if you want whole months.

ageproblem2 <- ageproblem %>%
        mutate(ageinterval = DOB %--% Test_Date) %>%
        mutate(agemons = ageinterval %/% months(1)) %>%
         mutate(ageweeks = ageinterval %/% weeks(1))