I am new to R, but after taking an intro course and playing with it a bit, I'm hopeful that it can 1) more elegantly solve my modelling objectives (compared to Excel, which is my backup plan) and 2) be a useful skill to take away from this project.
The task/objective:
I am attempting to use driving diary data to simulate and model the potential energy use and GHG emissions of electric cars. Specifically:
- I have driving diary data (start and end time stamps, plus other data, for thousands of drivers; a basic sample is below) that I want to translate into:
- 24-hour time series data, such that for each minute of a 24-hour period I know exactly who is driving a vehicle and which of that driver's trips the minute belongs to. This question focuses on this step.
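The type of output I would like: one row per minute of the day and one column per driver and day, with each cell holding the number of the trip in progress (0 when not driving). NOTE: my example of this output is NOT related to the sample data provided below; I used the first ten minutes of a certain day with some theoretical trips just as an example.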
Not essential to this problem, but may be useful to know: I will use the above output to cross-reference other driver-specific data to calculate minute-by-minute consumption of gasoline (or electricity) based on things associated with each trip, such as parking location or trip distance. I would like to do this in R but must first solve the above problem before I move on to this step.
The solution I have so far is based on:
- How to count the number of concurrent users using time interval data?
- How to calculate number of occurrences per minute for a large dataset
The problem:
Example simplified data:
a <- c("A","A","A","B","B","B","C","C","C")
b <- c(1, 2, 3, 1, 2, 3, 1, 2, 3)
c <- as.POSIXct(c(0.29167, 0.59375, 0.83333, 0.45833, 0.55347, 0.27083, 0.34375, 0.39236, 0.35417)*24*3600 + as.POSIXct("2013-1-1 00:00") )
d <- as.POSIXct(c(0.334027778, 0.614583333, 0.875, 0.461805556, 0.563888889, 0.295138889, 0.375, 0.503472222, 0.364583333)*24*3600 + as.POSIXct("2013-1-1 00:00"))
e <- c(2, 8, 2, 5, 5, 2, 5, 5, 2)
f <- as.POSIXct(c(0, 0.875, 0, 0.479166666666667, 0.580555555555556, 0.489583333333333, 0.430555555555556, 0.541666666666667, 0.711805555555555)*24*3600 + as.POSIXct("2013-1-1 00:00"))
g <- as.POSIXct(c(0, 0.885, 0, 0.482638888888889, 0.588194444444444, 0.496527777777778, 0.454861111111111, 0.559027777777778, 0.753472222222222)*24*3600 + as.POSIXct("2013-1-1 00:00"))
h <- c(0, 1, 0, 1, 4, 8, 8, 1, 5)
i <- as.POSIXct(c(0, 0, 0, 0.729166666666667, 0.595833333333333, 0.534722222222222, 0.59375, 0.779861111111111, 0.753472222222222)*24*3600 + as.POSIXct("2013-1-1 00:00"))
j <- as.POSIXct(c(0, 0, 0, 0.736111111111111, 0.605555555555556, 0.541666666666667, 0.611111111111111, 0.788194444444445, 0.75625)*24*3600 + as.POSIXct("2013-1-1 00:00"))
k <- c(0, 0, 0, 4, 4, 2, 5, 8,1)
testdata <- data.frame(a,b,c,d,e,f,g,h,i,j,k)
names(testdata) <- c("id", "Day", "trip1_start", "trip1_end", "trip1_purpose", "trip2_start", "trip2_end", "trip2_purpose", "trip3_start", "trip3_end", "trip3_purpose")
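In this example data, I have three drivers (id = A, B, C), each of whom drives on three different days (Day = 1, 2, 3). Note that drivers may have different numbers of trips: a trip that did not happen is coded as 0 in its start, end, and purpose columns, which the code above turns into midnight (2013-01-01 00:00). The time stamps indicate the start and end times of driving activities.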
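Because the number of trips varies by driver and the missing ones are stored as midnight placeholders, it may be easier to work from a long table with one row per trip. Below is a base-R sketch with a hypothetical two-row diary. The helper `at()` and the small data set are assumptions that mirror the layout of `testdata`; zero-length trips are treated as "no trip".

```r
# Sketch: reshape a wide diary (trip1_*, trip2_*, ...) into one row per trip.
# Assumption: a trip that did not happen has start == end (0 in the original
# data, i.e. midnight), so zero-length rows are dropped.
at <- function(hhmm) {  # hypothetical helper: clock time on 2013-01-01 (UTC)
  as.POSIXct(paste("2013-01-01", hhmm), format = "%Y-%m-%d %H:%M", tz = "UTC")
}

wide <- data.frame(
  id          = c("A", "B"),
  Day         = c(1, 1),
  trip1_start = at(c("07:00", "11:00")),
  trip1_end   = at(c("08:00", "11:05")),
  trip2_start = at(c("00:00", "11:30")),  # A has no second trip (placeholder)
  trip2_end   = at(c("00:00", "11:35"))
)

n_trips <- 2
long <- do.call(rbind, lapply(seq_len(n_trips), function(k) {
  data.frame(id    = wide$id,
             Day   = wide$Day,
             trip  = k,
             start = wide[[paste0("trip", k, "_start")]],
             end   = wide[[paste0("trip", k, "_end")]])
}))
long <- long[long$end > long$start, ]  # drop placeholder (zero-length) trips
long
```

The result has three rows: trip 1 for A, trip 1 for B, and trip 2 for B. tidyr's `pivot_longer()` is a common alternative for this reshaping step.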
I then create minute intervals for an entire day (January 1, 2013):
# times are in the session's local time zone, like the trip times above
start.min <- as.POSIXct("2013-01-01 00:00:00")
end.max <- as.POSIXct("2013-01-01 23:59:59")
tinterval <- seq.POSIXt(start.min, end.max, by = "mins")
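The same grid can be built by asking `seq()` for exactly 24 * 60 points, which makes the row count explicit. Here is a small sketch. Setting `tz = "UTC"` is an assumption that keeps the example independent of the session's time zone.

```r
# Sketch: one time point per minute of 2013-01-01, counted explicitly
tinterval <- seq(as.POSIXct("2013-01-01 00:00", tz = "UTC"),
                 by = "min", length.out = 24 * 60)
length(tinterval)                                      # 1440
format(tinterval[c(1, length(tinterval))], "%H:%M")    # "00:00" "23:59"
```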
Insert "1" during minutes where a given user is driving:
library(xts)
out1 <- xts(, align.time(tinterval, 60))
# loop over each row (one driver on one day)
for (i in 1:NROW(testdata)) {
  # paste the start / end times into an xts-style range
  timeRange <- paste(format(testdata[i, c("trip1_start", "trip1_end")]), collapse = "/")
  # add the minute "by" parameter for timeBasedSeq
  timeRange <- paste(timeRange, "M", sep = "/")
  # create the by-minute sequence and align to minutes to match out1
  timeSeq <- align.time(timeBasedSeq(timeRange), 60)
  # create xts object with "1" entries for times between start and end
  temp1 <- xts(rep(1, length(timeSeq)), timeSeq)
  # merge temp1 with out1 and fill non-matching timestamps with "0"
  out1 <- merge(out1, temp1, fill = 0)
}
# add column names
colnames(out1) <- paste(testdata[,1], testdata[,2], sep = ".")
The idea is to repeat this for each trip (out2, out3, etc.), filling the driving periods with 2, 3, etc., and then to sum/merge all of the resulting outx objects to get the desired result.
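If each outx holds its trip number in driving minutes and 0 elsewhere, adding them gives the desired codes. This only works if no minute is claimed by two trips of the same driver-day. Otherwise, a minute where one trip ends and the next starts would show, for example, 1 + 2 = 3. The sketch below uses small hypothetical matrices standing in for out1 and out2 and includes a check for such overlaps. The same idea should carry over to xts objects that share an index, but that is not shown here.

```r
cols <- c("A.1", "B.1")
# hypothetical per-trip matrices: 4 minutes (rows) x 2 driver-days (columns)
out1 <- matrix(c(1, 1, 0, 0,
                 0, 0, 0, 0), ncol = 2, dimnames = list(NULL, cols))
out2 <- matrix(c(0, 0, 0, 2,
                 0, 2, 2, 0), ncol = 2, dimnames = list(NULL, cols))
per_trip <- list(out1, out2)

# guard: summing is only safe if no minute belongs to two trips of one driver-day
busy <- Reduce(`+`, lapply(per_trip, function(m) m > 0))
stopifnot(all(busy <= 1))

# each cell now holds the trip number in progress, or 0
combined <- Reduce(`+`, per_trip)
combined
```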
Unfortunately, when I try to repeat this for out2:
out2 <- xts(, align.time(tinterval, 60))
for (i in 1:NROW(testdata)) {
  timeRange2 <- paste(format(testdata[i, c("trip2_start", "trip2_end")]), collapse = "/")
  timeRange2 <- paste(timeRange2, "M", sep = "/")
  timeSeq2 <- align.time(timeBasedSeq(timeRange2), 60)
  temp2 <- xts(rep(2, length(timeSeq2)), timeSeq2)
  out2 <- merge(out2, temp2, fill = 0)
}
colnames(out2) <- paste(testdata[,1], testdata[,2], sep = ".")
head(out2)
I get the following errors:
- Error in UseMethod("align.time") : no applicable method for 'align.time' applied to an object of class "Date"
- Error in `colnames<-`(`*tmp*`, value = c("A.1", "A.2", "A.3", "B.1", "B.2", : attempt to set 'colnames' on an object with less than two dimensions
What is wrong with my code for out2?
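A likely cause: for A on days 1 and 3, trip2_start and trip2_end are the 0 placeholder, i.e. exactly midnight. With its default format, `format()` on a POSIXct value that is exactly midnight drops the clock time. As a result, the range string becomes `"2013-01-01/2013-01-01/M"`. xts's `timeBasedSeq()` seems to read such a date-only range at day resolution and return a `Date`. `align.time()` has no method for `Date`, which matches the first error. Because the loop stops at i = 1, out2 still has no columns when `colnames<-` runs, which would explain the second error. The same should happen for out3, since A has no third trip.

Keeping the time part alone is not enough. A midnight-to-midnight "trip" is a placeholder, not driving, so those rows should probably be skipped. For skipped rows, you would still need to add an all-zero column so that the nine column names line up. The sketch below shows the formatting behaviour and a guard for the loop. `has_trip()` is a hypothetical helper, not part of xts.

```r
# Sketch of the suspected cause: with the default format, format() drops the
# clock time when the value is exactly midnight.
midnight <- as.POSIXct("2013-01-01 00:00:00", tz = "UTC")
format(midnight)                         # "2013-01-01"
format(midnight, "%Y-%m-%d %H:%M:%S")    # "2013-01-01 00:00:00"

# so for A on day 1, the range string built in the out2 loop is date-only
paste(format(midnight), format(midnight), "M", sep = "/")  # "2013-01-01/2013-01-01/M"

# hypothetical guard for the loop: only build a sequence for real trips
has_trip <- function(start, end) !is.na(start) && !is.na(end) && end > start
has_trip(midnight, midnight)             # FALSE
```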
Are there better solutions, or packages I could learn about?
I realize this is probably a very roundabout way to get to my desired output.
Any help would be much appreciated.
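One possible alternative is to skip xts and build the minute-by-driver-day matrix of trip numbers directly from a long table of trips (one row per trip), using only base R. This sketch rests on several assumptions. The helper `at()` and the trip data are hypothetical, and all trips fall on 2013-01-01. A minute counts as "driving" if its one-minute window overlaps [start, end). This rule differs slightly from the `align.time()` rounding in the original loop.

```r
# Sketch: minute-by-driver-day matrix of trip numbers, using base R only.
at <- function(hhmm) {  # hypothetical helper: clock time on 2013-01-01 (UTC)
  as.POSIXct(paste("2013-01-01", hhmm), format = "%Y-%m-%d %H:%M", tz = "UTC")
}

# hypothetical trips in long format: one row per trip
trips <- data.frame(
  id    = c("A", "A", "B"),
  Day   = c(1, 1, 1),
  trip  = c(1, 2, 1),
  start = at(c("07:00", "14:15", "11:00")),
  end   = at(c("08:00", "14:45", "11:05"))
)
trips <- trips[trips$end > trips$start, ]  # ignore zero-length placeholder trips

minutes <- seq(at("00:00"), by = "min", length.out = 24 * 60)
keys    <- unique(paste(trips$id, trips$Day, sep = "."))

# one row per minute, one column per driver-day; 0 = not driving
out <- matrix(0, nrow = length(minutes), ncol = length(keys),
              dimnames = list(format(minutes, "%H:%M"), keys))

for (r in seq_len(nrow(trips))) {
  # minutes whose [m, m + 60 s) window overlaps this trip
  active <- minutes < trips$end[r] & minutes + 60 > trips$start[r]
  out[active, paste(trips$id[r], trips$Day[r], sep = ".")] <- trips$trip[r]
}

out[c("06:59", "07:00", "07:59", "08:00", "14:15", "11:00"), ]
```

If two trips of the same driver-day touch the same minute, the later row in `trips` overwrites the earlier one. Whether that is acceptable depends on the analysis. For thousands of drivers, data.table's `foverlaps()`, which is designed for interval-overlap joins, may be worth looking at instead of the explicit loop.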