I have data of this type:

Date           Status  ID
23-1-2010 11:40 in  321
23-1-2010 11:53 out 321
9-1-2010 12:11  in  356
9-1-2010 12:18  out 356
23-1-2010 11:37 in 356
23-1-2010 11:5  out 356
5-2-2010 13:14  in  398
5-2-2010 13:30  out 398
10-3-2010 9:30  in  398
13-3-2010 11:50 out 377
16-3-2010 10:30 in  377
16-3-2010 11:00 out 377
20-3-2010 12:09 in  377
20-3-2010 12:30 out 377

The data describes customers who visited a supermarket on a certain date and at a certain time. Each customer is identified by an ID, and the status says whether the record is an entrance ("in") or an exit ("out").

I want to calculate the time each customer spent in the supermarket on different days. The problem with the data is that for some customers only the entrance time or only the exit time is recorded. I have already removed the customers who visited once and have a missing in or out status, but some customers who visited more than once still have a missing in or out record.

I have tried this:

#create an empty data frame
TimeSpent<-rep(NA,length(df$ID))
ID<-rep(NA,length(df$ID))
Tspent<-data.frame(TimeSpent,ID)



#compute the time spent time
for(i in 1:length(df$Date - 1))
  {
      if(isTRUE(df$Status[i] == "in" && df$Status[i+1] == "out"))
      {
        Tspent$ID[i] <- df$ID[i]
        Tspent$TimeSpent[i] <- difftime(df$Date[i+1] - df$Date[i])
      } else if(isTRUE(df$Status[i+1] == "in" && df$Status[i+2] == "out"))
      {
        Tspent$ID[i] <- df$ID[i+1]
        Tspent$TimeSpent[i] <- difftime(df$Date[i+2] - df$Date[i+1])
      }  else 
        {
        Tspent$ID[i] <- df$ID[i+2]
        Tspent$TimeSpent[i] <- difftime(df$Date[i+3] - df$Date[i+2])
      }

      i<-i+1
}

and I get this error: Error in as.POSIXct.default(time1) : do not know how to convert 'time1' to class "POSIXct"

Does anyone know how to correct my code, or is there an alternative solution? Thanks in advance!

1 Answer

I don't know the structure of your data.frame (try str(df)), but my guess is that you have not converted the dates to POSIXct objects. That is done like this:

 as.POSIXct(strptime(df$Date, format='%d-%m-%Y %H:%M'))

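That may solve your problem. If not, please post more data in a form I can read in (when I tried to read it in quickly, the whitespace between the date and the time gave me an error).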
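The read-in problem mentioned above can be sidestepped by splitting each line on runs of whitespace and gluing the date and time back together before parsing. This is only a sketch: the vector `raw` holds a few rows copied from the question, the names `raw` and `parts` are made up for illustration, and `tz = "UTC"` is an assumption, since the question does not say which time zone applies.

```r
# A few rows copied from the question; each line is "date time status id".
raw <- c(
  "23-1-2010 11:40 in  321",
  "23-1-2010 11:53 out 321",
  "9-1-2010 12:11  in  356",
  "9-1-2010 12:18  out 356"
)

# Split on runs of whitespace so the blank inside the timestamp is harmless.
parts <- do.call(rbind, strsplit(trimws(raw), "\\s+"))

df <- data.frame(
  # Rejoin date and time, then parse; tz = "UTC" is an assumption.
  Date   = as.POSIXct(paste(parts[, 1], parts[, 2]),
                      format = "%d-%m-%Y %H:%M", tz = "UTC"),
  Status = parts[, 3],
  ID     = as.integer(parts[, 4]),
  stringsAsFactors = FALSE
)

str(df)  # Date should now show up as POSIXct
```

Once `Date` is POSIXct, subtracting two entries gives a time difference instead of an error or `NA`.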

Edit:

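I thought I'd let you know: the problem lies with the difftime() function. You can easily work around it and do the calculation without it; that works on my sample data.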
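One likely reading of the error: `difftime()` expects the two times as separate arguments, `difftime(time1, time2)`, whereas the loop in the question passes it a single value that is already a subtraction result. A small sketch of the two ways to get the difference, using the first pair of rows from the question (the variable names and `tz = "UTC"` are assumptions):

```r
t_in  <- as.POSIXct("23-1-2010 11:40", format = "%d-%m-%Y %H:%M", tz = "UTC")
t_out <- as.POSIXct("23-1-2010 11:53", format = "%d-%m-%Y %H:%M", tz = "UTC")

# Plain subtraction returns a difftime; R picks the unit automatically.
d1 <- t_out - t_in

# difftime() takes the two times as separate arguments and lets you fix the unit.
d2 <- difftime(t_out, t_in, units = "mins")

# Drop the difftime class to store a plain number of minutes.
mins <- as.numeric(d2)

print(d1)
print(mins)
```

Fixing the unit explicitly matters when the result is stored in a plain numeric column, because subtraction may choose seconds for one pair and minutes or hours for another.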

My sample data:

    df <- data.frame(Date=(Sys.time()+ runif(20)*3600)) # already delivers a date-time object
    df <- data.frame(df[order(df),1])
    df$status <- rep(c('in', 'out'), each=(10))
    df$ID     <- rep(c(1:10), each=2)
    names(df)[1] <- 'Date'

Your code, slightly modified:

 #create an empty data frame
 TimeSpent<-rep(NA,length(df$ID))
 ID<-rep(NA,length(df$ID))
 Tspent<-data.frame(TimeSpent,ID)



 #compute the time spent time
 for(i in 1:(length(df$Date) - 1))
   {
       if(isTRUE(df$Status[i] == "in" && df$Status[i+1] == "out"))
       {
         Tspent$ID[i] <- df$ID[i]
         Tspent$TimeSpent[i] <- df$Date[i+1] - df$Date[i]
       } else if(isTRUE(df$Status[i+1] == "in" && df$Status[i+2] == "out"))
       {
         Tspent$ID[i] <- df$ID[i+1]
         Tspent$TimeSpent[i] <- df$Date[i+2] - df$Date[i+1]  # just skipped the difftime function
       }  else 
         {
         Tspent$ID[i] <- df$ID[i+2]
         Tspent$TimeSpent[i] <- df$Date[i+3] - df$Date[i+2]
       }

       i<-i+1
 }

Output:

    TimeSpent ID
 1   8.266451  2
 2   4.044099  2
 3  12.895463  3
 4   2.699761  3
 5   1.484544  4
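As an alternative to the loop, the pairing can be done in a vectorized way. The following is a sketch under stated assumptions, not the approach used above: `time_spent` is a hypothetical helper, a visit is assumed to be an "in" row immediately followed (after sorting by ID and time) by an "out" row for the same ID on the same calendar day, and unmatched rows are simply dropped. The example rows are taken from the question, with `tz = "UTC"` assumed.

```r
# Hypothetical helper: pair each "in" with the directly following "out"
# of the same ID on the same day; everything else is discarded.
time_spent <- function(df) {
  df <- df[order(df$ID, df$Date), ]
  i  <- seq_len(max(nrow(df) - 1, 0))   # candidate "in" positions
  day <- format(df$Date, "%Y-%m-%d")
  ok <- df$Status[i] == "in" & df$Status[i + 1] == "out" &
        df$ID[i] == df$ID[i + 1] & day[i] == day[i + 1]
  start <- i[ok]
  data.frame(
    ID      = df$ID[start],
    In      = df$Date[start],
    Minutes = as.numeric(difftime(df$Date[start + 1], df$Date[start],
                                  units = "mins"))
  )
}

# Rows from the question, including one unmatched "in" and one unmatched "out".
visits <- data.frame(
  Date = as.POSIXct(c("5-2-2010 13:14", "5-2-2010 13:30", "10-3-2010 9:30",
                      "13-3-2010 11:50", "16-3-2010 10:30", "16-3-2010 11:00"),
                    format = "%d-%m-%Y %H:%M", tz = "UTC"),
  Status = c("in", "out", "in", "out", "in", "out"),
  ID     = c(398, 398, 398, 377, 377, 377),
  stringsAsFactors = FALSE
)

result <- time_spent(visits)
print(result)
```

Because the comparison looks only at neighbouring rows, a lone "in" or "out" never borrows a partner from another customer or another day, which is the case the loop with its `i+2` and `i+3` lookups struggles with.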
answered 2012-01-18T18:37:08.123