Here is one approach:
Read in your data and convert your dates to actual date variables:
mydf <- read.table(header = TRUE, stringsAsFactors=FALSE,
text = "Store Min(Date) Max(Date) Status
NYC1 1/1/2013 2/1/2013 Open
NYC1 2/2/2013 2/3/2013 'Closed for Inspection'
Boston1 1/1/2013 2/5/2013 Open")
names(mydf) <- c("store", "min.date", "max.date", "status")
mydf$min.date <- as.Date(mydf$min.date, format = "%m/%d/%Y")
mydf$max.date <- as.Date(mydf$max.date, format = "%m/%d/%Y")
mydf
# store min.date max.date status
# 1 NYC1 2013-01-01 2013-02-01 Open
# 2 NYC1 2013-02-02 2013-02-03 Closed for Inspection
# 3 Boston1 2013-01-01 2013-02-05 Open
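As a quick sanity check on the conversion step, the sketch below shows how `as.Date()` treats the `"%m/%d/%Y"` format. The vector `x` is a hypothetical sample and not part of the original data; its last element is deliberately invalid to show that a string that does not match the format becomes `NA` rather than raising an error.

```r
# Hypothetical sample values in month/day/year order, like the data above.
# The last one is deliberately invalid (there is no month 13).
x <- c("1/1/2013", "2/5/2013", "13/1/2013")

parsed <- as.Date(x, format = "%m/%d/%Y")

# Strings that do not match the format come back as NA
is.na(parsed)
# [1] FALSE FALSE  TRUE

# Once parsed, subtracting two dates gives a difference in whole days
as.integer(parsed[2] - parsed[1])
# [1] 35
```

Checking `anyNA()` on the converted columns is a cheap way to catch rows that were written in a different date format.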
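Calculate the difference in days between "min.date" and "max.date", use that information to "expand" your data.frame, and generate the sequence of dates between "min.date" and "max.date". Also, subset the data.frame to return only the "store", "date" (our new variable), and "status" variables.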
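# Number of days covered by each row, counting both endpoints
SEQ <- as.integer(mydf$max.date - mydf$min.date) + 1L
# Repeat each row once per day it covers
mydf2 <- mydf[rep(row.names(mydf), SEQ), ]
# Offset each copy from its min.date by 0, 1, 2, ... days
mydf2$date <- mydf2$min.date + sequence(SEQ) - 1
# Keep only the columns of interest
mydf2 <- mydf2[c("store", "date", "status")]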
Here is a sample of the output.
head(mydf2)
# store date status
# 1 NYC1 2013-01-01 Open
# 1.1 NYC1 2013-01-02 Open
# 1.2 NYC1 2013-01-03 Open
# 1.3 NYC1 2013-01-04 Open
# 1.4 NYC1 2013-01-05 Open
# 1.5 NYC1 2013-01-06 Open
tail(mydf2)
# store date status
# 3.30 Boston1 2013-01-31 Open
# 3.31 Boston1 2013-02-01 Open
# 3.32 Boston1 2013-02-02 Open
# 3.33 Boston1 2013-02-03 Open
# 3.34 Boston1 2013-02-04 Open
# 3.35 Boston1 2013-02-05 Open
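For comparison, the same expansion can be sketched row by row with `seq()` and `rbind()`. This is an alternative illustration, not the method used above: the data frame is rebuilt by hand from the sample data so the block runs on its own, and the names `pieces` and `alt` are made up for the example. It tends to be slower than the `rep()`/`sequence()` approach on large inputs, but each step may be easier to follow.

```r
# Sample data rebuilt by hand so this block is self-contained
mydf <- data.frame(
  store    = c("NYC1", "NYC1", "Boston1"),
  min.date = as.Date(c("2013-01-01", "2013-02-02", "2013-01-01")),
  max.date = as.Date(c("2013-02-01", "2013-02-03", "2013-02-05")),
  status   = c("Open", "Closed for Inspection", "Open"),
  stringsAsFactors = FALSE
)

# Build one small data frame per input row, one line per day in its range
pieces <- lapply(seq_len(nrow(mydf)), function(i) {
  data.frame(
    store  = mydf$store[i],
    date   = seq(mydf$min.date[i], mydf$max.date[i], by = "day"),
    status = mydf$status[i],
    stringsAsFactors = FALSE
  )
})

# Stack the pieces into a single data frame
alt <- do.call(rbind, pieces)
rownames(alt) <- NULL

nrow(alt)
# [1] 70
```

The 70 rows are 32 days for NYC1 while open, 2 days while closed, and 36 days for Boston1.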
You can use by to verify that what we did is correct:
with(mydf2, by(date, list(store, status), FUN = range))
# : Boston1
# : Closed for Inspection
# NULL
# -----------------------------------------------------------------
# : NYC1
# : Closed for Inspection
# [1] "2013-02-02" "2013-02-03"
# -----------------------------------------------------------------
# : Boston1
# : Open
# [1] "2013-01-01" "2013-02-05"
# -----------------------------------------------------------------
# : NYC1
# : Open
# [1] "2013-01-01" "2013-02-01"
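If you would rather not compare the printed ranges by eye, the check can be sketched programmatically. The block below rebuilds the sample data and the expanded data frame so it runs on its own, and it assumes each store/status pair appears only once in the input, which holds for this sample but may not hold for your data. The name `ok` is made up for the example.

```r
# Sample data and expansion rebuilt so this block is self-contained
mydf <- data.frame(
  store    = c("NYC1", "NYC1", "Boston1"),
  min.date = as.Date(c("2013-01-01", "2013-02-02", "2013-01-01")),
  max.date = as.Date(c("2013-02-01", "2013-02-03", "2013-02-05")),
  status   = c("Open", "Closed for Inspection", "Open"),
  stringsAsFactors = FALSE
)
n <- as.integer(mydf$max.date - mydf$min.date) + 1L
mydf2 <- mydf[rep(seq_len(nrow(mydf)), n), ]
mydf2$date <- mydf2$min.date + sequence(n) - 1L
mydf2 <- mydf2[c("store", "date", "status")]

# For each original row, the expanded dates should start at min.date,
# end at max.date, and contain exactly one row per day in between.
# Assumption: each store/status pair occurs once in mydf.
ok <- vapply(seq_len(nrow(mydf)), function(i) {
  d <- mydf2$date[mydf2$store == mydf$store[i] &
                  mydf2$status == mydf$status[i]]
  min(d) == mydf$min.date[i] &&
    max(d) == mydf$max.date[i] &&
    length(d) == n[i]
}, logical(1))

all(ok)
# [1] TRUE
```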