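I have a longitudinal dataset that records each person's employment status once a month for 45 months. I would like to create two variables to add to this dataset: 1) the total duration each person spends "Unemployed", and 2) the number of unemployment spells.
Ideally, it would also skip over NAs without breaking a spell.
I have created a sample dataset to keep things simple: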
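To make the NA rule concrete, here is a minimal base R sketch on a single made-up person. The vector `x` is hypothetical and not part of my data; it assumes missing months are real `NA` values and that dropping them before run-length encoding is an acceptable way to "skip" them.

```r
# Hypothetical monthly statuses for one person (not taken from the dataset)
x <- c("Unemployed", NA, "Unemployed", "Education", "Unemployed")

# Drop missing months so they cannot split a spell
obs <- x[!is.na(x)]

# Run-length encode the unemployed flag: each TRUE run is one spell
r <- rle(obs == "Unemployed")

spell_count <- sum(r$values)             # number of unemployment spells
duration    <- sum(r$lengths[r$values])  # observed months spent unemployed

spell_count  # 2
duration     # 3
```

Here the first two "Unemployed" months count as one spell even though an NA sits between them.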
ID <- c(1:10, 1:10, 1:10)
date <- c("2006-09-01", "2006-09-01", "2006-09-01", "2006-09-01", "2006-09-01", "2006-09-01", "2006-09-01",
"2006-09-01", "2006-09-01", "2006-09-01", "2006-10-01", "2006-10-01", "2006-10-01", "2006-10-01",
"2006-10-01", "2006-10-01", "2006-10-01", "2006-10-01", "2006-10-01", "2006-10-01", "2006-11-01",
"2006-11-01", "2006-11-01", "2006-11-01", "2006-11-01", "2006-11-01", "2006-11-01", "2006-11-01",
"2006-11-01", "2006-11-01")
act <- c("Unemployed", "Employment", "Education", "Education", "Education", "Education", "Education",
"Education", "Education", "Unemployed", "Education", "Unemployed", "Unemployed", "Unemployed",
"Education", "Education", "Employment", "Education", "Education", "NA", "Unemployed",
"Unemployed", "NA", "Unemployed", "Education", "Employment", "Employment", "NA", "Education",
"Unemployed")
df <- data.frame(ID, date, act)
df[order(ID),]
   ID       date        act
1   1 2006-09-01 Unemployed
11  1 2006-10-01  Education
21  1 2006-11-01 Unemployed
2   2 2006-09-01 Employment
12  2 2006-10-01 Unemployed
22  2 2006-11-01 Unemployed
3   3 2006-09-01  Education
13  3 2006-10-01 Unemployed
23  3 2006-11-01         NA
4   4 2006-09-01  Education
14  4 2006-10-01 Unemployed
24  4 2006-11-01 Unemployed
5   5 2006-09-01  Education
15  5 2006-10-01  Education
25  5 2006-11-01  Education
6   6 2006-09-01  Education
16  6 2006-10-01  Education
26  6 2006-11-01 Employment
7   7 2006-09-01  Education
17  7 2006-10-01 Employment
27  7 2006-11-01 Employment
8   8 2006-09-01  Education
18  8 2006-10-01  Education
28  8 2006-11-01         NA
9   9 2006-09-01  Education
19  9 2006-10-01  Education
29  9 2006-11-01  Education
10 10 2006-09-01 Unemployed
20 10 2006-10-01         NA
30 10 2006-11-01 Unemployed
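I tried the solution Roland proposed in the question on calculating durations in R, but I'm not sure how to adapt it so that it gives results by ID and handles NAs.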
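library(data.table)
setDT(df)
df[, date := as.POSIXct(date, format = "%Y-%m-%d", tz = "GMT")]
dplyr::glimpse(df)  # glimpse() comes from dplyr, which is not attached here
# Recode: 1 = unemployed, -1 = anything else (the string "NA" also becomes -1)
df$act <- ifelse(df$act == "Unemployed", 1, -1)
# Start a new run whenever act changes (over all rows, not within each ID)
df[, run := cumsum(c(1, diff(act) != 0))]
df1 <- df[, list(act = unique(act),
                 duration = difftime(max(date), min(date), units = "weeks")),
          by = run]
df1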
    run act duration
 1:   1   1  0 weeks
 2:   2  -1  0 weeks
 3:   3   1  0 weeks
 4:   4  -1  0 weeks
 5:   5   1  0 weeks
 6:   6  -1  0 weeks
 7:   7   1  0 weeks
 8:   8  -1  0 weeks
 9:   9   1  0 weeks
10:  10  -1  0 weeks
11:  11   1  0 weeks
What I'm after is something like this (the duration here is in months, but it could be in weeks or days):
   ID spell_count duration
1   1           2        2
2   2           1        2
3   3           1        1
...
10 10           1        2
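For reference, the target table can be reproduced from the sample data with a per-ID run-length approach. This is only a minimal sketch under some assumptions, not necessarily the best way to do it: the string "NA" is treated as a missing month, there is one row per person per month (so duration in months is just a count of unemployed rows), and the names `dt`, `obs`, `unemp`, `run` and `res` are made up for the illustration. It uses `rleid()` and `uniqueN()` from data.table.

```r
library(data.table)

# Rebuild the sample data (same values as above, written more compactly)
ID   <- rep(1:10, times = 3)
date <- rep(c("2006-09-01", "2006-10-01", "2006-11-01"), each = 10)
act  <- c("Unemployed", "Employment", "Education", "Education", "Education",
          "Education", "Education", "Education", "Education", "Unemployed",
          "Education", "Unemployed", "Unemployed", "Unemployed", "Education",
          "Education", "Employment", "Education", "Education", "NA",
          "Unemployed", "Unemployed", "NA", "Unemployed", "Education",
          "Employment", "Employment", "NA", "Education", "Unemployed")
dt <- data.table(ID, date = as.Date(date), act)

# Assumption: the literal string "NA" marks a missing month
dt[act == "NA", act := NA_character_]

# Drop missing months so they neither count nor split a spell,
# then sort by person and time
obs <- dt[!is.na(act)][order(ID, date)]

# Within each ID, rleid() gives a new run id whenever the flag changes
obs[, unemp := act == "Unemployed"]
obs[, run := rleid(unemp), by = ID]

# One row per ID: distinct unemployed runs and unemployed months
res <- obs[, .(spell_count = uniqueN(run[unemp]),
               duration    = sum(unemp)),
           by = ID]
res
```

On the sample data this gives 2 spells / 2 months for ID 1, 1 / 2 for ID 2, 1 / 1 for ID 3 and 1 / 2 for ID 10, with zeros for people who are never unemployed. A person whose months are all missing would drop out of `res` entirely, so a join back onto the full ID list would be needed in that case.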
Any help, or any links, literature or examples, would be greatly appreciated.
Thank you.