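This sort of thing has been asked before, but not in this form as far as I can find.
Creating a sequential identifier isn't hard, but my data contains a time element that is throwing me for a loop. The data below is a fictional dataset, just to illustrate the problem with something easy to work with: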
dput(walking_dat)
structure(list(neighborhood = structure(c(3L, 3L, 3L, 3L, 3L,
2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L), .Label = c("Dinkytown", "Downtown",
"Uptown"), class = "factor"), street = structure(c(4L, 3L, 3L,
5L, 3L, 4L, 6L, 7L, 4L, 4L, 1L, 2L, 1L), .Label = c("12thAve",
"14thAve", "Dupont", "Hennepin", "Lyndale", "Marquette", "Nicolette"
), class = "factor"), sequence = c(1, 2, 3, 4, 5, 1, 2, 3, 4,
5, 1, 2, 3), visit = c(1, 1, 1, 1, 2, 1, 1, 1, 2, 2, 1, 1, 2)), .Names = c("neighborhood",
"street", "sequence", "visit"), row.names = c(NA, -13L), class = "data.frame")
   neighborhood    street sequence visit
1        Uptown  Hennepin        1     1
2        Uptown    Dupont        2     1
3        Uptown    Dupont        3     1
4        Uptown   Lyndale        4     1
5        Uptown    Dupont        5     2
6      Downtown  Hennepin        1     1
7      Downtown Marquette        2     1
8      Downtown Nicolette        3     1
9      Downtown  Hennepin        4     2
10     Downtown  Hennepin        5     2
11    Dinkytown   12thAve        1     1
12    Dinkytown   14thAve        2     1
13    Dinkytown   12thAve        3     2
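For the sake of illustration, imagine all the data comes from three people walking east through three neighborhoods in Minneapolis. Each row represents a time point at which their location was recorded. The first column is the neighborhood they are walking through. The second column is the street (intersection) they are at for each time point. The third column is the order in which the observations occurred.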
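I want to create the visit column, which treats consecutive time points on the same street in the same neighborhood as a single visit, and then counts a return to that street as the next visit. How can I create this kind of sequential identifier?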
I was thinking that the ave() trick with FUN = seq_along might work, but I can't find a way to combine the factors that gets me where I want to go.
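For reference, here is a minimal base R sketch of the ave() idea, assuming the rows are already ordered by sequence within each neighborhood. The data frame is rebuilt by hand from the dput() output above, and run_start is a helper name made up for this illustration. It first flags the start of each run of identical streets within a neighborhood, then takes a cumulative count of those flags within each neighborhood/street pair.

```r
# Example data rebuilt by hand from the dput() output (without the visit column)
walking_dat <- data.frame(
  neighborhood = c(rep("Uptown", 5), rep("Downtown", 5), rep("Dinkytown", 3)),
  street = c("Hennepin", "Dupont", "Dupont", "Lyndale", "Dupont",
             "Hennepin", "Marquette", "Nicolette", "Hennepin", "Hennepin",
             "12thAve", "14thAve", "12thAve"),
  sequence = c(1:5, 1:5, 1:3),
  stringsAsFactors = FALSE
)

# Step 1: within each neighborhood, mark rows where the street differs from
# the previous row (1 = start of a new run, 0 = continuation of a run).
# ave() is fed row indices so FUN can look up the streets for that group.
run_start <- ave(
  seq_len(nrow(walking_dat)), walking_dat$neighborhood,
  FUN = function(i) {
    s <- as.character(walking_dat$street[i])
    c(1L, s[-1] != s[-length(s)])
  }
)

# Step 2: within each neighborhood/street pair, the running total of run
# starts is the visit number.
walking_dat$visit <- ave(
  run_start, walking_dat$neighborhood, walking_dat$street,
  FUN = cumsum
)

walking_dat
```

This relies on row order only, so the data would need sorting by neighborhood and sequence first if that is not already the case.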
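Update: Uwe's solution works, except when someone takes all of their measurements at a single intersection, which is what happened when I tried it on my real data. In that case the final data.table does not have the original number of rows. See what happens here: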
dput(walking_dat_2)
structure(list(neighborhood = structure(c(3L, 3L, 3L, 3L, 3L,
2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L), .Label = c("Dinkytown", "Downtown",
"Uptown"), class = "factor"), street2 = structure(c(2L, 2L, 2L,
2L, 2L, 2L, 3L, 4L, 2L, 2L, 1L, 1L, 1L), .Label = c("12thAve",
"Hennepin", "Marquette", "Nicolette"), class = "factor"), sequence = c(1,
2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3), visit_2 = c(1, 1, 1, 1,
1, 1, 1, 1, 2, 2, 1, 1, 1)), .Names = c("neighborhood", "street2",
"sequence", "visit_2"), row.names = c(NA, -13L), class = "data.frame")
   neighborhood   street2 sequence visit_2
1        Uptown  Hennepin        1       1
2        Uptown  Hennepin        2       1
3        Uptown  Hennepin        3       1
4        Uptown  Hennepin        4       1
5        Uptown  Hennepin        5       1
6      Downtown  Hennepin        1       1
7      Downtown Marquette        2       1
8      Downtown Nicolette        3       1
9      Downtown  Hennepin        4       2
10     Downtown  Hennepin        5       2
11    Dinkytown   12thAve        1       1
12    Dinkytown   12thAve        2       1
13    Dinkytown   12thAve        3       1
In this case, running Uwe's solution returns only 6 rows:
library(data.table)
setDT(walking_dat)[, visit_2 := rleid(neighborhood, street2)][
, unique(.SD, by = "visit_2")][
, visit_2 := rowid(neighborhood, street2)][
walking_dat, on = .(neighborhood, street2, sequence), roll = TRUE, visit_2 := x.visit_2][]
   neighborhood   street2 sequence visit visit_2
1:       Uptown  Hennepin        1     1       1
2:     Downtown  Hennepin        1     2       1
3:     Downtown Marquette        2     3       1
4:     Downtown Nicolette        3     4       1
5:     Downtown  Hennepin        4     5       2
6:    Dinkytown   12thAve        1     6       1
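One possible reading of the 6-row result: because the last step of the chain uses :=, the update seems to be applied to the de-duplicated table produced by unique() rather than to the full table, so only one row per run comes back. A sketch that avoids unique() and the join altogether, and therefore keeps every row, is below. It assumes the rows are ordered by sequence within each neighborhood; the data is rebuilt by hand from the dput(walking_dat_2) output, and the helper column run is a name invented for this illustration.

```r
library(data.table)

# Example data rebuilt by hand from dput(walking_dat_2), without visit_2
walking_dat_2 <- data.table(
  neighborhood = c(rep("Uptown", 5), rep("Downtown", 5), rep("Dinkytown", 3)),
  street2 = c(rep("Hennepin", 5),
              "Hennepin", "Marquette", "Nicolette", "Hennepin", "Hennepin",
              rep("12thAve", 3)),
  sequence = c(1:5, 1:5, 1:3)
)

# run: one id per stretch of consecutive rows with the same
# neighborhood and street
walking_dat_2[, run := rleid(neighborhood, street2)]

# visit_2: within each neighborhood/street pair, number the distinct runs
# in order of appearance (first run = 1, a return visit = 2, ...)
walking_dat_2[, visit_2 := match(run, unique(run)), by = .(neighborhood, street2)]

# drop the helper column; all 13 original rows remain
walking_dat_2[, run := NULL]

walking_dat_2[]
```

Since every step adds a column by reference to the full table, the row count cannot change, whichever streets the walkers visit.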