2

这种事情以前也有人问过,但我找不到这种方式。

关于创建顺序 ID 的线程,以及几个附加链接

在序列中创建标识符并不难,但我的数据包含一个让我陷入循环的时间元素。以下数据是一个虚构的数据集,只是为了说明一些易于处理的问题:

    dput(walking_dat)
structure(list(neighborhood = structure(c(3L, 3L, 3L, 3L, 3L, 
2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L), .Label = c("Dinkytown", "Downtown", 
"Uptown"), class = "factor"), street = structure(c(4L, 3L, 3L, 
5L, 3L, 4L, 6L, 7L, 4L, 4L, 1L, 2L, 1L), .Label = c("12thAve", 
"14thAve", "Dupont", "Hennepin", "Lyndale", "Marquette", "Nicolette"
), class = "factor"), sequence = c(1, 2, 3, 4, 5, 1, 2, 3, 4, 
5, 1, 2, 3), visit = c(1, 1, 1, 1, 2, 1, 1, 1, 2, 2, 1, 1, 2)), .Names = c("neighborhood", 
"street", "sequence", "visit"), row.names = c(NA, -13L), class = "data.frame")

   neighborhood    street sequence visit
1        Uptown  Hennepin        1     1
2        Uptown    Dupont        2     1
3        Uptown    Dupont        3     1
4        Uptown   Lyndale        4     1
5        Uptown    Dupont        5     2
6      Downtown  Hennepin        1     1
7      Downtown Marquette        2     1
8      Downtown Nicolette        3     1
9      Downtown  Hennepin        4     2
10     Downtown  Hennepin        5     2
11    Dinkytown   12thAve        1     1
12    Dinkytown   14thAve        2     1
13    Dinkytown   12thAve        3     2

为便于想象,所有数据均来自明尼阿波利斯三个街区向东行走的三个人。每行代表记录其位置的时间。第一列是他们正在走过的社区。第二列是它们在每个时间点所在的交叉点。第三列是这些数据发生的顺序。

我想创建一个visit列,将同一条街道、同一社区的连续时间点记录为单次访问,​​然后将回访记录为下一次访问。如何创建这种顺序标识符?


我在想这个ave()技巧FUN=seq_along可能会奏效,但我找不到一种方法来结合让我到达我想去的地方的因素。

为每组数据帧中的行创建一个序号(计数器)[重复]


更新:Uwe 的解决方案有效,但如果有人决定在一个交叉点进行所有测量,这就是我试图将其用于真实数据时发生的情况。如果发生这种情况,则原始行数不会返回到最终的 data.table。看看这里发生了什么:

dput(walking_dat_2)
structure(list(neighborhood = structure(c(3L, 3L, 3L, 3L, 3L, 
2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L), .Label = c("Dinkytown", "Downtown", 
"Uptown"), class = "factor"), street2 = structure(c(2L, 2L, 2L, 
2L, 2L, 2L, 3L, 4L, 2L, 2L, 1L, 1L, 1L), .Label = c("12thAve", 
"Hennepin", "Marquette", "Nicolette"), class = "factor"), sequence = c(1, 
2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3), visit_2 = c(1, 1, 1, 1, 
1, 1, 1, 1, 2, 2, 1, 1, 1)), .Names = c("neighborhood", "street2", 
"sequence", "visit_2"), row.names = c(NA, -13L), class = "data.frame")

   neighborhood   street2 sequence visit_2
1        Uptown  Hennepin        1       1
2        Uptown  Hennepin        2       1
3        Uptown  Hennepin        3       1
4        Uptown  Hennepin        4       1
5        Uptown  Hennepin        5       1
6      Downtown  Hennepin        1       1
7      Downtown Marquette        2       1
8      Downtown Nicolette        3       1
9      Downtown  Hennepin        4       2
10     Downtown  Hennepin        5       2
11    Dinkytown   12thAve        1       1
12    Dinkytown   12thAve        2       1
13    Dinkytown   12thAve        3       1

在这种情况下,运行 Uwe 的解决方案仅返回 6 行。

library(data.table)
setDT(walking_dat)[, visit_2 := rleid(neighborhood, street2)][
     , unique(.SD, by = "visit_2")][
         , visit_2 := rowid(neighborhood, street2)][
             walking_dat, on = .(neighborhood, street2, sequence), roll = TRUE, visit_2 := x.visit_2][]

   neighborhood   street2 sequence visit visit_2
1:       Uptown  Hennepin        1     1       1
2:     Downtown  Hennepin        1     2       1
3:     Downtown Marquette        2     3       1
4:     Downtown Nicolette        3     4       1
5:     Downtown  Hennepin        4     5       2
6:    Dinkytown   12thAve        1     6       1
4

2 回答 2

1
# Not required, but convenient:
walking_dat$combo <- paste(walking_dat$neighborhood, walking_dat$street)

# Place holder:
walking_dat$visit <- NA

# Create it:
for(i in 1:nrow(walking_dat)){
  if(i %in% row.names(walking_dat[with(walking_dat, c(TRUE, diff(as.numeric(interaction(neighborhood, street))) != 0)), ])){
    walking_dat$visit[i] <- sum(walking_dat$combo[with(walking_dat, c(TRUE, diff(as.numeric(interaction(neighborhood, street))) != 0))][1:i]==walking_dat$combo[i], na.rm=T)
  } else{
    walking_dat$visit[i] <- 1
  }
}

walking_dat
   neighborhood    street sequence visit              combo
1        Uptown  Hennepin        1     1    Uptown Hennepin
2        Uptown    Dupont        2     1      Uptown Dupont
3        Uptown    Dupont        3     1      Uptown Dupont
4        Uptown   Lyndale        4     1     Uptown Lyndale
5        Uptown    Dupont        5     2      Uptown Dupont
6      Downtown  Hennepin        1     1  Downtown Hennepin
7      Downtown Marquette        2     1 Downtown Marquette
8      Downtown Nicolette        3     1 Downtown Nicolette
9      Downtown  Hennepin        4     2  Downtown Hennepin
10     Downtown  Hennepin        5     1  Downtown Hennepin
11    Dinkytown   12thAve        1     2  Dinkytown 12thAve
12    Dinkytown   14thAve        2     1  Dinkytown 14thAve
13    Dinkytown   12thAve        3     2  Dinkytown 12thAve
于 2017-09-30T22:30:49.103 回答
1

这里的困难在于,对同一街区同一条街道的后续录音应计为一次访问。这需要将这些行合并为一,计算对不同社区和街道的访问次数,最后将其扩展为原始行数。

请注意,visit包含预期结果的列不会被覆盖,而是保留用于与计算visit_new列进行比较。

library(data.table)
setDT(walking_dat)[, visit_new := rleid(neighborhood, street)][
  , unique(.SD, by = "visit_new")][
    , visit_new := rowid(neighborhood, street)][
      walking_dat, on = .(neighborhood, street, sequence), roll = TRUE, .SD]
    neighborhood    street sequence visit visit_new
 1:       Uptown  Hennepin        1     1         1
 2:       Uptown    Dupont        2     1         1
 3:       Uptown    Dupont        3     1         1
 4:       Uptown   Lyndale        4     1         1
 5:       Uptown    Dupont        5     2         2
 6:     Downtown  Hennepin        1     1         1
 7:     Downtown Marquette        2     1         1
 8:     Downtown Nicolette        3     1         1
 9:     Downtown  Hennepin        4     2         2
10:     Downtown  Hennepin        5     2         2
11:    Dinkytown   12thAve        1     1         1
12:    Dinkytown   14thAve        2     1         1
13:    Dinkytown   12thAve        3     2         2

一步一步的解释

DF被强制转换为 data.table。该rleid()功能为社区和街道的变化创建唯一的数字。

 setDT(walking_dat)[, visit_new := rleid(neighborhood, street)][]
    neighborhood    street sequence visit visit_new
 1:       Uptown  Hennepin        1     1         1
 2:       Uptown    Dupont        2     1         2
 3:       Uptown    Dupont        3     1         2
 4:       Uptown   Lyndale        4     1         3
 5:       Uptown    Dupont        5     2         4
 6:     Downtown  Hennepin        1     1         5
 7:     Downtown Marquette        2     1         6
 8:     Downtown Nicolette        3     1         7
 9:     Downtown  Hennepin        4     2         8
10:     Downtown  Hennepin        5     2         8
11:    Dinkytown   12thAve        1     1         9
12:    Dinkytown   14thAve        2     1        10
13:    Dinkytown   12thAve        3     2        11

请注意,第 2 行和第 3 行以及第 9 行和第 10 行是重复的。在下一步中删除重复项,这将创建一个新的临时 data.table 对象:

setDT(walking_dat)[, visit_new := rleid(neighborhood, street)][
  , unique(.SD, by = "visit_new")][]
    neighborhood    street sequence visit visit_new
 1:       Uptown  Hennepin        1     1         1
 2:       Uptown    Dupont        2     1         2
 3:       Uptown   Lyndale        4     1         3
 4:       Uptown    Dupont        5     2         4
 5:     Downtown  Hennepin        1     1         5
 6:     Downtown Marquette        2     1         6
 7:     Downtown Nicolette        3     1         7
 8:     Downtown  Hennepin        4     2         8
 9:    Dinkytown   12thAve        1     1         9
10:    Dinkytown   14thAve        2     1        10
11:    Dinkytown   12thAve        3     2        11

rowid()现在,我们可以使用以下函数对不同社区和街道的访问次数进行编号:

setDT(walking_dat)[, visit_new := rleid(neighborhood, street)][
  , unique(.SD, by = "visit_new")][
    , visit_new := rowid(neighborhood, street)][]
    neighborhood    street sequence visit visit_new
 1:       Uptown  Hennepin        1     1         1
 2:       Uptown    Dupont        2     1         1
 3:       Uptown   Lyndale        4     1         1
 4:       Uptown    Dupont        5     2         2
 5:     Downtown  Hennepin        1     1         1
 6:     Downtown Marquette        2     1         1
 7:     Downtown Nicolette        3     1         1
 8:     Downtown  Hennepin        4     2         2
 9:    Dinkytown   12thAve        1     1         1
10:    Dinkytown   14thAve        2     1         1
11:    Dinkytown   12thAve        3     2         2

最后,我们需要将结果再次扩展为原来的行数。这是通过将临时 data.table 与原始数据表(包括所有行)滚动连接来完成的:DF

setDT(walking_dat)[, visit_new := rleid(neighborhood, street)][
  , unique(.SD, by = "visit_new")][
    , visit_new := rowid(neighborhood, street)][
      walking_dat, on = .(neighborhood, street, sequence), roll = TRUE, .SD]

也许,值得一提的是,它visit_new被使用和重复用于在各个阶段保存临时数据,直到最终更新。

新数据集

固定代码也适用于 OP 提供的第二个数据集:

walking_dat_2 <-
structure(list(neighborhood = structure(c(3L, 3L, 3L, 3L, 3L, 
2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L), .Label = c("Dinkytown", "Downtown", 
"Uptown"), class = "factor"), street = structure(c(2L, 2L, 2L, 
2L, 2L, 2L, 3L, 4L, 2L, 2L, 1L, 1L, 1L), .Label = c("12thAve", 
"Hennepin", "Marquette", "Nicolette"), class = "factor"), sequence = c(1, 
2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3), visit = c(1, 1, 1, 1, 1, 
1, 1, 1, 2, 2, 1, 1, 1), visit_new = c(1L, 1L, 1L, 1L, 1L, 2L, 
3L, 4L, 5L, 5L, 6L, 6L, 6L)), .Names = c("neighborhood", "street", 
"sequence", "visit", "visit_new"), row.names = c(NA, -13L), class = "data.frame")

setDT(walking_dat_2)[, visit_new := rleid(neighborhood, street)][
  , unique(.SD, by = "visit_new")][
    , visit_new := rowid(neighborhood, street)][
      walking_dat_2, on = .(neighborhood, street, sequence), 
      roll = TRUE, .SD]
    neighborhood    street sequence visit visit_new
 1:       Uptown  Hennepin        1     1         1
 2:       Uptown  Hennepin        2     1         1
 3:       Uptown  Hennepin        3     1         1
 4:       Uptown  Hennepin        4     1         1
 5:       Uptown  Hennepin        5     1         1
 6:     Downtown  Hennepin        1     1         1
 7:     Downtown Marquette        2     1         1
 8:     Downtown Nicolette        3     1         1
 9:     Downtown  Hennepin        4     2         2
10:     Downtown  Hennepin        5     2         2
11:    Dinkytown   12thAve        1     1         1
12:    Dinkytown   12thAve        2     1         1
13:    Dinkytown   12thAve        3     1         1
于 2017-09-30T22:30:52.100 回答