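This question is related to this post: How to apply the dtw algorithm on multiple time series in R?

The data frame in the original post contains only one variable of interest: speed.kph.ED.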

# Data: 8 observations for each of 3 cars
file.ID2 <- c("Cars_03", "Cars_03", "Cars_03", 
              "Cars_03", "Cars_03", "Cars_03", "Cars_03", "Cars_03", "Cars_04", 
              "Cars_04", "Cars_04", "Cars_04", "Cars_04", "Cars_04", "Cars_04", 
              "Cars_04", "Cars_05", "Cars_05", "Cars_05", "Cars_05", "Cars_05", 
              "Cars_05", "Cars_05", "Cars_05")
speed.kph.ED <- c(129.3802848, 
                  129.4022304, 129.424176, 129.4461216, 129.4680672, 129.47904, 
                  129.5009856, 129.5229312, 127.8770112, 127.8221472, 127.7672832, 
                  127.7124192, 127.6575552, 127.6026912, 127.5478272, 127.4929632, 
                  134.1095616, 134.1205344, 134.1315072, 134.1534528, 134.1644256, 
                  134.1753984, 134.1863712, 134.197344)

df <- data.frame(file.ID2, speed.kph.ED)
df

Following the accepted answer there, this is the procedure for computing the distances between the 3 cars (3 time series) with dtw:

library(dtw)
library(purrr)
library(dplyr)

# Split your data frame into a list by file.ID2
ds <- split(df, df$file.ID2)
ds

# Use expand.grid to make all combinations of your names, file.ID2 and your values
Names <- expand.grid(unique(df$file.ID2), unique(df$file.ID2))
Values <- expand.grid(ds, ds)

# purrr::map_dbl iterates over every row (pair of cars) in Values and returns a vector of doubles
Dist <- map_dbl(seq_len(nrow(Values)),
                ~dtw(x = Values[.x,]$Var1[[1]]$speed.kph.ED,
                     y = Values[.x,]$Var2[[1]]$speed.kph.ED)$distance)

# Bind answer to Names (dplyr is already loaded above)
ans <- Names %>% 
  mutate(distance = Dist)

ans

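What if I also want to take two more variables into account when computing the distances between the 3 cars (3 time series)?

For example, suppose I also have 2 other variables, score.kph.ED and rating.kph.ED: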

score.kph.ED <- c(1:24)
rating.kph.ED <- c(25:48)


df <- data.frame(file.ID2, speed.kph.ED, score.kph.ED, rating.kph.ED)
df

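Now the distance between the 3 cars should be computed based not only on speed.kph.ED but also on score.kph.ED and rating.kph.ED.

How can I modify the existing code to achieve this?

Thank you very much for your help!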


2 Answers


You can do it like this:

library(dtw)    # provides dtw()
library(purrr)  # provides map_dfc() and map_dbl()

df <- data.frame(file.ID2, speed.kph.ED, score.kph.ED, rating.kph.ED)
ds <- split(df, df$file.ID2)
Names <- expand.grid(unique(df$file.ID2), unique(df$file.ID2))
Values <- expand.grid(ds, ds)

cols <- names(df)[-1]
result <- map_dfc(cols, function(col) map_dbl(1:nrow(Values),
  ~dtw(x = Values[.x,]$Var1[[1]][[col]], 
       y = Values[.x,]$Var2[[1]][[col]])$distance))

names(result) <- paste0('dist.', cols)
cbind(Names, result)


#     Var1    Var2 dist.speed.kph.ED dist.score.kph.ED dist.rating.kph.ED
#1 Cars_03 Cars_03           0.00000                 0                  0
#2 Cars_04 Cars_03          25.66538                71                 71
#3 Cars_05 Cars_03          69.72117               191                191
#4 Cars_03 Cars_04          25.66538                71                 71
#5 Cars_04 Cars_04           0.00000                 0                  0
#6 Cars_05 Cars_04          96.00103                71                 71
#7 Cars_03 Cars_05          69.72117               191                191
#8 Cars_04 Cars_05          96.00103                71                 71
#9 Cars_05 Cars_05           0.00000                 0                  0
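
The table above holds one DTW distance per variable. If a single distance over all three variables is wanted instead, one possible sketch is to pass each car to dtw() as a numeric matrix (rows are time points, columns are variables), so that the local cost is the Euclidean distance between whole rows. The speed values below are a rounded stand-in for the question's data, and the names vars, series and dist_mat are introduced here only for illustration.

```r
library(dtw)

# Toy data shaped like the question: 3 cars, 8 observations each.
# The speed column is a rounded stand-in, not the original values.
file.ID2 <- rep(c("Cars_03", "Cars_04", "Cars_05"), each = 8)
df <- data.frame(
  file.ID2,
  speed.kph.ED  = c(seq(129.38, 129.52, length.out = 8),
                    seq(127.88, 127.49, length.out = 8),
                    seq(134.11, 134.20, length.out = 8)),
  score.kph.ED  = 1:24,
  rating.kph.ED = 25:48
)

# One numeric matrix per car: rows are time points, columns are variables
vars <- c("speed.kph.ED", "score.kph.ED", "rating.kph.ED")
series <- lapply(split(df[vars], df$file.ID2), as.matrix)

# Pairwise multivariate DTW distance between all cars
ids <- names(series)
dist_mat <- matrix(0, length(ids), length(ids), dimnames = list(ids, ids))
for (i in ids) {
  for (j in ids) {
    dist_mat[i, j] <- dtw(series[[i]], series[[j]],
                          dist.method = "Euclidean")$distance
  }
}
dist_mat
```

Because the local cost mixes all columns, a variable on a larger scale will tend to dominate the result; standardizing the columns of df first (for example with scale()) may be worth considering, depending on what the distance is meant to capture.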
answered 2021-01-10T04:26:34.913

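What you are trying to do is called multivariate DTW, and you can simplify things by using the proxy package. See this other answer, but you can basically do this (using the variables from the example):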

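library(dtw)  # loading dtw registers the "dtw" method with proxy
# ds is the list from split(df, df$file.ID2); x[, -1L] drops the file.ID2 column
proxy::dist(lapply(ds, function(x) { x[, -1L] }), method = "dtw")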
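
To make concrete what a multivariate DTW distance measures, here is a minimal base-R sketch of the dynamic-programming recurrence. It is not the dtw package's implementation: it assumes a Euclidean local cost between rows and the simplest step pattern (each step weighted equally), whereas dtw() uses a different default step pattern, so the numbers will generally not match. The function name mv_dtw is hypothetical.

```r
# Multivariate DTW distance between two numeric matrices
# (rows = time points, columns = variables). Illustrative only.
mv_dtw <- function(x, y) {
  x <- as.matrix(x)
  y <- as.matrix(y)
  n <- nrow(x)
  m <- nrow(y)

  # Local cost: Euclidean distance between row i of x and row j of y
  cost <- matrix(0, n, m)
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      cost[i, j] <- sqrt(sum((x[i, ] - y[j, ])^2))
    }
  }

  # Accumulated cost, padded with a first row and column of Inf
  acc <- matrix(Inf, n + 1, m + 1)
  acc[1, 1] <- 0
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      # Cheapest way to reach (i, j): diagonal, vertical or horizontal step
      acc[i + 1, j + 1] <- cost[i, j] +
        min(acc[i, j], acc[i, j + 1], acc[i + 1, j])
    }
  }
  acc[n + 1, m + 1]
}

# Usage: two short two-variable series of different lengths
a <- cbind(c(1, 2, 3), c(10, 20, 30))
b <- cbind(c(1, 2, 2, 3), c(10, 20, 20, 30))
mv_dtw(a, b)
```

In the usage example, b is a with its middle row repeated, so warping can align the two series with no leftover cost.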
answered 2021-03-25T23:04:39.083