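I have the results of games played by multiple players at different points in time. I obtained this information from two different sources, and each source assigns its own unique ID to every player. I would like to find an elegant way to match the two data sources by player ID. The two data sources: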
sourcex <- structure(list(outcomedate = structure(c(12637, 12637, 12637,
12637, 12637, 12637, 12637, 12637, 12637, 12637, 12637, 12638,
12639, 12640, 12640, 12640, 12640, 12640, 12640, 12640, 12640,
12641, 12641, 12641, 12643, 12644, 12644, 12644, 12644, 12644,
12644, 12644, 12644, 12644, 12644, 12645), class = "Date"), xid1 = c(206L,
208L, 209L, 216L, 233L, 235L, 239L, 241L, 250L, 253L, 259L, 238L,
236L, 211L, 221L, 234L, 249L, 254L, 255L, 257L, 258L, 207L, 230L,
248L, 258L, 207L, 211L, 230L, 234L, 236L, 248L, 249L, 254L, 255L,
257L, 221L), xid2 = c(211L, 207L, 221L, 249L, 248L, 257L, 234L,
255L, 236L, 258L, 254L, 230L, 241L, 253L, 235L, 238L, 208L, 233L,
239L, 259L, 206L, 209L, 250L, 216L, 259L, 216L, 241L, 208L, 235L,
239L, 253L, 250L, 209L, 238L, 206L, 233L), outcome1 = c(2L, 1L,
0L, 2L, 1L, 3L, 1L, 1L, 2L, 2L, 0L, 2L, 3L, 3L, 1L, 0L, 2L, 0L,
0L, 0L, 2L, 1L, 2L, 1L, 0L, 3L, 2L, 0L, 0L, 0L, 2L, 2L, 2L, 1L,
1L, 1L), outcome2 = c(0L, 0L, 0L, 1L, 1L, 2L, 1L, 1L, 1L, 2L,
0L, 1L, 0L, 1L, 0L, 0L, 1L, 2L, 0L, 2L, 1L, 2L, 2L, 1L, 1L, 2L,
2L, 0L, 1L, 1L, 2L, 1L, 0L, 1L, 1L, 3L)), .Names = c("outcomedate",
"xid1", "xid2", "outcome1", "outcome2"), row.names = c(NA, 36L
), class = "data.frame")
sourcey <- structure(list(outcomedate = structure(c(12637, 12637, 12637,
12637, 12637, 12637, 12637, 12637, 12637, 12637, 12637, 12638,
12639, 12640, 12640, 12640, 12640, 12640, 12640, 12640, 12640,
12641, 12641, 12641, 12643, 12644, 12644, 12644, 12644, 12644,
12644, 12644, 12644, 12644, 12644, 12645), class = "Date"), yid1 = c(56,
46, 67, 68, 59, 63, 55, 50, 66, 61, 57, 58, 53, 60, 64, 48, 69,
54, 51, 65, 62, 47, 49, 52, 64, 60, 47, 48, 69, 49, 54, 51, 65,
53, 52, 62), yid2 = c(47, 51, 64, 48, 62, 69, 53, 54, 60, 49,
65, 52, 50, 63, 57, 56, 61, 46, 58, 67, 66, 59, 68, 55, 63, 57,
68, 55, 59, 67, 58, 66, 50, 46, 56, 61), outcome1 = structure(c(1L,
1L, 2L, 2L, 3L, 3L, 2L, 1L, 4L, 1L, 2L, 2L, 4L, 3L, 2L, 2L, 3L,
3L, 3L, 4L, 1L, 1L, 1L, 2L, 3L, 1L, 4L, 2L, 2L, 2L, 1L, 3L, 2L,
3L, 3L, 1L), .Label = c("1", "2", "0", "3", "4", "5", "6"), class = "factor"),
outcome2 = structure(c(1L, 2L, 3L, 2L, 1L, 1L, 2L, 2L, 3L,
2L, 1L, 2L, 1L, 3L, 2L, 2L, 3L, 1L, 1L, 2L, 1L, 3L, 2L, 3L,
2L, 2L, 3L, 2L, 1L, 3L, 2L, 2L, 3L, 2L, 1L, 4L), .Label = c("0",
"1", "2", "3", "4"), class = "factor")), .Names = c("outcomedate",
"yid1", "yid2", "outcome1", "outcome2"), row.names = c(NA, 36L
), class = "data.frame")
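Both sources have outcomedate, outcome1, and outcome2 in common. They assign different IDs to the individual players in each game. I have done the following to find matches between the IDs: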
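# Build a per-game key from the columns both sources share.
sourcex$ID <- with(sourcex, paste0(outcomedate, outcome1, outcome2))
sourcey$ID <- with(sourcey, paste0(outcomedate, outcome1, outcome2))
# Unique player ids in each source.
uPlayersx <- with(sourcex, unique(c(xid1, xid2)))
uPlayersy <- with(sourcey, unique(c(yid1, yid2)))
# For each player, concatenate the keys of all games they appear in.
comparex <- sapply(uPlayersx, function(x) {
  paste0(with(sourcex, ID[xid1 == x | xid2 == x]), collapse = '~')
})
comparey <- sapply(uPlayersy, function(x) {
  paste0(with(sourcey, ID[yid1 == x | yid2 == x]), collapse = '~')
})
# Players whose concatenated keys are identical are treated as the same person;
# match() returns NA for an x player with no identical y signature.
dumMatch <- data.frame(xid = uPlayersx, yid = uPlayersy[match(comparex, comparey)])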
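This works fine on the test dataset, but the real application is much larger and this feels like a hack. In addition, the real dataset may contain reporting errors and the like, so partial matching may be needed. Any help would be greatly appreciated.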
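To make the partial-matching requirement concrete, here is a rough base R sketch of one possible direction, not a tested solution for the real data. Instead of requiring a player's whole game history to be identical, it counts how many individual games each candidate pair of IDs shares and keeps the best-scoring candidate. The toy data frames toyx and toyy and the helpers to_long and match_players are hypothetical names made up for illustration. The sketch also assumes that both sources list the two players of a game in the same order, which the real data may not guarantee.

```r
# Hypothetical toy data (not the data above): the same three games in each source.
toyx <- data.frame(
  outcomedate = as.Date(c("2004-08-07", "2004-08-07", "2004-08-08")),
  xid1 = c(1L, 3L, 1L), xid2 = c(2L, 4L, 3L),
  outcome1 = c(2L, 0L, 1L), outcome2 = c(0L, 0L, 3L)
)
toyy <- data.frame(
  outcomedate = as.Date(c("2004-08-07", "2004-08-07", "2004-08-08")),
  yid1 = c(10, 30, 10), yid2 = c(20, 40, 30),
  outcome1 = c(2L, 0L, 1L),
  outcome2 = c(0L, 0L, 2L)  # simulated reporting error in the last game
)

# Reshape a source to one row per (player, game) with a game key.
# id_cols names the two player-id columns of that source.
to_long <- function(df, id_cols) {
  key <- paste(df$outcomedate, df$outcome1, df$outcome2, sep = "|")
  data.frame(
    id = c(df[[id_cols[1]]], df[[id_cols[2]]]),
    # Assumption: the player slot (1 or 2) means the same thing in both sources.
    key = c(paste(key, 1, sep = "|"), paste(key, 2, sep = "|")),
    stringsAsFactors = FALSE
  )
}

match_players <- function(x, y, id_x, id_y) {
  lx <- to_long(x, id_x)
  ly <- to_long(y, id_y)
  # Joining on the key pairs each x player with every y player
  # who appears in an identically keyed game and slot.
  pairs <- merge(lx, ly, by = "key", suffixes = c("_x", "_y"))
  pairs$shared <- 1L
  # Number of shared games per candidate (x id, y id) pair.
  score <- aggregate(shared ~ id_x + id_y, data = pairs, FUN = sum)
  # Share of the x player's games that the candidate explains.
  games_x <- table(lx$id)
  score$share <- score$shared / as.vector(games_x[as.character(score$id_x)])
  # Keep the highest-scoring candidate per x player (ties are broken arbitrarily).
  score <- score[order(score$id_x, -score$shared), ]
  best <- score[!duplicated(score$id_x), ]
  rownames(best) <- NULL
  best
}

best <- match_players(toyx, toyy, c("xid1", "xid2"), c("yid1", "yid2"))
print(best)
```

A share below 1 flags a match that rests on only part of a player's history, which is where reporting errors would show up. Players who share no game key with anyone are dropped from the result, and games with identical date and scores remain ambiguous, so low-share or tied matches would still need a manual check.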