由于我对先前问题的糟糕执行和解释,我将重新开始,并尝试将问题表述得尽可能简短和笼统。
我有两个数据框(请参见下面的示例)。每个数据集包含相同数量的列。
tc <- textConnection('
ID Track1 Track2 Track3 Track4 Time Loc
4 15 "" "" 50 40 1
5 17 115 109 55 50 1
6 17 115 109 55 60 1
7 13 195 150 60 70 1
8 13 195 150 60 80 1
9 "" "" 181 70 90 2 #From this row, example data added
10 "" "" 182 70 92 2
11 429 31 "" 80 95 3
12 480 31 12 80 96 3
13 118 "" "" 90 100 4
14 120 16 213 90 101 4
')
MATCHINGS <- read.table(tc, header=TRUE)
tc <- textConnection('
ID Track1 Track2 Track3 Track4 Time Loc
"" 15 "" "" 50 40 1
"" 17 "" 109 55 50 1
"" 17 432 109 55 65 1
"" 17 115 109 55 59 1
"" 13 195 150 60 68 1
"" 13 195 150 60 62 1
"" 10 5 1 10 61 3
"" 13 195 150 60 72 1
"" 40 "" 181 70 82 2 #From this row, example data added
"" "" "" 182 70 85 2
"" 429 "" "" 80 90 3
"" "" 31 12 80 92 3
"" "" "" "" 90 95 4
"" 118 16 213 90 96 4
')
INVOLVED <- read.table(tc, header=TRUE)
目标是通过匹配to和来放置最近的 ID from MATCHINGS
into 。一个额外的条件是匹配条目的 可能不高于中的条目。此外,match on是最优选的,match on是最不优选的。但是 only始终可用(所有其他-columns 可以为空)。因此,预期的结果是:INVOLVED
Track1
Track4
Loc
Time
INVOLVED
Time
MATCHING
Track1
Track4
Track4
Track
ID Track1 Track2 Track3 Track4 Time Loc
4 15 "" "" 50 40 1
5 17 "" 109 55 50 1
"" 17 432 109 55 65 1
6 17 115 109 55 59 1
7 13 195 150 60 68 1
7 13 195 150 60 62 1
"" 10 5 1 10 61 3
8 13 195 150 60 72 1
9 40 "" 181 70 82 2 #From this row, example data added
10 "" "" 182 70 85 2
11 429 "" "" 80 90 3
12 "" 31 12 80 92 3
13 "" "" "" 90 95 4
13 118 16 213 90 96 4
我试图用这个data.table
包来做这个,但没有做到这一点。是否有可能摆脱矢量扫描并在不循环的情况下有效地遍历数据?
dat <- data.table(MATCHINGS)
for(i in 1:nrow(INVOLVED)){
row <- INVOLVED[i,]
match <- dat[Time>=row$Time][Loc==row$Loc][Track4==row$Track4][Track4!=""][order(Time)][1]
if(!is.na(match$ID)){ INVOLVED$ID[i]<-match$ID }
match <- dat[Time>=row$Time][Loc==row$Loc][Track3==row$Track3][Track3!=""][order(Time)][1]
if(!is.na(match$ID)){ INVOLVED$ID[i]<-match$ID }
match <- dat[Time>=row$Time][Loc==row$Loc][Track2==row$Track2][Track2!=""][order(Time)][1]
if(!is.na(match$ID)){ INVOLVED$ID[i]<-match$ID }
match <- dat[Time>=row$Time][Loc==row$Loc][Track1==row$Track1][Track1!=""][order(Time)][1]
if(!is.na(match$ID)){ INVOLVED$ID[i]<-match$ID }
}
更新
更新了显示需要的示例数据Track 1 to 3
。如图所示Track1
是最重要和Track4
最不重要的。即使Track1 to 3
match toMATCHINGS x
和Track4
match to MATCHINGS y
,ID
ofy
也应该分配给 that INVOLVED row
。所以:Track3
匹配覆盖Track4
匹配,Track2
匹配覆盖Track3
匹配,Track1
匹配覆盖Track2
匹配。