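Extending the solution provided by agstudy (see the comments above), I came up with the following, which produces a data frame in which similar rows sit next to each other.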
df <- data.frame(name = c("Andrew", "Andrem", "Adam", "Pamdrew", "Adan"), id = c(12334, 12344, 34345, 98974, 34344), score = c(90, 90, 83, 95, 83))
xx <- do.call(paste0, df)  ## concatenate all columns into one string per row
df3 <- df[0, ]  ## empty data frame for storing loop results
for (i in seq_len(nrow(df))) {  ## produce results for each row of the data frame
  ## set the level of similarity required (up to roughly 30% dissimilarity in this case)
  df2 <- df[agrep(xx[i], xx, max.distance = 0.3 * nchar(xx[i])), ]
  ## every row matches itself, so keep only groups that contain at least one other row
  if (nrow(df2) >= 2) df3 <- rbind(df3, df2)
  ## drop rows that an earlier iteration already added
  df3 <- df3[!duplicated(df3), ]
}
I'm sure there are cleaner ways to produce these results, but this gets the job done.
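
For what it's worth, here is one possible tidier variant, offered as a sketch rather than a definitive answer. It collects the matching row indices first and subsets the data frame once, instead of growing `df3` inside the loop. The function name `group_similar_rows` and its `max_frac` argument are my own inventions for illustration, not part of agstudy's solution. One difference to be aware of: it removes duplicates by row index, so two genuinely identical rows would both be kept, whereas the loop above collapses them.

```r
## Hypothetical helper (name and arguments are assumptions, not from the original post)
group_similar_rows <- function(df, max_frac = 0.3) {
  ## concatenate all columns into one string per row
  keys <- do.call(paste0, df)
  ## for each row, find the indices of all rows whose string is approximately matched
  matches <- lapply(keys, function(k) {
    agrep(k, keys, max.distance = max_frac * nchar(k))
  })
  ## every row matches itself, so keep only groups with at least one other row
  matches <- Filter(function(idx) length(idx) >= 2, matches)
  ## flatten, keep the first occurrence of each index, and subset once
  df[unique(unlist(matches)), ]
}

df <- data.frame(
  name  = c("Andrew", "Andrem", "Adam", "Pamdrew", "Adan"),
  id    = c(12334, 12344, 34345, 98974, 34344),
  score = c(90, 90, 83, 95, 83)
)

similar <- group_similar_rows(df)
print(similar)
```

Because the indices are gathered in the same order as the loop visits the rows, the similar rows should still end up next to each other. Note that `agrep` does approximate substring matching, so a short row string can match inside a longer one.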