
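I have a vector of patterns, x, and a large vector of potential match candidates, y. For each element of x, agrep is used to get the matching elements of y. The problem is that the code is very slow: it takes about 2 seconds per element of x.

Is there any way to speed this up? x has only 6 elements in this example, but in the actual project x has a length of 41k. y here has about 103k elements, which is close to real life.

If you need to see the sample output, replace 3300 with 1 in the definition of y.

Thanks in advance!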

x = c("procter & gamble; tide free & gentle", "procter & gamble; tide he turbo clean", 
      "procter & gamble; tide simply clean", "procter & gamble; tide simply clean & fresh", 
      "procter & gamble; tide simply clean & sensitive", "procter & gamble; tide total care")

y = rep(c("procter & gamble; tide", "procter & gamble; tide & downy", 
          "procter & gamble; tide actilift", "procter & gamble; tide basic", 
          "procter & gamble; tide boost", "procter & gamble; tide boost vivid white + bright", 
          "procter & gamble; tide buzz", "procter & gamble; tide cold water", 
          "procter & gamble; tide colorguard", "procter & gamble; tide compact", 
          "procter & gamble; tide febreze", "procter & gamble; tide febreze sport", 
          "procter & gamble; tide high efficiency", "procter & gamble; tide oxi", 
          "procter & gamble; tide plus", "procter & gamble; tide plus colorguard", 
          "procter & gamble; tide pods", "procter & gamble; tide pods plus febreze", 
          "procter & gamble; tide pure essentials", "procter & gamble; tide simple pleasures", 
          "procter & gamble; tide simply clean & fresh", "procter & gamble; tide simply clean & sensitive", 
          "procter & gamble; tide stain release", "procter & gamble; tide stain release free", 
          "procter & gamble; tide to go", "procter & gamble; tide total clean", 
          "procter & gamble; tide totalcare", "procter & gamble; tide ultra 2", 
          "procter & gamble; tide vivid white & bright", "procter & gamble; tide with dawn", 
          "procter & gamble; tidekick"),3300)

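library(microbenchmark)

# Preallocate one result slot per pattern
mapped = matrix("", nrow = length(x))

myMap = function() {
  for (i in seq_along(x)) {
    # Collect every element of y that approximately matches x[i], joined by "|"
    mapped[i] = paste(y[agrep(x[i], y, max.distance = 2.9, fixed = TRUE, useBytes = TRUE)], collapse = "|")
  }
  return(mapped)
}

print(microbenchmark(myMap(), times = 5))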

Timing

Unit: seconds
    expr      min       lq    mean   median       uq    max neval
 myMap() 11.57354 11.61535 11.6225 11.61919 11.64641 11.658     5

Sample output with y repeated only once

1    
2    
3   procter & gamble; tide simple pleasures|procter & gamble; tide simply clean & fresh|procter & gamble; tide simply clean & sensitive
4   procter & gamble; tide simply clean & fresh
5   procter & gamble; tide simply clean & sensitive
6   procter & gamble; tide total clean|procter & gamble; tide totalcare
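
One property of the example data may be worth checking first: y is built from 31 strings repeated 3300 times. If the real y also contains many duplicates (an assumption; the post does not say), each pattern only needs to be matched against the distinct candidates. Below is a minimal sketch of that idea, not a verified fix. The names `map_unique`, `demo_x` and `demo_y` are hypothetical and not part of the original code, and the demo vectors are a small subset of the strings above.

```r
# Hypothetical helper (not from the original post): fuzzy-match each pattern
# against the distinct candidates only, using the same agrep settings as myMap().
map_unique <- function(patterns, candidates, max_dist = 2.9) {
  # Assumption: a candidate that occurs many times only needs to be listed once
  uniq <- unique(candidates)
  vapply(patterns, function(p) {
    # value = TRUE returns the matching strings instead of their indices
    hits <- agrep(p, uniq, max.distance = max_dist,
                  fixed = TRUE, useBytes = TRUE, value = TRUE)
    paste(hits, collapse = "|")
  }, character(1), USE.NAMES = FALSE)
}

# Small demo data taken from the strings in the question
demo_x <- c("procter & gamble; tide simply clean & fresh",
            "procter & gamble; tide total care")
demo_y <- c("procter & gamble; tide simply clean & fresh",
            "procter & gamble; tide total clean",
            "procter & gamble; tide totalcare",
            "procter & gamble; tide pods")

# Duplicating the candidates does not change the result
res <- map_unique(demo_x, rep(demo_y, 3))
print(res)
```

Note that this changes the output when y has duplicates: the original loop lists a matching candidate once per occurrence in y, while the sketch lists it once. If y is already free of duplicates, the sketch does the same amount of agrep work as the loop and should not be expected to be faster.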