
I have a data frame like this:

author_id paper_id confirmed     author_name1   author_affiliation1         author_name   
   826    25733         1     Emanuele Buratti  Genetic engineering    Emanuele Buratti
   826    25733         1     Emanuele Buratti  International center   Emanuele Buratti
   826    47276         1     Emanuele Buratti                         Emanuele Buratti
   826    77012         1     Emanuele Buratti                         Emanuele Buratti
   826    77012         1     Emanuele Buratti                         Emanuele Buratti
   826    79468         1     Emanuele Buratti                         Emanuele Buratti

author_affiliation
Genetic enginereing                                                                                                
The International Centre for Genetic Engineering and Biotechnology, Padriciano 66,        
Trieste, Italy


International Centre for Genetic Engineering and Biotechnology, Padriciano 99, 34149                         
Trieste, Italy

Now, for every row, I need to compute the string distance between author_name and author_name1 (name_dist), and between author_affiliation and author_affiliation1 (aff_dist).

I am using:

name_dist<-vector()
aff_dist<-vector()
for(i in 1:nrow(mer1))
{
 name_dist[i]<-stringdist(mer1$author_name1[i],mer1$author_name[i],method="lv")
 aff_dist[i]<-stringdist(mer1$author_affiliation1[i],mer1$author_affiliation[i],method="lv")

 }

But this takes a lot of time. How can I do this efficiently?

Thanks


3 Answers


You can simply vectorize it: stringdist accepts whole vectors, so no loop is needed.

i=1:nrow(mer1)
name_dist<-stringdist(mer1$author_name1[i],mer1$author_name[i],method="lv")
aff_dist<-stringdist(mer1$author_affiliation1[i],mer1$author_affiliation[i],method="lv")
answered 2014-03-24T12:30:59.583

You can use sapply (or some other vectorized approach), like this:

a = letters[1:5] # your mer1$author_name1
b = LETTERS[1:5] # your mer1$author_name
name_dist = sapply(a, stringdist, b, method="lv")
answered 2014-03-24T12:36:48.570
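
One thing worth noting about the sapply call above: because b is passed as a whole vector, each element of a is compared against every element of b, so the result is a matrix of all pairs rather than one distance per row. The sketch below reuses the same toy vectors and assumes the stringdist package is installed; the names all_pairs, pairwise and direct are made up for illustration. For the row-by-row distances the question asks for, mapply or a plain vectorized stringdist call appears to be the closer fit.

```r
library(stringdist)  # assumption: the stringdist package is installed

a <- letters[1:5]  # stands in for mer1$author_name1
b <- LETTERS[1:5]  # stands in for mer1$author_name

# sapply: each element of a against ALL of b, simplified to a matrix
all_pairs <- sapply(a, stringdist, b, method = "lv")
dim(all_pairs)  # 5 rows by 5 columns

# mapply: a[i] against b[i] only, one distance per position
pairwise <- mapply(stringdist, a, b, MoreArgs = list(method = "lv"))

# plain vectorized call, no apply function at all
direct <- stringdist(a, b, method = "lv")

# the element-wise distances sit on the diagonal of the all-pairs matrix
all(diag(all_pairs) == direct)
all(pairwise == direct)
```

With a large data frame the all-pairs matrix grows with the square of the number of rows, which is another reason to prefer the element-wise forms.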

Try:

res <- transform(mer1, 
    name_dist=stringdist(author_name1,author_name,method="lv"),
    aff_dist=stringdist(author_affiliation1,author_affiliation,method="lv")
)

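Since stringdist accepts vector input, this computes both columns in one call each and should be considerably faster than the row-by-row loop.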

answered 2014-03-24T12:22:48.223
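
The claim shared by the answers above, that a single vectorized stringdist call gives the same result as the loop in the question, can be checked with a small sketch. The data frame here is a hypothetical stand-in for mer1 with invented rows, and the code assumes the stringdist package is installed.

```r
library(stringdist)  # assumption: the stringdist package is installed

# Hypothetical stand-in for mer1; the rows are made up for illustration
mer1 <- data.frame(
  author_name1 = c("Emanuele Buratti", "Emanuele Buratti", "E. Buratti"),
  author_name  = c("Emanuele Buratti", "Emanuele Burati", "Emanuele Buratti"),
  stringsAsFactors = FALSE
)

# Row-by-row loop, as in the question (result vector preallocated)
loop_dist <- numeric(nrow(mer1))
for (i in seq_len(nrow(mer1))) {
  loop_dist[i] <- stringdist(mer1$author_name1[i], mer1$author_name[i],
                             method = "lv")
}

# One vectorized call over both columns
vec_dist <- stringdist(mer1$author_name1, mer1$author_name, method = "lv")

# Both approaches should agree element by element
isTRUE(all.equal(loop_dist, vec_dist))
```

To compare timings on real data, wrapping each variant in system.time() is a simple option.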