This is the code I am running in R:
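options(stringsAsFactors = FALSE)
x <- read.table("sample.txt")
y <- read.table("comp.txt")
nrowx <- nrow(x)
nrowy <- nrow(y)
for (i in 1:nrowx) {
  flag <- 0  # set to 1 once a match for row i is found in y
  for (j in 1:nrowy) {
    if (x[i, 2] == y[j, 2]) {
      x[i, 2] <- y[j, 1]  # replace the address with its id from y
      flag <- 1
      break
    }
  }
  if (flag == 0)
    x[i, ] <- NA  # no match in y: set the whole row to NA
}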
Here x has 2,000,000 entries and y has about 2,500 entries. With this code, processing 25 entries of x takes about 1 minute.
A few lines of the file that is read into x:
"X1" "X2"
"1" 53 "all.downtown@enron.com"
"2" 54 "all.enron-worldwide@enron.com"
"3" 55 "all.worldwide@enron.com"
"4" 56 "all_enron_north.america@enron.com"
"5" 56 "ec.communications@enron.com"
"6" 57 "charlotte@wptf.org"
"7" 58 "sap.mailout@enron.com"
"8" 59 "robert.badeer@enron.com"
"9" 60 "tim.belden@enron.com"
"10" 60 "robert.badeer@enron.com"
"11" 60 "jeff.richter@enron.com"
"12" 60 "valarie.sabo@enron.com"
"13" 60 "carla.hoffman@enron.com"
"14" 60 "murray.o neil@enron.com"
"15" 60 "chris.stokley@enron.com"
A few lines of the file that is read into y:
"X1" "X2"
"1" 1 "jeff.dasovich@enron.com"
"2" 2 "kay.mann@enron.com"
"3" 3 "sara.shackleton@enron.com"
"4" 4 "tana.jones@enron.com"
"5" 5 "vince.kaminski@enron.com"
"6" 6 "pete.davis@enron.com"
"7" 7 "chris.germany@enron.com"
"8" 8 "matthew.lenhart@enron.com"
"9" 9 "debra.perlingiere@enron.com"
"10" 10 "mark.taylor@enron.com"
"11" 11 "gerald.nemec@enron.com"
"12" 12 "richard.sanders@enron.com"
"13" 13 "james.steffes@enron.com"
"14" 14 "steven.kean@enron.com"
"15" 15 "susan.scott@enron.com"
Please suggest some alternative approaches to speed up execution. Thanks! :)
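One possible direction, as a minimal sketch rather than a tested solution for the full data: the inner loop is a lookup of each address in x against the addresses in y, and base R's match() can do that lookup for the whole column at once. The two small data frames below are made-up stand-ins for sample.txt and comp.txt (their values are assumptions for illustration only). One difference from the loop: here X2 ends up holding the ids with the type of y's X1 column, whereas the loop coerces them to character.

```r
# Hypothetical stand-ins for the two files; the real x and y would come
# from read.table() as in the code above.
x <- data.frame(X1 = c(53, 54, 60, 60),
                X2 = c("a@enron.com", "b@enron.com", "c@enron.com", "a@enron.com"),
                stringsAsFactors = FALSE)
y <- data.frame(X1 = c(1, 2),
                X2 = c("a@enron.com", "c@enron.com"),
                stringsAsFactors = FALSE)

# For each address in x, the index of the first equal address in y,
# or NA when y does not contain it.
idx <- match(x$X2, y$X2)

# Replace each address with the matching id from y (NA where unmatched).
x$X2 <- y$X1[idx]

# Rows without a match become all NA, as in the original loop.
x[is.na(idx), ] <- NA

print(x)
```

Because match() takes the first matching position in y, it mirrors the `break` in the loop when y contains duplicate addresses.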