
I printed some timestamps from the script, and this piece takes far too long: almost 5 minutes to complete!

FYI, the strArr array contains roughly 1500 string elements. (This loop runs many times.)

The file tmp_FH_SR is 27 MB with 300,000 lines of data. The file tmp_FH_RL is 13 MB with roughly 150,000 lines.

I have changed the variable names to protect the real ones...

In the first while loop, relying on the fact that $str appears exactly once in the file, I take another field from the matching record. I then use that field to count how many times it appears in a second file. Based on that count, I add $str to an array.

my $tmp_str;                        # field 10 of the (single) matching line
foreach my $str (@strArr)
{
    my $count = 0;
    seek $tmp_FH_SR, 0, 0;          # rewind the 27 MB file for every string
    while (<$tmp_FH_SR>)
    {
        my $line = $_;
        chomp($line);
        if ($line =~ m/"\Q$str\E"/)
        {
            $count++;
            if ($count == 1)
            {
                my @tmp_line_ar = split(/,/, $line);
                $tmp_str = $tmp_line_ar[10];
            }
        }
    }
    if ($count == 1)
    {
        my $k = 0;
        seek $tmp_FH_RL, 0, 0;      # rewind the 13 MB file as well
        while (<$tmp_FH_RL>)
        {
            my $line = $_;
            chomp($line);
            if ($line =~ m/"\Q$tmp_str\E"/) { $k++; }
        }
        if ($k == 1) { push(@another_str_arr, $str); }
    }
}

How can I make it faster? Should I read the 27 MB and 13 MB files into an array once and work from that? I wanted to avoid that, since many other processes run on the host where this script runs.

Thanks.


1 Answer


You're going at it backwards, which is one reason why it's taking so long.

@strArr has only 1500 entries, yet you're reading each file 1500 times because of your loop.

Put the entries of @strArr in a hash (or use a multi-dimensional array) so you can keep a count for each entry. Read a line from the file, then loop over the 1500 entries and update their counts. That way each file is read only once.
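A minimal sketch of that idea, inverting the loops so each file is streamed exactly once. The sample data here is hypothetical (in-memory filehandles stand in for the real $tmp_FH_SR and $tmp_FH_RL), and the field layout is assumed from the question's split on field 10:

```perl
use strict;
use warnings;

my @strArr = ('aaa', 'bbb', 'ccc');     # placeholder strings

# Hypothetical file contents; the real script would open the two files.
my $sr_data = join "\n",
    '"aaa",x,x,x,x,x,x,x,x,x,kkk',      # 11 comma-separated fields
    '"bbb",x,x,x,x,x,x,x,x,x,mmm',
    '"bbb",x,x,x,x,x,x,x,x,x,nnn';
my $rl_data = qq{"kkk"\n"mmm"\n};

# Pass 1 over the first file: count matches per string, and remember
# field 10 from the first matching line.
my (%count, %field10);
open my $fh_sr, '<', \$sr_data or die $!;
while (my $line = <$fh_sr>) {
    chomp $line;
    for my $str (@strArr) {
        next unless $line =~ /"\Q$str\E"/;
        $count{$str}++;
        $field10{$str} //= (split /,/, $line)[10];
    }
}
close $fh_sr;

# Pass 2 over the second file: count matches of each remembered field,
# but only for strings that matched exactly once in pass 1.
my %count2;
open my $fh_rl, '<', \$rl_data or die $!;
while (my $line = <$fh_rl>) {
    for my $str (grep { ($count{$_} // 0) == 1 } @strArr) {
        $count2{$str}++ if $line =~ /"\Q$field10{$str}\E"/;
    }
}
close $fh_rl;

# Keep strings that matched exactly once in both files.
my @another_str_arr =
    grep { ($count{$_} // 0) == 1 && ($count2{$_} // 0) == 1 } @strArr;
print "@another_str_arr\n";             # aaa
```

Each file is now read once, so the cost drops from 1500 file scans to 2, at the price of a per-line loop over the 1500 strings (which you could shrink further with a single combined regex or a hash lookup on the extracted fields).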

Answered 2012-08-20T23:03:45.800