I'm trying to read a file INPUT and, if a line in that file contains a particular string, replace that line with something else (the entire line, including its line break) or with nothing at all (remove the line as if it had never been there). All of this is written to a new file.

Here's that section of code...

while(<INPUT>){
    if ($_ =~ /  <openTag>/){
        chomp;
        print OUTPUT "Some_Replacement_String";
    } elsif ($_ =~ /  <\/closeTag>/) {
        chomp;
        print OUTPUT ""; #remove the line
    } else {
        chomp;
        print OUTPUT "$_\r\n"; #print the original line
    }
}

If my understanding is correct, while(<INPUT>) should read one line at a time and store each line in the special variable $_.

However, when I run the above code, only the very first if branch ever fires: I get Some_Replacement_String, and only once (one line, out of a file with 1.3 million lines in which I expect about 600,000 replacements). This obviously isn't the behavior I expect. If I do something like while(<INPUT>){print OUTPUT $_;} I get a copy of the entire file, every line, so I know the entire file is being read (expected behavior).

What I'm trying to do is get a line, test it, do something with it, and move on to the next one.

If it helps with troubleshooting at all: if I use print $.; anywhere inside that while loop (or after it), I get 1. I expected $. to be the "Current line number for the last filehandle accessed." So by the time my while loop has gone through the entire file, it should equal the number of lines in the file, not 1.

I've tried a few other variations of this code, but I think this is the closest I've come. I assume there's a good reason I'm not getting the behavior I expect, can anyone tell me what it is?

1 Answer

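The problem you describe suggests that your input file contains only a single line. That can happen for a number of reasons, for example:

  • You have changed the input record separator $/
  • Your input file does not contain proper line endings
  • You are running the script with the -0777 switch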
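
To see how these causes produce exactly the symptoms in the question (a single record, with $. stuck at 1), here is a minimal sketch that reads the same text through an in-memory filehandle with different values of $/. The sub name records_read and the sample strings are made up for illustration; bare carriage-return line endings ("\r") are only one assumed example of "improper line endings", and passing undef for $/ mimics what -0777 does (slurp mode).

```perl
use strict;
use warnings;

# Read $text through an in-memory filehandle with $/ set to $separator
# and return $. afterwards, i.e. how many records <$fh> handed back.
# Passing undef as $separator is slurp mode, the same as running with -0777.
sub records_read {
    my ($text, $separator) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
    local $/ = $separator;
    while (<$fh>) { }        # consume every record; $. counts them
    my $count = $.;          # grab $. before close() resets it
    close $fh;
    return $count;
}

print records_read("a\nb\nc\n", "\n"), "\n";   # LF endings, default $/: 3
print records_read("a\rb\rc\r", "\n"), "\n";   # bare-CR endings: 1
print records_read("a\nb\nc\n", undef), "\n";  # slurp mode (-0777): 1
print records_read("a\rb\rc\r", "\r"), "\n";   # $/ matched to the file: 3
```

If your file does turn out to use bare \r, setting local $/ = "\r"; before the loop (or converting the file first) is one possible fix; which one applies depends on what your file actually contains.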

A few notes on your code:

if ($_ =~ /  <openTag>/){
    chomp;
    print OUTPUT "Some_Replacement_String";

There's no need to chomp a line you don't use.

} elsif ($_ =~ /  <\/closeTag>/) {
    chomp;
    print OUTPUT "";

This is rather redundant. You never really need to print an empty string, and there's no point in chomping a value you don't use.

} else {
    chomp;
    print OUTPUT "$_\r\n"; #print the original line

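There's no need to remove the newline and then put it back. Also, you would normally use \n as your line ending, even on Windows, because a text-mode filehandle on Windows already translates \n into \r\n on output.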
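
A small sketch that makes the doubled carriage return visible. It assumes an output handle with the :crlf layer, which is what a text-mode handle on Windows has by default, and writes into a scalar through an in-memory filehandle so it behaves the same on any platform. The sub name bytes_written_crlf is hypothetical.

```perl
use strict;
use warnings;

# Print $line to an in-memory handle with the :crlf layer on top (what a
# text-mode handle on Windows has by default) and return the bytes written.
sub bytes_written_crlf {
    my ($line) = @_;
    my $buffer = '';
    open my $out, '>:crlf', \$buffer or die "Cannot open in-memory handle: $!";
    print {$out} $line;
    close $out;                # flush the layer's buffer into $buffer
    return $buffer;
}

# Show the bytes with \r and \n spelled out.
for my $line ("text\n", "text\r\n") {
    (my $shown = bytes_written_crlf($line)) =~ s/\r/\\r/g;
    $shown =~ s/\n/\\n/g;
    print "$shown\n";
}
```

Printing "$_\r\n" through such a handle therefore ends each line with \r\r\n, while printing the original line (or "\n") gives the expected \r\n.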

And since you chomp in every branch of the if-else, you might as well move it outside the whole if-block:

chomp;
if (....) {

But since you never rely on the line ending being gone, why bother with chomp at all?

When you work with $_, you can abbreviate some commands, as you already do with chomp. For example, a regex on its own is applied to $_:

} elsif (/  <\/closeTag>/) {  # works splendidly
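
A minimal sketch of the $_ shorthand inside a read loop, using the question's "  <openTag>" pattern on made-up sample text. The sub name matching_lines is hypothetical, and the in-memory filehandle stands in for INPUT.

```perl
use strict;
use warnings;

# Collect the lines of $text that match the question's "  <openTag>" pattern,
# relying on the $_ defaults instead of naming a variable.
sub matching_lines {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
    my @hits;
    while (<$fh>) {                        # each line is assigned to $_
        chomp;                             # same as chomp($_)
        push @hits, $_ if /  <openTag>/;   # same as $_ =~ /  <openTag>/
    }
    close $fh;
    return @hits;
}

print "$_\n" for matching_lines("  <openTag>\nother\n  <openTag>second\n");
```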

When, as above, your regex contains slashes, you can pick a different delimiter so that you don't need to escape them:

} elsif (m#  </closeTag>#) {

But then you need to use the full form of the m// operator, with the m in front.
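
A quick check, on made-up input, that changing the delimiter doesn't change what the pattern matches. The hypothetical is_close_tag sub runs the same test three ways and dies if they ever disagree.

```perl
use strict;
use warnings;

# Test $line for "  </closeTag>" three ways; only the delimiter differs.
sub is_close_tag {
    my ($line) = @_;
    my $slashes = ($line =~ /  <\/closeTag>/) ? 1 : 0;   # / delimiter: escape the slash
    my $hashes  = ($line =~ m#  </closeTag>#) ? 1 : 0;   # m with #: no escaping needed
    my $braces  = ($line =~ m{  </closeTag>}) ? 1 : 0;   # paired delimiters work too
    die "delimiters disagree" unless $slashes == $hashes && $hashes == $braces;
    return $slashes;
}

print is_close_tag("  </closeTag>\n"), "\n";   # 1
print is_close_tag("  <openTag>\n"), "\n";     # 0
```

Note that m must come directly before the # delimiter; with a space in between (m #...), Perl would treat the # as the start of a comment.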

So, in short:

while(<INPUT>){
    if (/  <openTag>/){
        print OUTPUT "Some_Replacement_String";
    } elsif (m#  </closeTag>#) {
        # do nothing
    } else {
        print OUTPUT $_;   # print the original line
    }
}

Of course, the last two branches can be merged into one with a bit of negated logic:

} elsif (not m#  </closeTag>#) {
    print OUTPUT $_;
}
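
For completeness, here is the final loop wrapped in a self-contained sketch. It assumes lexical filehandles and the three-argument open instead of the INPUT/OUTPUT barewords, and uses in-memory strings instead of real files so it runs as-is; filter_lines and the sample input are hypothetical. For real files you would open file paths instead of scalar references.

```perl
use strict;
use warnings;

# Apply the final loop to $input_text and return what would have been
# written to OUTPUT. In-memory handles stand in for the real files.
sub filter_lines {
    my ($input_text) = @_;
    my $output_text = '';
    open my $in,  '<', \$input_text  or die "Cannot open input: $!";
    open my $out, '>', \$output_text or die "Cannot open output: $!";
    while (<$in>) {
        if (/  <openTag>/) {
            print {$out} "Some_Replacement_String";   # replaces the whole line
        } elsif (not m#  </closeTag>#) {
            print {$out} $_;                          # unchanged line, newline included
        }
        # lines matching "  </closeTag>" fall through and are dropped
    }
    close $in;
    close $out;
    return $output_text;
}

print filter_lines("  <openTag>\nkeep me\n  </closeTag>\n");
```

Exactly as in the loop above, "Some_Replacement_String" is printed without a trailing newline, so it ends up joined to the next kept line; append "\n" to it if each replacement should sit on its own line.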
Answered 2013-10-18T19:01:05.263