perl - Perl incorrectly adding a newline?

Question

This is my tab-delimited input file

Name<tab>Street<tab>Address

This is what I want my output file to look like

Street<tab>Address<tab>Address

(Yes, the Address column is duplicated on purpose.) My output file looks like this instead

Street<tab>Address
         <tab>Address

What is going on with Perl? Here is my code.

open (IN, $ARGV[0]);

open (OUT, ">output.txt");
while ($line = <IN>){

    chomp $line;
    @line=split/\t/,$line;

    $line[2]=~s/\n//g;
   print OUT $line[1]."\t".$line[2]."\t".$line[2]."\n";
}

close( OUT);

score 4 · Accepted Answer

First of all, you should always

- use strict and use warnings for even the most trivial programs. You will also need to declare each of your variables using my as close as possible to their first use
- use lexical file handles and the three-parameter form of open
- check the success of every open call, and die with a string that includes $! to show the reason for the failure

Note also that there is no need to explicitly open files named on the command line that appear in @ARGV: you can just read from them using <>.

As others have said, it looks like you are reading a file of DOS or Windows origin on a Linux system. Instead of using chomp, you can remove all trailing whitespace characters from each line using s/\s+\z//. Since CR and LF both count as "whitespace", this will remove all line terminators from each record. Beware, however, that, if trailing space is significant or if the last field may be blank, then this will also remove spaces and tabs. In that case, s/[\r\n]+\z// is more appropriate.
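To see the difference between the two substitutions, here is a minimal sketch using a made-up record whose last field is blank. The sample data and the helper names strip_all_ws and strip_eol are assumptions for illustration only; they are not part of the original program.

```perl
use strict;
use warnings;

# Hypothetical helpers, named here only for illustration.
sub strip_all_ws { my ($s) = @_; $s =~ s/\s+\z//;     return $s }
sub strip_eol    { my ($s) = @_; $s =~ s/[\r\n]+\z//; return $s }

# A DOS-style record whose last field is empty: the line ends "\t\r\n".
my $record = "Name\tStreet\t\r\n";

# A split limit of -1 keeps trailing empty fields so they can be counted.
my @a = split /\t/, strip_all_ws($record), -1;
my @b = split /\t/, strip_eol($record),    -1;

print 'after s/\s+\z//:      ', scalar(@a), " fields\n";   # 2 fields
print 'after s/[\r\n]+\z//:  ', scalar(@b), " fields\n";   # 3 fields
```

The first substitution also eats the tab that marks the empty last field, so the record appears to have only two columns. The second removes only the line terminator and keeps the empty field.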

This version of your program works fine.

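use strict;
use warnings;

# Hard-coded input file for this demonstration; drop this line to read
# the files named on the command line instead.
@ARGV = 'addr.txt';

open my $out, '>', 'output.txt' or die "Cannot open output.txt: $!";

while (<>) {
  s/\s+\z//;                   # strip CR, LF and any other trailing whitespace
  my @fields = split /\t/;
  print $out join("\t", @fields[1, 2, 2]), "\n";   # Street, Address, Address
}

close $out or die "Cannot close output.txt: $!";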

score 2 · Accepted Answer

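If you know the origin of your data file in advance, and know it to be a DOS-like file that terminates records with CR LF, you can use the PerlIO crlf layer when you open the file. Like this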

open my $in, '<:crlf', $ARGV[0] or die $!;

Then all records will end in just "\n" when they are read on a Linux system.
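The effect of the layer can be checked with a small self-contained sketch. It writes a one-record DOS-style file to a temporary location and reads it back twice; the sample record and the helper name first_record are assumptions made for this illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a small DOS-style sample file; :raw makes sure "\r\n" is written
# exactly as given on every platform.
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode $tmp, ':raw';
print $tmp "Name\tStreet\tAddress\r\n";
close $tmp or die "Cannot close $path: $!";

# Hypothetical helper: return the first record read with the given open mode.
sub first_record {
    my ($mode) = @_;
    open my $in, $mode, $path or die "Cannot open $path: $!";
    my $line = <$in>;
    close $in;
    return $line;
}

my $plain = first_record('<:raw');    # the CR stays in the data
my $crlf  = first_record('<:crlf');   # CR LF is translated to "\n"

print "without :crlf, CR present: ", ($plain =~ /\r/ ? "yes" : "no"), "\n";
print "with    :crlf, CR present: ", ($crlf  =~ /\r/ ? "yes" : "no"), "\n";
```

With the layer in place, a plain chomp is enough to clean each record.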

A general solution to this problem is to install PerlIO::eol. Then you can write

open my $in, '<:raw:eol(LF)', $ARGV[0] or die $!;

and the line ending will always be "\n" regardless of the origin of the file, and regardless of the platform where Perl is running.

score 0 · Accepted Answer

Another way to avoid line-ending problems is to capture only the characters you are interested in:

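open(my $in, '<', $ARGV[0]) or die "Cannot open $ARGV[0]: $!";
open(my $out, '>', 'output.txt') or die "Cannot open output.txt: $!";

while (<$in>) {
    # Skip the first field and capture the second and third.
    # Note that \w+ does not match fields that contain spaces or punctuation.
    print $out "$1\t$2\t$2\n" if /^\w+\t(\w+)\t(\w+)/;
}

close($out) or die "Cannot close output.txt: $!";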

score 0 · Accepted Answer

Have you tried removing not only the "\n" but also the "\r"?

$file[2] =~ s/[\r\n]+//g;
$file[3] =~ s/[\r\n]+//g; # Or is this the right field?

That may work. DOS line endings are "\r\n", not just "\n", so on Linux chomp removes the "\n" and leaves the "\r" behind.
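A minimal sketch of what happens to one record, assuming it comes from a DOS-style file and is processed on Linux (the sample string is made up for illustration):

```perl
use strict;
use warnings;

# A record as it arrives from a DOS-style file when read on Linux.
my $line = "Name\tStreet\tAddress\r\n";

chomp $line;                       # removes only the trailing "\n"
my @fields = split /\t/, $line;

# The carriage return is still attached to the last field.
my $has_cr = $fields[2] =~ /\r\z/ ? 1 : 0;
print "CR still present after chomp: $has_cr\n";

# Removing both characters gives a clean field.
(my $clean = $fields[2]) =~ s/[\r\n]+//g;
print "cleaned field: '$clean'\n";
```

Because the leftover "\r" is printed right after the first copy of the field, many editors and terminals show a break at that point, which would explain why the output appears to be split over two lines.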
