perl - Using Perl, how do I read records from a file with two possible record separators?

Question

Here is what I am trying to do:

I want to read a text file into an array of strings. I want each string to end when the file reaches a certain character (mainly ; or |).

For example, the following text

Would you; please
hand me| my coat?

would be stored like this:

$string[0] = 'Would you;';
$string[1] = ' please hand me|';
$string[2] = ' my coat?';

Can I get some help with something like this?

score 6 · Accepted Answer

This will do it. The trick to using split while keeping the token you split on is a zero-width lookbehind match: split(/(?<=[;|])/, ...).

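Note: mctylr's answer (currently the highest rated) is actually not correct. It will split fields on newlines, because it only works on one line of the file at a time.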

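gbacon's answer using the input record separator ($/) is very clever (it is efficient in both space and time), but I don't think I would want to see it in production code. Putting one split token in the record separator and the other in the split strikes me as a bit too non-obvious (you really have to know your Perl to work it out...), which would make it hard to maintain. I'm also not sure why he strips multiple newlines (which I don't think you asked for?) and why he only does so at the end of records ending in '|'.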

use strict;
use warnings;

# open file for reading, die with error message if it fails
open(my $fh, '<', 'data.txt') or die "Cannot open data.txt: $!";

# set file reading to slurp (whole file) mode (note that this affects all
# file reads in this block)
local $/ = undef;

my $string = <$fh>;
close($fh);

# drop the final newline (otherwise it becomes a trailing space), then convert
# the remaining newlines into spaces: not specified, but as per example output
$string =~ s/\n\z//;
$string =~ s/\n/ /g;

# split string on ; or |, using a zero-width lookbehind match (?<=) to preserve the char
my @strings = split(/(?<=[;|])/, $string);
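If the slurp-mode change to $/ should not leak into the rest of a larger program, one common option is to confine `local $/` to a `do` block. The sketch below is self-contained under two assumptions: the sample text is embedded in the script and read through an in-memory filehandle (`open` on a scalar reference) instead of `data.txt`, and the sub name `split_records` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: split text after every ';' or '|', keeping the delimiter.
sub split_records {
    my ($fh) = @_;

    # Slurp mode only inside this do block; $/ is restored when the block exits.
    my $text = do { local $/; <$fh> };

    $text =~ s/\n\z//;    # drop the final newline so it does not become a trailing space
    $text =~ s/\n/ /g;    # join the remaining lines with spaces, as in the example output

    # Zero-width lookbehind: split at each position right after ';' or '|'.
    return split /(?<=[;|])/, $text;
}

# Assumption: the sample input is embedded here instead of being read from data.txt.
my $input = "Would you; please\nhand me| my coat?\n";
open(my $fh, '<', \$input) or die "Cannot open in-memory file: $!";
my @strings = split_records($fh);
close($fh);

print "[$_]\n" for @strings;
```

Because the lookbehind consumes no characters, each delimiter stays at the end of the field it terminates, and $/ is back to "\n" as soon as `split_records` returns.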

score 3 · Accepted Answer

One way is to inject another character, such as \n, wherever one of your special characters is found, and then split on \n:

use warnings;
use strict;
use Data::Dumper;

while (<DATA>) {
    chomp;
    s/([;|])/$1\n/g;
    my @string = split /\n/;
    print Dumper(\@string);
}

__DATA__
Would you; please hand me| my coat?

This prints:

$VAR1 = [
          'Would you;',
          ' please hand me|',
          ' my coat?'
        ];

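Update: The original question posted by James showed the input text on a single line, as shown in __DATA__ above. Because the question was poorly formatted, someone else edited it and split the 1 line into 2 lines. Only James knows whether it is 1 line or 2.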
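To see why the 1-line versus 2-line question matters for the per-line approach above, here is a small sketch (the helper name `per_line_fields` is hypothetical) that applies the same inject-and-split step to each line of both variants of the input. Because each line is processed on its own, no field can cross a line break, so the two-line input yields four fields instead of three.

```perl
use strict;
use warnings;

# Hypothetical helper: the inject-and-split step applied to each line separately.
sub per_line_fields {
    my ($input) = @_;
    my @fields;
    for my $line (split /\n/, $input) {
        $line =~ s/([;|])/$1\n/g;         # inject a newline after each delimiter
        push @fields, split /\n/, $line;  # fields never cross a line boundary
    }
    return @fields;
}

my @one_line = per_line_fields("Would you; please hand me| my coat?\n");
my @two_line = per_line_fields("Would you; please\nhand me| my coat?\n");

print scalar(@one_line), " fields from one line\n";
print scalar(@two_line), " fields from two lines:\n";
print "  [$_]\n" for @two_line;
```

With two lines, ' please' and 'hand me|' come out as separate fields, which is the behavior the first answer objects to.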

score 1 · Accepted Answer

I prefer @toolic's answer because it easily handles multiple delimiters.

However, if you want to overcomplicate things, you can always try:

#!/usr/bin/perl

use strict; use warnings;

my @contents = ('');

while ( my $line = <DATA> ) {
    last unless $line =~ /\S/;
    $line =~ s{$/}{ };
    if ( $line =~ /^([^|;]+[|;])(.+)$/ ) {
        $contents[-1] .= $1;
        push @contents, $2;
    }
    else {
        $contents[-1] .= $line;    # no delimiter: the whole line continues the current record
    }
}

print "[$_]\n" for @contents;

__DATA__
Would you; please
hand me| my coat?
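The loop above only looks for the first delimiter on each line. A sketch of a variant that handles any number of delimiters per line, while still reading one line at a time so the whole file never has to be held in memory, might look like this. The sub name `read_records` is hypothetical, the input is assumed to be an in-memory string, and, like the code above, it turns each line break inside a record into a space.

```perl
use strict;
use warnings;

# Hypothetical helper: collect records ending in ';' or '|', one line at a time.
sub read_records {
    my ($fh) = @_;
    my @records = ('');

    while (my $line = <$fh>) {
        chomp $line;
        $line .= ' ';    # a line break inside a record becomes a space

        # Peel off every complete record (text up to and including a delimiter).
        while ($line =~ s/^([^;|]*[;|])//) {
            $records[-1] .= $1;
            push @records, '';    # start the next record
        }
        $records[-1] .= $line;    # leftover text continues the current record
    }

    $records[-1] =~ s/ \z//;               # drop the space added after the last line
    pop @records if $records[-1] eq '';    # no empty record after a final delimiter
    return @records;
}

my $input = "Would you; please\nhand me| my coat?\n";
open(my $fh, '<', \$input) or die "Cannot open in-memory file: $!";
my @records = read_records($fh);
close($fh);

print "[$_]\n" for @records;
```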

score 0 · Accepted Answer

Something like

$text = <INPUTFILE>;

@string = split(/[;!]/, $text);

should do it, more or less.

Edit: I've changed "/;!/" to "/[;!]/".
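A plain character-class split discards the delimiters, while the expected output in the question keeps them. Besides the lookbehind used in the first answer, another standard option is a capturing group: `split` then returns each matched delimiter as its own list element, and the pieces can be glued back together. A sketch, assuming the text has already been read into one string with newlines turned into spaces:

```perl
use strict;
use warnings;

my $text = 'Would you; please hand me| my coat?';

# Without a capture group, the delimiters are thrown away.
my @plain = split /[;|]/, $text;

# With a capture group, each delimiter comes back as a separate element:
# ('Would you', ';', ' please hand me', '|', ' my coat?')
my @parts = split /([;|])/, $text;

# Re-attach each delimiter to the text that precedes it.
my @string;
while (@parts) {
    my $field = shift @parts;
    $field .= shift @parts if @parts;    # the delimiter, when there is one
    push @string, $field;
}

print "plain:  [$_]\n" for @plain;
print "joined: [$_]\n" for @string;
```

The lookbehind version does the same job in one step; the capture-group version can be handy when the code needs to know which delimiter ended each field.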

score 0 · Accepted Answer

Let Perl do half the work for you by setting $/ (the input record separator) to a vertical bar, then extract the semicolon-separated fields:

#!/usr/bin/perl

use warnings;
use strict;

my @string;

*ARGV = *DATA;

$/ = "|";
while (<>) {
  s/\n+$//;
  s/\n/ /g;
  push @string => $1 while s/^([^;]*;)//;   # up to the first ';' only
  push @string => $_;
}

for (my $i = 0; $i < @string; ++$i) {
  print "\$string[$i] = '$string[$i]';\n";
}

__DATA__
Would you; please
hand me| my coat?

Output:

$string[0] = 'Would you;';
$string[1] = ' please hand me|';
$string[2] = ' my coat?';
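The record-separator trick can be kept while limiting its side effects. A minimal sketch, assuming input from an in-memory filehandle rather than DATA and using a hypothetical sub name `read_pipe_records`: it sets $/ with `local` so the change is undone on return, and splits each '|'-terminated record after every ';' with a lookbehind instead of the s/// loop.

```perl
use strict;
use warnings;

# Hypothetical helper: '|' ends a record via $/, ';' is handled with split.
sub read_pipe_records {
    my ($fh) = @_;
    local $/ = '|';    # restored automatically when the sub returns
    my @string;

    while (my $record = <$fh>) {
        $record =~ s/\n+\z//;    # drop newlines at the end of the record
        $record =~ s/\n/ /g;     # newlines inside a record become spaces
        next if $record eq '';
        push @string, split /(?<=;)/, $record;
    }
    return @string;
}

my $input = "Would you; please\nhand me| my coat?\n";
open(my $fh, '<', \$input) or die "Cannot open in-memory file: $!";
my @string = read_pipe_records($fh);
close($fh);

printf "\$string[%d] = '%s';\n", $_, $string[$_] for 0 .. $#string;
```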
