perl - Extracting specific multi-line, pipe-delimited records in Perl

Question

I have a file that looks like this:

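NAME|JOHN|TOKYO|JPN
AGE|32|M
INFO|SINGLE|PROFESSIONAL|IT
NAME|MARK|MANILA|PH
AGE|37|M
INFO|MARRIED|PROFESSIONAL|BPO
NAME|SAMANTHA|SYDNEY|AUS
AGE|37|F
INFO|MARRIED|PROFESSIONAL|OFFSHORE
NAME|LUKE|TOKYO|JPN
AGE|27|M
INFO|SINGLE|PROFESSIONAL|IT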

I want to separate the records by country. I have stored the fields of each line in the array variable @fields:

my @fields = split(/\|/, $_ );

$fields[3] is what I want to sort by. I want the records split into 2 output text files.

Output text file 1:

NAME|JOHN|TOKYO|JPN
AGE|32|M
INFO|SINGLE|PROFESSIONAL|IT
NAME|LUKE|TOKYO|JPN
AGE|27|M
INFO|SINGLE|PROFESSIONAL|IT

Output text file 2:

NAME|MARK|MANILA|PH
AGE|37|M
INFO|MARRIED|PROFESSIONAL|BPO
NAME|SAMANTHA|SYDNEY|AUS
AGE|37|F
INFO|MARRIED|PROFESSIONAL|OFFSHORE

Everything from JPN should go to output text file 1, and all non-Japan countries to output text file 2.

Here is the code I tried:

use strict;
use warnings;
use Data::Dumper;
use Carp qw(croak);

my @fields;
my $tmp_var;
my $count;
;
my ($line, $i);

my $filename = 'data.txt';
open(my $input_fh, '<', $filename ) or croak "Can't open $filename: $!";


open(OUTPUTA, ">", 'JPN.txt') or die "wsl_reformat.pl: could not open $ARGV[0]";
open(OUTPUTB, ">", 'Non-JPN.txt') or die "wsl_reformat.pl: could not open $ARGV[0]";

my $fh;
while (<$input_fh>) {

    chomp;
   my @fields = split /\|/;


   if ($fields[0] eq 'NAME') {
    for ($i=1; $i < @fields; $i++) {
        if ($fields[3] eq 'JPN') {
           $fh = $_;
            print OUTPUTA $fh;
        }
        else {
           $fh = $_;
            print OUTPUTB $fh;
        }
    }

}   
}

close(OUTPUTA);
close(OUTPUTB)

Still no luck :(

score 1 · Accepted Answer

You didn't say what you need help with, so I'll assume it's coming up with an algorithm. Here's a good one:

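Open the file to read.
Open the file for the JPN entries.
Open the file for the non-JPN entries.
While not eof,
1. Read a line.
2. Parse the line.
3. If it's the first line of a record,
  1. If the person's country is JPN,
    1. Set the current file handle to the file handle for the JPN entries.
  2. Else,
    1. Set the current file handle to the file handle for the non-JPN entries.
4. Print the line to the current file handle.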

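use strict;
use warnings;
use feature qw( say );   # say() is not enabled by default

my $jpn_qfn   = '...';
my $other_qfn = '...';

open(my $jpn_fh,   '>', $jpn_qfn)
   or die("Can't create $jpn_qfn: $!\n");
open(my $other_fh, '>', $other_qfn)
   or die("Can't create $other_qfn: $!\n");

my $fh;   # Handle for the record currently being read.
while (<>) {
   chomp;
   my @fields = split /\|/;

   # A NAME line starts a new record, so pick its destination.
   if ($fields[0] eq 'NAME') {
      $fh = $fields[3] eq 'JPN' ? $jpn_fh : $other_fh;
   }

   # chomp removed the newline; say adds it back.
   say $fh $_;
}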
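
To try the loop above without creating any files, it can be wrapped in a sub and fed in-memory filehandles. This is only a sketch: the sub name split_by_country and the use of in-memory handles are assumptions made for illustration, not part of the answer's code.

```perl
use strict;
use warnings;

# Hypothetical helper: route each record to one of two handles.
# A record starts at a NAME line; the lines after it follow it
# to the same handle until the next NAME line.
sub split_by_country {
    my ($in_fh, $jpn_fh, $other_fh) = @_;
    my $fh;
    while (my $line = <$in_fh>) {
        chomp(my $copy = $line);
        my @fields = split /\|/, $copy;
        if (@fields && $fields[0] eq 'NAME') {
            $fh = (defined $fields[3] && $fields[3] eq 'JPN')
                ? $jpn_fh
                : $other_fh;
        }
        # Lines before the first NAME line have no destination; skip them.
        next unless defined $fh;
        print {$fh} $line;
    }
    return;
}

# In-memory filehandles stand in for real files in this sketch.
my $input = join '', map { "$_\n" } (
    'NAME|JOHN|TOKYO|JPN', 'AGE|32|M', 'INFO|SINGLE|PROFESSIONAL|IT',
    'NAME|MARK|MANILA|PH', 'AGE|37|M', 'INFO|MARRIED|PROFESSIONAL|BPO',
);
my ($jpn_out, $other_out) = ('', '');
open(my $in_fh,    '<', \$input)     or die("Can't open in-memory input: $!\n");
open(my $jpn_fh,   '>', \$jpn_out)   or die("Can't open in-memory output: $!\n");
open(my $other_fh, '>', \$other_out) or die("Can't open in-memory output: $!\n");

split_by_country($in_fh, $jpn_fh, $other_fh);
close($_) for $in_fh, $jpn_fh, $other_fh;

print "JPN:\n$jpn_out";
print "Other:\n$other_out";
```

Passing the handles in as arguments keeps the routing logic separate from the file names, so the same sub should work unchanged with handles opened on real files.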

score 1 · Accepted Answer

This is the approach I believe ikegami was describing, though I had never tried it before (it does give the correct results).

#!/usr/bin/perl
use strict;
use warnings;

open my $jpn_fh, ">", 'o33.txt' or die $!;
open my $other_fh, ">", 'o44.txt' or die $!;

my $fh;
while (<DATA>) {
    if (/^NAME/) {
        if (/JPN$/) {
            $fh = $jpn_fh;  
        }
        else {
            $fh = $other_fh;
        }
    }
    print $fh $_;
}   

close $jpn_fh or die $!;
close $other_fh or die $!;

__DATA__
NAME|JOHN|TOKYO|JPN
AGE|32|M
INFO|SINGLE|PROFESSIONAL|IT
NAME|MARK|MANILA|PH
AGE|37|M
INFO|MARRIED|PROFESSIONAL|BPO
NAME|SAMANTHA|SYDNEY|AUS
AGE|37|F
INFO|MARRIED|PROFESSIONAL|OFFSHORE
NAME|LUKE|TOKYO|JPN
AGE|27|M
INFO|SINGLE|PROFESSIONAL|IT

score 0 · Accepted Answer

#!/usr/bin/env perl

use 5.012;
use autodie;
use strict;
use warnings;

# store per country output filehandles
my %output;

# since this is just an example, read from __DATA__ section

while (my $line = <DATA>) {
    # split the fields
    my @cells = split /[|]/, $line;

    # if first field is NAME, this is a new record
    if ($cells[0] eq 'NAME') {
        # get the country code, strip trailing whitespace
        (my $country = $cells[3]) =~ s/\s+\z//;

        # if we haven't created an output file for this
        # country yet, do so
        unless (defined $output{$country}) {
            open my $fh, '>', "$country.out";
            $output{$country} = $fh;
        }
        my $out = $output{$country};

        # output this and the next two lines to
        # country specific output file
        print $out $line, scalar <DATA>, scalar <DATA>;
    }
}

close $_ for values %output;

__DATA__
NAME|JOHN|TOKYO|JPN
AGE|32|M
INFO|SINGLE|PROFESSIONAL|IT
NAME|MARK|MANILA|PH
AGE|37|M
INFO|MARRIED|PROFESSIONAL|BPO
NAME|SAMANTHA|SYDNEY|AUS
AGE|37|F
INFO|MARRIED|PROFESSIONAL|OFFSHORE
NAME|LUKE|TOKYO|JPN
AGE|27|M
INFO|SINGLE|PROFESSIONAL|IT
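
The script above reads the next two lines right after each NAME line, so it relies on every record being exactly three lines long. If records can vary in length, the per-country hash can be combined with the "current filehandle" idea from the first answer. A minimal sketch under those assumptions: the sub name route_records, the $dir parameter, and the UNKNOWN fallback are all hypothetical and not taken from any answer here.

```perl
use strict;
use warnings;
use 5.010;       # for the // and //= operators
use autodie;     # open and close die on failure
use File::Temp qw(tempdir);

# Hypothetical helper: write each record to "$dir/<COUNTRY>.out".
# Returns the sorted list of country codes that were seen.
sub route_records {
    my ($in_fh, $dir) = @_;
    my %output;   # country code => output filehandle
    my $out;      # handle for the record currently being read

    while (my $line = <$in_fh>) {
        my @cells = split /[|]/, $line;
        if ($cells[0] eq 'NAME') {
            # Assumption: a NAME line without a 4th field goes to UNKNOWN.
            (my $country = $cells[3] // 'UNKNOWN') =~ s/\s+\z//;
            $output{$country} //= do {
                open my $fh, '>', "$dir/$country.out";
                $fh;
            };
            $out = $output{$country};
        }
        # Every line follows the most recent NAME line to its file.
        print {$out} $line if defined $out;
    }

    close $_ for values %output;
    my @seen = sort keys %output;
    return @seen;
}

# Usage: a two-line record followed by a three-line record.
my $data = "NAME|JOHN|TOKYO|JPN\nAGE|32|M\n"
         . "NAME|MARK|MANILA|PH\nAGE|37|M\nINFO|MARRIED|PROFESSIONAL|BPO\n";
open my $in_fh, '<', \$data;
my $dir = tempdir(CLEANUP => 1);
my @countries = route_records($in_fh, $dir);
print "Wrote files for: @countries\n";
```

The trade-off is one extra variable ($out) in exchange for not hard-coding the record length.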

score 0 · Accepted Answer

Thanks heaps for your help. I was able to solve this in Perl, many thanks.

#!/usr/local/bin/perl

use strict;
use warnings;
use Data::Dumper;
use Carp qw(croak);

my @fields;
my $tmp_var;
my ($rec_type, $country);

my $filename = 'data.txt';


open (my $input_fh, '<', $filename ) or croak "Can't open $filename: $!";


open  my $OUTPUTA, ">", 'o33.txt' or die $!;
open  my $OUTPUTB, ">", 'o44.txt' or die $!;

my $Combline;   # output handle for the record currently being read
while (<$input_fh>) {

    $_ = _trim($_);
    @fields = split (/\|/, $_);
    $rec_type = $fields[0];
    $country = $fields[3];

    # a NAME line starts a new record, so choose its output file
    if ($rec_type eq 'NAME') {
        if ($country eq 'JPN') {
            $Combline = $OUTPUTA;
        }
        else {
            $Combline = $OUTPUTB;
        }
    }
    print {$Combline} $_;
}

close $OUTPUTA or die $!;
close $OUTPUTB or die $!;

sub _trim {
    my $word = shift;
    if ( $word ) {      
        $word =~ s/\s*\|/\|/g;      # remove whitespace before each pipe
        $word =~ s/"//g;            # remove double quotes
    }
    return $word;
}
