
I am working with a text file that contains data in a format like so:

To Kill A Mocking Bird|Harper Lee|S1|4A
Life of Pi|Yann Martel|S3|5B
Hunger Games|Suzzanne Collins|S2|2C

The actual data file has many more entries, and there are more than 3 instances of S1.

I am writing a Perl program to compare the data in this file with another file, mainly the filing information such as S1 and 4A.

I approached this by first reading the data from the file into a string. I then split the string on the pipe character (|) and stored the pieces in an array. Finally, I used a foreach loop to iterate through the array looking for matching information.

Note that all files are in the same directory.

#!/usr/bin/perl

open(INFO, "psychnet3.data");
my $dbinfo = <INFO>;
close(INFO);

@dbarray = split("|", $dbinfo);
$index_counter = 0;

foreach $element (@dbarray) {

  if ($element =~ "S1") {
    open(INFO, ">>logfile.txt");
    print INFO "found a S1";
    close(INFO);

    if ($dbarray[$index_counter + 1] =~ "4A") {
      $counter++;
      open(INFO, ">>logfile.txt");
      print INFO "found S1 4A";
      close(INFO);
    }
  }
  $index_counter++;
}

In the output file, it does not find all instances of S1.

I also tried using eq instead of =~ in the conditional, still with no luck.

I am new to Perl, coming from C#. Am I making a syntax mistake, or is it a logic error?
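Two Perl behaviors in the code above are worth checking. First, `my $dbinfo = <INFO>;` reads in scalar context, so it returns only the first line of the file. Second, the first argument to `split` is always treated as a regular expression, and in a regex `|` means alternation; the pattern `"|"` is an empty alternative that matches the empty string, so the string is split into single characters. A minimal self-contained check, assuming the sample lines above are held in a string instead of being read from psychnet3.data:

```perl
use strict;
use warnings;

# Sample records from the question, kept in memory instead of psychnet3.data.
my $data = "To Kill A Mocking Bird|Harper Lee|S1|4A\n"
         . "Life of Pi|Yann Martel|S3|5B\n"
         . "Hunger Games|Suzzanne Collins|S2|2C\n";

# An in-memory filehandle behaves like a file opened for reading.
open my $fh, '<', \$data or die "Cannot open in-memory data: $!";
my $first = <$fh>;    # scalar context: only the first line
my @rest  = <$fh>;    # list context: all remaining lines
close $fh;

# split "|" treats "|" as a regex (empty alternation), so it splits into characters.
my @chars  = split "|", "S1|4A";
# Escaping the pipe splits on the literal delimiter instead.
my @fields = split /\|/, "S1|4A";

print "first line: $first";
print "lines left after the first read: ", scalar(@rest), "\n";
print "split \"|\":  ", join(',', @chars), "\n";
print "split /\\|/: ", join(',', @fields), "\n";
```

With `split "|"`, every array element is a single character, so `$element eq "S1"` can never be true, which is consistent with `eq` not helping either.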


2 Answers


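There are many ways to do this, some using regular expressions and some not. If the fields you are looking for are always the 3rd and 4th fields of each line, and your file has a consistent structure, you can do it without a regex.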
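A minimal sketch of a non-regex approach, assuming every line has exactly four pipe-separated fields (title, author, section, shelf) and using the question's sample lines held in an array: split each line on a literal pipe and compare the 3rd and 4th fields with `eq`.

```perl
use strict;
use warnings;

my @lines = (
    "To Kill A Mocking Bird|Harper Lee|S1|4A",
    "Life of Pi|Yann Martel|S3|5B",
    "Hunger Games|Suzzanne Collins|S2|2C",
);

my ( $s1_count, $s1_4a_count ) = ( 0, 0 );
for my $line (@lines) {
    # split takes a regex, so the pipe must be escaped to be literal.
    my ( $title, $author, $section, $shelf ) = split /\|/, $line;
    next unless defined $shelf;    # skip malformed lines with fewer fields

    if ( $section eq 'S1' ) {
        $s1_count++;
        $s1_4a_count++ if $shelf eq '4A';
    }
}
print "S1: $s1_count, S1 4A: $s1_4a_count\n";
```

When the lines come from a file, `chomp` each one first; otherwise the last field keeps its trailing newline and `eq '4A'` fails.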

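EDIT:

The file is not that consistent, so use a regex instead.

I also removed the @dbarray array. It isn't needed, and memory isn't free :)

(Remember to rename the outer filehandle so it does not clash with the filehandle of the same name opened inside the loop.)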

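# Outer filehandle renamed to MINFO so it does not clash with the log handle INFO.
open(MINFO, "psychnet3.data") or die "Cannot open psychnet3.data: $!";
while (my $line = <MINFO>) {          # read one line at a time
    if ( $line =~ m/\|S1/i ) {
        open(INFO, ">>logfile.txt") or die "Cannot open logfile.txt: $!";
        print INFO "found a S1\n";
        close(INFO);

        if ( $line =~ m/\|4A/i ) {    # the same line also has shelf 4A
            $counter++;
            open(INFO, ">>logfile.txt") or die "Cannot open logfile.txt: $!";
            print INFO "found S1 4A\n";
            close(INFO);
        }
    }
}
close(MINFO);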
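A self-contained variant of the same line-by-line regex check, assuming the records are read from an in-memory string rather than psychnet3.data and that matches are counted in variables instead of being appended to logfile.txt. The fourth record is hypothetical, added only to show an S1 line that is not on shelf 4A:

```perl
use strict;
use warnings;

my $data = "To Kill A Mocking Bird|Harper Lee|S1|4A\n"
         . "Life of Pi|Yann Martel|S3|5B\n"
         . "Hunger Games|Suzzanne Collins|S2|2C\n"
         . "Another Book|Some Author|S1|3D\n";    # hypothetical record

open my $fh, '<', \$data or die "Cannot open in-memory data: $!";
my ( $s1_count, $s1_4a_count ) = ( 0, 0 );
while ( my $line = <$fh> ) {    # one line per iteration, nothing else kept in memory
    next unless $line =~ m/\|S1/i;
    $s1_count++;
    $s1_4a_count++ if $line =~ m/\|4A/i;
}
close $fh;

print "S1: $s1_count, S1 4A: $s1_4a_count\n";
```

Because `m/\|S1/` is not anchored at the end of the field, it would also match a section such as S10; `m/\|S1\|/` requires the pipe that follows the field.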
answered 2013-11-07T20:22:51.513

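You didn't mention how you are comparing this data. Is it by book title? Or by author? That makes it a bit hard to know exactly how this information needs to be stored.

Your data is a bit more complex than a single value per item. That means Perl's default data structures, scalars ($foo), arrays (@foo), and hashes (%foo), simply won't cut it on their own. Time to learn about references.

Technically, a reference is the location in memory where some other item is stored. You create a reference by putting a backslash in front of the variable name: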

$ref_to_foo_array = \@foo;

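$ref_to_foo_array is the memory location where my array @foo is stored. The big advantage is that it is not the whole array of values but a single value: the location in memory where @foo is stored. This means I can put that information into an array or a hash: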
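A small sketch of what the backslash gives you: the built-in `ref` reports what kind of thing a reference points to, and a change made through the reference is visible in the original array because both refer to the same storage. The values in @foo are just sample filing codes:

```perl
use strict;
use warnings;

my @foo = ( 'S1', '4A' );
my $ref_to_foo_array = \@foo;    # backslash creates a reference to @foo

print 'type: ', ref($ref_to_foo_array), "\n";    # type: ARRAY

# Dereference with @{ ... } and push: the change lands in @foo itself.
push @{$ref_to_foo_array}, '5B';
print "@foo\n";    # S1 4A 5B
```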

$bar[0] = $ref_to_foo_array;
$bar[1] = $ref_to_some_other_array;

Now @bar isn't just storing two values. Instead, @bar holds the information of two entire arrays! I have an array of arrays!
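A minimal sketch of that array of arrays, assuming two small arrays of book data (the second one is a hypothetical stand-in for `$ref_to_some_other_array`). Each element of @bar is a single scalar, yet each one leads to a whole array:

```perl
use strict;
use warnings;

my @foo   = ( 'To Kill A Mocking Bird', 'Harper Lee' );
my @other = ( 'Life of Pi', 'Yann Martel' );    # hypothetical second array

my @bar;
$bar[0] = \@foo;      # each element holds one reference
$bar[1] = \@other;

print "elements in \@bar: ", scalar(@bar), "\n";
print "elements in the first inner array: ", scalar( @{ $bar[0] } ), "\n";
```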

To get my original array back, I simply put the correct sigil in front of my reference to dereference it:

@foo = @{ $bar[0] };
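One detail worth knowing: dereferencing with `@{ ... }` on the right-hand side of an assignment copies the elements, so the new array is independent of the array the reference points to. A short check, with sample values chosen for illustration:

```perl
use strict;
use warnings;

my @original = ( 'S1', '4A' );
my @bar      = ( \@original );

my @copy = @{ $bar[0] };    # dereference and copy the elements
$copy[0] = 'S3';            # changes only the copy

print "copy: @copy\n";
print "original: @original\n";
```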

To make things easier, I can use -> to dereference a single value:

$array_reference = $bar[0];
$array_reference->[0];   # First item in the array being referenced
$array_reference->[1];   # Second item

Of course, I could also do this:

$bar[0]->[0];   # First item in the array being referenced
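Between subscripts the arrow is optional, so `$bar[0]->[0]` can also be written `$bar[0][0]`. A short check, assuming a small two-row @bar built with square brackets, which create an anonymous array reference directly:

```perl
use strict;
use warnings;

# [ ... ] builds an anonymous array and returns a reference to it.
my @bar = ( [ 'S1', '4A' ], [ 'S3', '5B' ] );

my $array_reference = $bar[1];
print 'first item of row 1: ', $array_reference->[0], "\n";
print 'with the arrow:      ', $bar[0]->[1], "\n";
print 'without the arrow:   ', $bar[0][1], "\n";
```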

So what is all this good for? Watch:

use strict;
use warnings;
use autodie;
use feature qw(say);

use constant {
    BOOK_FILE  => 'psychnet3.data',
};

open my $book_fh, "<", BOOK_FILE;

my %book_hash;
for my $book ( <$book_fh> ) {
    chomp $book;
    my ( $title, $author, $section, $shelf ) = split /\s*\|\s*/, $book;

    my %temp_book_hash;    # a new hash on every pass through the loop
    $temp_book_hash{AUTHOR} = $author;
    $temp_book_hash{SECTION} = $section;
    $temp_book_hash{SHELF} = $shelf;

    $book_hash{$title} = \%temp_book_hash;
}
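A self-contained version of the loop above, assuming the sample records are read from an in-memory string instead of psychnet3.data, and using a `while` loop so lines are read one at a time. It then walks the finished hash to show that every title points to its own inner hash:

```perl
use strict;
use warnings;

my $data = "To Kill A Mocking Bird|Harper Lee|S1|4A\n"
         . "Life of Pi|Yann Martel|S3|5B\n"
         . "Hunger Games|Suzzanne Collins|S2|2C\n";
open my $book_fh, '<', \$data or die "Cannot open in-memory data: $!";

my %book_hash;
while ( my $book = <$book_fh> ) {
    chomp $book;
    my ( $title, $author, $section, $shelf ) = split /\s*\|\s*/, $book;

    my %temp_book_hash;    # declared inside the loop: a fresh hash each time
    $temp_book_hash{AUTHOR}  = $author;
    $temp_book_hash{SECTION} = $section;
    $temp_book_hash{SHELF}   = $shelf;

    $book_hash{$title} = \%temp_book_hash;
}
close $book_fh;

for my $title ( sort keys %book_hash ) {
    my %info = %{ $book_hash{$title} };
    print "$title: $info{SECTION} $info{SHELF}\n";
}
```

Declaring `my %temp_book_hash` inside the loop is what makes this work: if it were declared once outside the loop, every title would end up pointing at the same hash, holding only the last book's data.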

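I have a %book_hash keyed by the book's title. Yet this one hash stores the author, section, and shelf where each book is located. Every book has three different bits of information associated with it, but I was able to store all of it in a single data structure. No need to keep parallel arrays or hashes.

How do I get at this information? Simple: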

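my $title = "To Kill A Mocking Bird";   # must match the title exactly as it appears in the file
my %temp_book_hash = %{ $book_hash{$title} };
say "The book $title was written by $temp_book_hash{AUTHOR}";

By dereferencing the hash I stored in $book_hash{$title}, I can extract the author's name and the filing information.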

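The syntax is a bit clumsy. I keep creating temporary variables to pass information back and forth. Fortunately, Perl lets me skip this step. Here is the same loop as before: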

for my $book ( <$book_fh> ) {
    chomp $book;
    my ( $title, $author, $section, $shelf ) = split /\s*\|\s*/, $book;

    $book_hash{$title} = {};   # Line not necessary

    $book_hash{$title}->{AUTHOR}  = $author;
    $book_hash{$title}->{SHELF}   = $shelf;
    $book_hash{$title}->{SECTION} = $section;
}

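Instead of using that temporary hash, I can store the data directly in my outermost hash. This syntax is shorter and cleaner, and it is easier to understand.

The line $book_hash{$title} = {}; assigns an empty hash reference, signaling that $book_hash{$title} will store a hash reference rather than a plain string or number. The line isn't strictly necessary: Perl creates the inner hash automatically when you write $book_hash{$title}->{AUTHOR} = $author; (this is called autovivification). However, I like declaring my intent to store a reference in that variable. That way, if somewhere further along in my program I write $book_hash{$title} = $author;, another developer will recognize that I made a mistake.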
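A small sketch of both points, using two sample titles: the first assignment autovivifies an inner hash without any `= {}` line, and the built-in `ref` can flag an entry where a plain string was stored by mistake in place of a hash reference:

```perl
use strict;
use warnings;

my %book_hash;

# No explicit {} assignment: Perl creates the inner hash on first use.
$book_hash{'Life of Pi'}->{AUTHOR} = 'Yann Martel';

# The kind of mistake described above: a plain string instead of a hash reference.
$book_hash{'Hunger Games'} = 'Suzzanne Collins';

for my $title ( sort keys %book_hash ) {
    my $kind = ref( $book_hash{$title} ) || 'not a reference';
    print "$title: $kind\n";
}
```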

I can use the same -> notation to pull information out of my book without creating temporary variables:

my $title = "To Kill A Mocking Bird";
say "The book $title was written by " . $book_hash{$title}->{AUTHOR};

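You mentioned that you are comparing two files. Imagine I stored the second file in %book_hash2. I can then loop through my books and see which ones have been shelved incorrectly: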

for my $title ( keys %book_hash ) {
    next unless exists $book_hash2{$title};   # skip titles missing from the second file
    if ( $book_hash{$title}->{SHELF} ne $book_hash2{$title}->{SHELF} ) {
       say "The book $title is stored on two different shelves!";
    }
    else {
       say "The book $title is on the correct shelf";
    }
}
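A self-contained sketch of the whole comparison, assuming both files use the same `title|author|section|shelf` layout. The helper `load_books` is a hypothetical name, both "files" are held in strings, and the second file's contents are invented for illustration. It compares the section as well as the shelf, and skips titles that appear only in the first file:

```perl
use strict;
use warnings;

# Hypothetical helper: parse "title|author|section|shelf" lines into a hash of hash references.
sub load_books {
    my ($text) = @_;
    my %books;
    open my $fh, '<', \$text or die "Cannot open in-memory data: $!";
    while ( my $line = <$fh> ) {
        chomp $line;
        my ( $title, $author, $section, $shelf ) = split /\s*\|\s*/, $line;
        next unless defined $shelf;    # skip blank or malformed lines
        # { ... } builds an anonymous hash and returns a reference to it.
        $books{$title} = { AUTHOR => $author, SECTION => $section, SHELF => $shelf };
    }
    close $fh;
    return %books;
}

my %book_hash = load_books(
    "To Kill A Mocking Bird|Harper Lee|S1|4A\n" . "Life of Pi|Yann Martel|S3|5B\n"
);
# Invented contents of the second file, for illustration only.
my %book_hash2 = load_books(
    "To Kill A Mocking Bird|Harper Lee|S1|4A\n" . "Life of Pi|Yann Martel|S2|5B\n"
);

my @misfiled;
for my $title ( sort keys %book_hash ) {
    next unless exists $book_hash2{$title};
    my ( $first, $second ) = ( $book_hash{$title}, $book_hash2{$title} );
    if (   $first->{SECTION} ne $second->{SECTION}
        || $first->{SHELF} ne $second->{SHELF} )
    {
        push @misfiled, $title;
        print "$title: $first->{SECTION} $first->{SHELF} vs $second->{SECTION} $second->{SHELF}\n";
    }
}
```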

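References are a little hard to get your head around at first, but I hope you can see the power of being able to store all the information about your books in a single data structure.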

answered 2013-11-07T21:33:51.403