regex - Push unique elements read from a file using a regex into an array - Perl

Question

This is my file:

  heaven
  heavenly
  heavenns
  abc
  heavenns
  heavennly

According to my code, only heavenns and heavennly should be pushed into @myarr, and each should appear in the array only once. How can I do that?

my $regx = "heavenn\+";
my $tmp=$regx;

$tmp=~ s/[\\]//g;

$regx=$tmp;
print("\nNow regex:", $regx);

my $file  = "myfilename.txt";

my @myarr;
open my $fh, "<", $file;
while ( my $line = <$fh> ) {
    if ($line =~ /$regx/) {
        print $line;
        push(@myarr, $line);
    }
}

print ("\nMylist:", @myarr); #printing 2 times heavenns and heavennly

score 1 · Accepted Answer

For a given value of $_, !$seen{$_}++ is true only the first time it is evaluated.

my $regx = qr/heavenn/;

my @matches;
my %seen;
while (<>) {
   chomp;
   push(@matches, $_) if /$regx/ && !$seen{$_}++;
}
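
To see the idiom in isolation, here is a minimal sketch. The in-memory @lines list is an assumption for illustration and stands in for the lines of the file; grep is used in place of the while loop only to keep the example short.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical in-memory stand-in for the (already chomped) lines of the file.
my @lines = qw(heaven heavenly heavenns abc heavenns heavennly);

my $regx = qr/heavenn/;
my %seen;

# $seen{$_}++ returns the old count (0 the first time), so
# !$seen{$_}++ is true only for the first occurrence of each value.
my @matches = grep { /$regx/ && !$seen{$_}++ } @lines;

print "$_\n" for @matches;
```

This prints heavenns and heavennly once each, in the order they first appear.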

score 1 · Accepted Answer

This is Perl, so there's more than one way to do it (TMTOWTDI). Here is one of them:

#!/usr/bin/env perl
use strict;
use warnings;

my $regex = "heavenn+";
my $rx = qr/$regex/;
print "Regex: $regex\n";

my $file  = "myfilename.txt";
my %list;
my @myarr;
open my $fh, "<", $file or die "Failed to open $file: $!";

while ( my $line = <$fh> )
{
    if ($line =~ $rx)
    {
        print $line;
        $list{$line}++;
    }
}

push @myarr, sort keys %list;

print "Mylist: @myarr\n";

Sample output:

Regex: heavenn+
heavenns
heavenns
heavennly
Mylist: heavennly
 heavenns

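The sort is not necessary (but it presents the data in a sane order). You could instead add items to the array when the count in $list{$line} is 0. You could chomp the input lines to remove the newline. And so on.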
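
A rough sketch of those two variations (push when the count is 0, and chomp each line) might look like this. The in-memory filehandle and the sample data are assumptions so the example runs without myfilename.txt.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical sample data; an in-memory filehandle replaces myfilename.txt.
my $data = "heaven\nheavenly\nheavenns\nabc\nheavenns\nheavennly\n";
open my $fh, "<", \$data or die "Failed to open in-memory data: $!";

my $rx = qr/heavenn+/;
my %list;
my @myarr;

while (my $line = <$fh>)
{
    chomp $line;                # drop the trailing newline
    next unless $line =~ $rx;
    # Post-increment yields the old count, so this pushes on first sight only.
    push @myarr, $line if $list{$line}++ == 0;
}
close $fh;

print "Mylist: @myarr\n";
```

Because items are pushed as they are first seen, the array keeps file order (heavenns, then heavennly) rather than sorted order.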

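What if I want to push only specific words? For example, if my file is 1. "heavenns hello" 2. "heavenns hi", "3.heavennly good". What should I do to print only "heavenns" and "heavennly"?

Then you have to arrange to capture only the word. That means refining the regex. Assuming you want heavenn at the start of a word and don't mind which alphabetic characters follow it, then: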

#!/usr/bin/env perl
use strict;
use warnings;

my $regex = '\b(heavenn[A-Za-z]*)\b';  # Single quotes necessary!
my $rx = qr/$regex/;
print "Regex: $regex\n";

my $file  = "myfilename.txt";
my %list;
my @myarr;
open my $fh, "<", $file or die "Failed to open $file: $!";

while ( my $line = <$fh> )
{
    if ($line =~ $rx)
    {
        print $line;
        $list{$1}++;
    }
}

push @myarr, sort keys %list;

print "Mylist: @myarr\n";

Data file:

1. "heavenns hello"
2. "heavenns hi",
"3.heavennly good". What to d
heaven
heavenly
heavenns
abc
heavenns
heavennly

Output:

Regex: \b(heavenn[A-Za-z]*)\b
1. "heavenns hello"
2. "heavenns hi",
"3.heavennly good". What to d
heavenns
heavenns
heavennly
Mylist: heavennly heavenns

Note that the names in the list no longer include the newline, because only the captured word ($1) is stored.

After chat

This version takes a regex from the command line. The script invocation is:

perl script.pl -p 'regex' [file ...]

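If no files are specified on the command line, it reads from standard input (which is better than using a fixed input file name, by a large margin). It looks for multiple occurrences of the specified regex on each line, where the regex can be preceded or followed by word characters (\w).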

#!/usr/bin/env perl
use strict;
use warnings;
use Getopt::Std;

my %opts;
getopts('p:', \%opts) or die "Usage: $0 [-p 'regex']\n";

my $regex_base = 'heavenn';
#$regex_base = $ARGV[0] if defined $ARGV[0];
$regex_base = $opts{p} if defined $opts{p};

my $regex = '\b(\w*' . ${regex_base} . '\w*)\b';
my $rx = qr/$regex/;
print "Regex: $regex (compiled form: $rx)\n";

my %list;
my @myarr;

while (my $line = <>)
{
    while ($line =~ m/$rx/g)
    {
        print $line;
        $list{$1}++;
        #$line =~ s///;
    }
}

push @myarr, sort keys %list;

print "Matched words: @myarr\n";

Given the input file:

1. "heavenns hello"
2. "heavenns hi",
"3.heavennly good". What to d
Good heavennsy! What a heavennnly output from an equally heavennnnly input!
An unheavenly host.  Good heavens! It heaves to like a yacht!
heaven
Is it heavens
heavenly
heavenns
abc
heavenns
heavennly

You get the following output:

$ perl script.pl -p 'e\w*?ly' myfilename.txt
Regex: \b(\w*e\w*?ly\w*)\b (compiled form: (?^:\b(\w*e\w*?ly\w*)\b))
"3.heavennly good". What to d
Good heavennsy! What a heavennnly output from an equally heavennnnly input!
Good heavennsy! What a heavennnly output from an equally heavennnnly input!
Good heavennsy! What a heavennnly output from an equally heavennnnly input!
An unheavenly host.  Good heavens! It heaves to like a yacht!
heavenly
heavennly
Matched words: equally heavenly heavennly heavennnly heavennnnly unheavenly
$ perl script.pl myfilename.txt
Regex: \b(\w*heavenn\w*)\b (compiled form: (?^:\b(\w*heavenn\w*)\b))
1. "heavenns hello"
2. "heavenns hi",
"3.heavennly good". What to d
Good heavennsy! What a heavennnly output from an equally heavennnnly input!
Good heavennsy! What a heavennnly output from an equally heavennnnly input!
Good heavennsy! What a heavennnly output from an equally heavennnnly input!
heavenns
heavenns
heavennly
Matched words: heavennly heavennnly heavennnnly heavenns heavennsy
$
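
If the per-match print is not needed, the inner while loop can be replaced by a match in list context. This is only a sketch of that alternative, not the script used above; the @lines array holds a few lines copied from the input file and is an assumption made to keep the example self-contained.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my $rx = qr/\b(\w*heavenn\w*)\b/;

# Hypothetical sample lines taken from the input file shown above.
my @lines = (
    'Good heavennsy! What a heavennnly output from an equally heavennnnly input!',
    'heavenns',
    'heavenns',
);

my %list;
for my $line (@lines)
{
    # In list context, m//g returns every capture of $1 found on the line.
    $list{$_}++ for $line =~ /$rx/g;
}

my @myarr = sort keys %list;
print "Matched words: @myarr\n";
```

The hash keys still provide the uniqueness, so heavenns is listed once even though it occurs on two lines.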

score 0 · Accepted Answer

If you only want to push the first occurrence of a word, you can add the following after the regex test inside the loop:

# Assumes "my %seen;" is declared outside the loop.
next if $seen{$line}++;
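
For context, a minimal sketch of where that line sits in the loop. The list of sample lines is hypothetical and replaces the file read; with a real filehandle you would probably chomp each line first so that a final line without a newline is not treated as a different value.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my $regx = qr/heavenn+/;
my %seen;
my @myarr;

# Hypothetical sample lines standing in for the file contents.
for my $line (qw(heaven heavenly heavenns abc heavenns heavennly))
{
    next unless $line =~ $regx;   # the regex test
    next if $seen{$line}++;       # skip anything already pushed
    push @myarr, $line;
}

print "Mylist: @myarr\n";
```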

More ways to get uniqueness: How do I print unique elements in a Perl array?
