regex - Extracting words with a regular expression in Perl

Question

I have this file:

1. heavenns 2 heavenns 3 heavenns good 4 heavenns 5heavennly bye

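From this line, only 'heavenns' and 'heavennly' should be printed, once each.

This code comes from another question I asked in a different thread. I think I already accepted an answer there, so now nobody will look at it, right? (I'm new here and don't know how this works.)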

#!/usr/bin/env perl
use strict;
use warnings;

my $regex = "heavenn+";
my $rx = qr/$regex/;
print "Regex: $regex\n";

my $file  = "myfilename.txt";
my %list;
my @myarr;
open my $fh, "<", $file or die "Failed to open $file: $!";

while ( my $line = <$fh> ) {
    if ($line =~ $rx)
    {
        print $line;
        $list{$line}++;
    }
}

push @myarr, sort keys %list;

print "Mylist: @myarr\n"; #NOT GIVING ME UNIQUE VALUES & NOW I ONLY WANT heavenns and heavennly

score 1 · Accepted Answer


perl -0777 -nE'@w{m/(heavenn\w+)/g}=();say for keys %w'
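For readers less familiar with the switches, here is roughly what the one-liner does, written out as a script. It is a sketch, not the answerer's code: -0777 makes Perl read the whole file into $_ as one string, -n wraps the code in a read loop, and -E enables say. The hash slice @w{...} = () creates one key per captured word, so duplicates collapse. The helper name unique_matches and the inline sample text are illustrative assumptions, and the keys are sorted here only to make the output stable.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper mirroring the one-liner: each distinct "heavenn..." word once.
sub unique_matches {
    my ($text) = @_;
    my %w;
    # The capture group makes m//g return every captured word in list context;
    # assigning to a hash slice turns each word into a key, so duplicates collapse.
    @w{ $text =~ m/(heavenn\w+)/g } = ();
    # Sorted for stable output (the one-liner prints keys in hash order).
    return sort keys %w;
}

# With -0777 the one-liner sees the whole file as one string; the question's
# sample line stands in for that file here.
my $text = "1. heavenns 2 heavenns 3 heavenns good 4 heavenns 5heavennly bye\n";
say for unique_matches($text);
```

Run as-is, this prints heavennly and heavenns, one per line.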

answered 2013-07-22T07:36:16.680

score 0 · Accepted Answer

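You are not using the hash correctly.

Check whether the word already exists in the hash.
If it doesn't, add it; if it does, skip it.
After the loop, print out the hash contents. There is no need for the array.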
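The steps above can be sketched in Perl roughly as follows. This is a minimal sketch under stated assumptions: the helper name collect_words is hypothetical, and the pattern heavenn\w+ ("heavenn" followed by more word characters) is chosen here only so that whole words such as "heavennly" are captured.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper following the steps above: keep a word only the first time it is seen.
sub collect_words {
    my ($rx, @lines) = @_;
    my %seen;
    for my $line (@lines) {
        # m//g in list context returns every match on the line.
        for my $word ( $line =~ /$rx/g ) {
            next if exists $seen{$word};   # already in the hash: skip it
            $seen{$word} = 1;              # not yet in the hash: add it
        }
    }
    return %seen;
}

# Assumed pattern: "heavenn" followed by one or more word characters.
my $rx = qr/heavenn\w+/;
my %seen = collect_words($rx, "1. heavenns 2 heavenns 3 heavenns good 4 heavenns 5heavennly bye\n");

# After the loop, print the hash contents; no separate array is needed.
print "$_\n" for sort keys %seen;
```

Strictly speaking, a plain $seen{$word} = 1 would give the same keys without the exists check; the check is kept to follow the steps as written.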

score 0 · Accepted Answer

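When you use a regex in list context, you get all of the matches. The other problem you have is the regex itself. When you write "a+", the plus applies to what comes right before it and means "one or more of that character". You need a wildcard, which is ".", so your regex needs to be something like "heavenn.". Applied to your question: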

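use strict;
use warnings;

my $regex = "heavenn.";

my $file = "myfilename.txt";
my %list;
my @myarr;

# This demo reads from __DATA__ below; to read the real file, uncomment
# the next line and read from $fh instead of DATA.
#open my $fh, "<", $file or die "Failed to open $file: $!";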

while ( my $line = <DATA> ) {
  my @founds = $line =~ m/$regex/g;
  foreach my $found ( @founds ) {
    print $found . "\n";
    $list{$found}++;
  }
}

push @myarr, sort keys %list;

print "Mylist: @myarr\n";

__DATA__

1. heavenns 2 heavenns 3 heavenns good 4 heavenns 5heavennly bye

Here I get all of the matches as an array and loop over them, storing each one in the hash so that every match is recorded only once (as you wanted).
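The difference between the quantifier and the wildcard can be checked directly on the question's sample line. In this small sketch the helper name distinct_matches is illustrative, and the third pattern, heavenn\w+, is an alternative that the answer above does not use. Because "." matches exactly one character, heavenn. returns "heavennl" rather than "heavennly", while \w+ takes the rest of the word. The original heavenn+ means "heaven" followed by one or more "n", so it only ever yields "heavenn".

```perl
use strict;
use warnings;

my $line = "1. heavenns 2 heavenns 3 heavenns good 4 heavenns 5heavennly bye";

# Hypothetical helper: the distinct matches of a pattern in a string, sorted.
sub distinct_matches {
    my ($pattern, $text) = @_;
    my %seen;
    $seen{$_}++ for $text =~ /$pattern/g;   # list context: every match
    return sort keys %seen;
}

# Single quotes keep the backslash in \w intact.
for my $pattern ('heavenn+', 'heavenn.', 'heavenn\w+') {
    my @found = distinct_matches($pattern, $line);
    print "$pattern => @found\n";
}
```

This prints:

heavenn+ => heavenn
heavenn. => heavennl heavenns
heavenn\w+ => heavennly heavenns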

score 0 · Accepted Answer

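You are printing out the whole line, when it sounds like you want to print only the matched words. If that's the case, the first thing you need to do is change your regex:

my $rx = qr/heavenn.*?\b/;

This matches "heavenn" plus the fewest characters needed to reach the next word boundary. It's hard to tell from your question whether this is exactly the regex you need, but it does match "heavenns" and "heavennly", so I'll go with it. If it isn't quite what you want, you may need to adjust it slightly.

Next, change your while loop slightly so that it extracts the matched words into the hash. You can do it like this: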

while (my $line = <$fh>) {
    $list{$_}++ for $line =~ /$rx/g;
}

say for sort keys %list;   # needs 'use feature qw(say);'
# => prints "heavennly\nheavenns\n"
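Put together, the pieces of this answer form a complete script along these lines. It is a sketch, not the answerer's exact code: the helper name words_from_fh is hypothetical, and an in-memory filehandle over the question's sample line stands in for myfilename.txt so the example runs without a file.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use feature qw(say);

# Regex from the answer: "heavenn" plus the shortest run of characters up to a word boundary.
my $rx = qr/heavenn.*?\b/;

# Hypothetical helper: collect each matched word from a filehandle into a hash.
sub words_from_fh {
    my ($fh) = @_;
    my %list;
    while ( my $line = <$fh> ) {
        $list{$_}++ for $line =~ /$rx/g;   # list context: every match on the line
    }
    return sort keys %list;
}

# An in-memory filehandle replaces myfilename.txt in this sketch.
my $text = "1. heavenns 2 heavenns 3 heavenns good 4 heavenns 5heavennly bye\n";
open my $fh, '<', \$text or die "Failed to open in-memory file: $!";
say for words_from_fh($fh);
close $fh;
```

To use the real file, open it with open my $fh, '<', $file or die ... and pass that handle instead.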