regex - How to remove strings which do not start or end with a specific substring?

Question

Unfortunately, I'm not a regex expert, so I need a little help.

I'm looking for a way to grep an array of strings to get two lists of strings which do not start (1) or do not end (2) with a specific substring.

Let's assume we have an array whose strings follow this rule:

[speakerId]-[phrase]-[id].txt

e.g.

10-phraseone-10.txt 11-phraseone-3.txt 1-phraseone-2.txt 2-phraseone-1.txt 3-phraseone-1.txt 4-phraseone-1.txt 5-phraseone-3.txt 6-phraseone-2.txt 7-phraseone-2.txt 8-phraseone-10.txt 9-phraseone-2.txt
10-phrasetwo-1.txt 11-phrasetwo-1.txt 1-phrasetwo-1.txt 2-phrasetwo-1.txt 3-phrasetwo-1.txt 4-phrasetwo-1.txt 5-phrasetwo-1.txt 6-phrasetwo-3.txt 7-phrasetwo-10.txt 8-phrasetwo-1.txt 9-phrasetwo-1.txt
10-phrasethree-10.txt 11-phrasethree-3.txt 1-phrasethree-1.txt 2-phrasethree-11.txt 3-phrasethree-1.txt 4-phrasethree-3.txt 5-phrasethree-1.txt 6-phrasethree-3.txt 7-phrasethree-1.txt 8-phrasethree-1.txt 9-phrasethree-1.txt

Let's introduce some variables:

$speakerId
$phrase
$id1,$id2

I want to grep the list and get two arrays:

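1. with the elements that contain a specific $phrase, but excluding the strings that both start with a specific $speakerId AND end with one of the specified ids (for instance $id1 or $id2)
2. with the elements that have a specific $speakerId and $phrase but do NOT end with one of the specified ids (warning: remember not to exclude 10 or 11 when $id=1, etc.)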
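The warning above applies to both ends of the name. As a small illustration (sample names taken from the list above; $speakerId = 1 and $id = 1 are arbitrary choices), unanchored patterns over-match because "1" is also a prefix of "10" and "11", while anchoring on the hyphen after the speaker id and on ".txt" after the id does not:

```perl
use strict;
use warnings;

# A few names in the [speakerId]-[phrase]-[id].txt format, from the list above.
my @names = qw(
    1-phraseone-2.txt  10-phraseone-10.txt  11-phraseone-3.txt
    2-phrasetwo-1.txt  2-phrasethree-11.txt
);
my ($speakerId, $id) = (1, 1);

# Naive: "1" is also a prefix of "10" and "11", so these over-match.
my @naive_start = grep { /^\Q$speakerId\E/ } @names;
my @naive_end   = grep { /-\Q$id\E/ } @names;

# Anchored: require the hyphen after the speaker id and ".txt" right after the id.
my @start = grep { /^\Q$speakerId\E-/ } @names;
my @end   = grep { /-\Q$id\E\.txt\z/ } @names;

print "naive start:    @naive_start\n";
print "anchored start: @start\n";
print "naive end:      @naive_end\n";
print "anchored end:   @end\n";
```

\Q...\E quotes the variable's value so it is matched literally rather than as regex syntax.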

Maybe someone can write a solution using the following code:

@AllEntries = readdir(INPUTDIR);

@Result1 = grep(/blablablahere/, @AllEntries);

@Result2 = grep(/anotherblablabla/, @AllEntries);

closedir(INPUTDIR);

score 3 · Accepted Answer

Assuming a basic pattern to match your example:

(?:^|\b)(\d+)-(\w+)-(\d+)\.txt(?:\b|$)

分解为：

(?:^|\b)    # start of string or a word boundary
(\d+)-      # speakerid and a hyphen
(\w+)-      # phrase and a hyphen
(\d+)       # id
\.txt       # file extension
(?:\b|$)    # word boundary or end of string
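One way to check the basic pattern in Perl is to write it with the /x modifier so each part keeps its comment from the breakdown. A minimal sketch, run against two names from the question:

```perl
use strict;
use warnings;

# The basic pattern from the breakdown; /x allows whitespace and comments.
my $re = qr{
    (?:^|\b)    # start of string or a word boundary
    (\d+)-      # speakerId and a hyphen
    (\w+)-      # phrase and a hyphen
    (\d+)       # id
    \.txt       # file extension
    (?:\b|$)    # word boundary or end of string
}x;

my @parsed;
for my $name (qw(10-phraseone-10.txt 2-phrasethree-11.txt)) {
    if ($name =~ $re) {
        # $1, $2, $3 are the speakerId, phrase and id capture groups.
        push @parsed, [$1, $2, $3];
        print "speakerId=$1 phrase=$2 id=$3\n";
    }
}
```

Because \w does not match a hyphen, the phrase group stops at the second hyphen, and (\d+) then captures the whole id, so 10 and 11 are never split.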

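You can exclude entries using a negative lookahead assertion. For example, to include all matches that do not contain the phrase phrasetwo, you could modify the expression above to use a negative lookahead: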

(?:^|\b)(\d+)-(?!phrasetwo)(\w+)-(\d+)\.txt(?:\b|$)
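Applied with Perl's grep to a few names taken from the question, the lookahead version keeps everything except the phrasetwo entries. A sketch:

```perl
use strict;
use warnings;

my @all = qw(1-phraseone-2.txt 1-phrasetwo-1.txt 1-phrasethree-1.txt 10-phrasetwo-1.txt);

# (?!phrasetwo) fails the match whenever "phrasetwo" starts at the phrase position.
my @not_two = grep { /(?:^|\b)(\d+)-(?!phrasetwo)(\w+)-(\d+)\.txt(?:\b|$)/ } @all;

print "$_\n" for @not_two;
```

Note that (?!phrasetwo) also rejects any phrase that merely begins with phrasetwo; writing (?!phrasetwo-) would limit the exclusion to that exact phrase.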

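Note how I included (?!phrasetwo). Alternatively, you could find all phrasethree entries that end with an even number by using a lookbehind instead of a lookahead: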

(?:^|\b)(\d+)-phrasethree-(\d+)(?<![13579])\.txt(?:\b|$)

(?<![13579]) simply makes sure that the last digit of the ID is even.
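A quick sketch of the lookbehind version in Perl, run against a few names from the question (one of them deliberately not a phrasethree entry):

```perl
use strict;
use warnings;

my @all = qw(
    10-phrasethree-10.txt 2-phrasethree-11.txt 4-phrasethree-3.txt
    8-phraseone-10.txt    1-phrasethree-1.txt
);

# (?<![13579]) looks at the character just before ".txt": it must not be an odd digit.
my @even = grep { /(?:^|\b)(\d+)-phrasethree-(\d+)(?<![13579])\.txt(?:\b|$)/ } @all;

print "$_\n" for @even;
```

Only the last digit is inspected, so the id 10 passes while 11, 3 and 1 are rejected; 8-phraseone-10.txt is dropped by the literal phrasethree.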

score 1 · Answer

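I like the pure-regex approach with negative lookaheads and lookbehinds. However, it's a bit hard to read. Maybe code like this is more self-explanatory; it uses standard Perl idioms that in some cases read like English: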

my @all_entries      = readdir(...);
my @matching_entries = ();

foreach my $entry (@all_entries) {

    # split file name (match against $entry, not the default $_)
    next unless $entry =~ /^(\d+)-(.*?)-(\d+)\.txt$/;
    my ($sid, $phrase, $id) = ($1, $2, $3);

    # filter
    next unless $sid eq "foo";
    next unless $id == 42 or $phrase eq "bar";
    # more readable filter rules

    # match
    push @matching_entries, $entry;
}

# do something with @matching_entries

If you really want to express something complex in a single grep list transformation, you could write code like this:

my @matching_entries = grep {

    /^(\d+)-(.*?)-(\d+)\.txt$/
    and $1 eq "foo"
    and ($3 == 42 or $2 eq "bar")
    # and so on

} readdir(...);
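Filling in the placeholders for the two lists from the question, a sketch in the same grep style might look like this. It assumes list 1 means "drop an entry only when it has this speaker AND one of the ids", and it uses a made-up list of names (not the question's data) so that speaker 1 has several takes; the query values are likewise hypothetical. Comparing the captured fields with eq and a hash lookup, rather than substring matching, keeps id 1 from knocking out 10 or 11:

```perl
use strict;
use warnings;

# Hypothetical sample names; in practice @AllEntries would come from readdir.
my @AllEntries = qw(
    1-phrasetwo-1.txt  1-phrasetwo-2.txt 1-phrasetwo-10.txt 1-phrasetwo-11.txt
    10-phrasetwo-1.txt 2-phrasetwo-1.txt 2-phrasetwo-3.txt  1-phraseone-1.txt
);

# Hypothetical query values.
my ($speakerId, $phrase) = (1, 'phrasetwo');
my @ids = (1, 2);
my %is_excluded_id = map { $_ => 1 } @ids;

# List 1: the phrase matches, but not (this speaker AND one of the ids).
my @Result1 = grep {
    /^(\d+)-\Q$phrase\E-(\d+)\.txt\z/
        and not ($1 eq $speakerId and $is_excluded_id{$2})
} @AllEntries;

# List 2: this speaker and phrase, with an id outside @ids.
my @Result2 = grep {
    /^\Q$speakerId\E-\Q$phrase\E-(\d+)\.txt\z/
        and not $is_excluded_id{$1}
} @AllEntries;

print "Result1: @Result1\n";
print "Result2: @Result2\n";
```

The whole id is captured by (\d+) before the lookup, so "10" and "11" are distinct hash keys from "1", and the ^...- anchor likewise keeps speaker 10 apart from speaker 1.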

score 1 · Answer

It sounds a bit like you are describing a query function.

#!/usr/bin/perl -Tw

use strict;
use warnings;
use Data::Dumper;

my ( $set_a, $set_b ) = query( 2, 'phrasethree', [ 1, 3 ] );

print Dumper( { a => $set_a, b => $set_b } );

# a) fetch elements which
#    1. match $phrase
#    2. exclude $speakerId
#    3. match @ids
# b) fetch elements which
#    1. match $phrase
#    2. match $speakerId
#    3. exclude @ids
sub query {
    my ( $speakerId, $passPhrase, $id_ra ) = @_;

    my %has_id = map { ( $_ => 0 ) } @{$id_ra};

    my ( @a, @b );

    while ( my $filename = glob '*.txt' ) {

        if ( $filename =~ m{\A ( \d+ )-( .+? )-( \d+ ) [.] txt \z}xms ) {

            my ( $_speakerId, $_passPhrase, $_id ) = ( $1, $2, $3 );

            if ( $_passPhrase eq $passPhrase ) {

                if ( $_speakerId ne $speakerId
                    && exists $has_id{$_id} )
                {
                    push @a, $filename;
                }

                if ( $_speakerId eq $speakerId
                    && !exists $has_id{$_id} )
                {
                    push @b, $filename;
                }
            }
        }
    }

    return ( \@a, \@b );
}
