aardvark is an animal with aardvark
aardvark is an animal with aardvark along with another aardvark
aardvark is an animal with an elephant that loves an aardvark that lives in downtown

or

aardvark is an animal with aardvark. aardvark is an animal with aardvark along with another aardvark. aardvark is an animal with an elephant that loves an aardvark that lives in downtown

This is the text from which I have to extract only the sentences where aardvark occurs exactly twice.

I tried this expression: ((.*?)(aardvark)(.*?)(aardvark)(.*?)[\.\n])(.*\baardvark\b.*){2} but I get all of the sentences as the result.

How should I approach this?

4 Answers

Try this:

^(((?!\baardvark\b)\b\w+\b\s+)*?\baardvark\b\s*((?!\baardvark\b)\b\w+\b\s+)*?){2}$
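
A minimal sketch of how this expression might be applied in PHP. Two things here are assumptions and not part of the answer: the `m` modifier, added so that `^` and `$` anchor at line boundaries, and the trailing newline appended to the subject. The newline is there because each non-aardvark word in the pattern must be followed by whitespace, so a final sentence ending in another word would otherwise not match.

```php
<?php
// Sample text from the question, one sentence per line.
$subject = "aardvark is an animal with aardvark\n"
         . "aardvark is an animal with aardvark along with another aardvark\n"
         . "aardvark is an animal with an elephant that loves an aardvark that lives in downtown";

// The expression from the answer, with the m modifier added (assumption)
// so that ^ and $ match at the start and end of each line.
$pattern = '/^(((?!\baardvark\b)\b\w+\b\s+)*?\baardvark\b\s*((?!\baardvark\b)\b\w+\b\s+)*?){2}$/m';

// Append a newline (assumption) so the last word of the text is followed by whitespace.
preg_match_all($pattern, $subject . "\n", $matches);

// A match can include its trailing newline, so trim each one.
$sentences = array_map('trim', $matches[0]);

print_r($sentences);
```

On the sample text this should keep the first and third sentences and drop the one with three occurrences.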
answered 2013-05-12T12:21:23.050

You can try this:

<pre>
<?php
$subject = <<<LOD
aardvark is an animal with aardvark
aardvark is an animal with aardvark along with another aardvark
aardvark is an animal with an elephant that loves an aardvark that lives in downtown
LOD;

$pattern = <<<'LOD'
~
(?(DEFINE) # the word
    (?<tw> \b aardvark \b ) )

(?(DEFINE) # other word
    (?<ow> \b (?!\g<tw>)[a-z]++ \b ) )

(?(DEFINE) # not a word 
    (?<nw>[^a-z]++) )

(?(DEFINE) # not the word
    (?<ntw> (?> \g<ow> | \g<nw> )++ ) )

# pattern :    
    ^ \g<ntw>? \g<tw> \g<ntw> \g<tw> \g<ntw>? $ 
~xim
LOD;
/* a more condensed version */
$pattern = <<<'LOD'
~
    ^ (?<ntw> (?> \b(?!\g<tw>)[a-z]++\b | [^a-z]++ )++ )?
      (?<tw> \b aardvark \b )
      \g<ntw> \g<tw> \g<ntw>? $
~xim
LOD;

preg_match_all($pattern, $subject, $matches);

print_r($matches[0]);
</pre>

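Note that you can replace the "ow" group with (?<ow> \b (?> [b-z] | (?!\g<tw>)a ) [a-z]*+ \b ) ) for better performance, but keep in mind that for a word that does not start with the letter a, you must change both the letter and the first character class. Example with "koala":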

(?<ow> \b (?> [a-jl-z] | (?!\g<tw>)k ) [a-z]*+ \b ) )
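
Changing the letter and the class by hand is easy to get wrong. A minimal sketch of how the group might be generated for any target word follows. `otherWordGroup` is a hypothetical helper name, the word is assumed to be non-empty and made of the letters a-z, and the target word is inlined in the lookahead instead of calling \g<tw> so the snippet stays self-contained. It also lists the allowed first letters one by one rather than as ranges, which is equivalent.

```php
<?php
// Hypothetical helper: builds an "other word" group for the given target word.
function otherWordGroup(string $word): string
{
    $word  = strtolower($word);
    $first = $word[0];
    // Every lowercase letter except the first letter of the target word.
    $others = implode('', array_diff(range('a', 'z'), [$first]));
    $quoted = preg_quote($word, '~');
    // Fast path: any other first letter. Slow path: the same first letter,
    // allowed only if the whole target word does not start here.
    return '(?<ow> \b (?> [' . $others . '] | (?!\b' . $quoted . '\b)' . $first . ' ) [a-z]*+ \b )';
}

// Usage: a line made only of "other" words and non-letters, i.e. no "koala".
$noKoala = '~(?(DEFINE) ' . otherWordGroup('koala') . ' ) ^ (?: \g<ow> | [^a-z]++ )*+ $~xi';

$withoutWord = preg_match($noKoala, 'a kind bear');   // no "koala" in this line
$withWord    = preg_match($noKoala, 'a koala bear');  // contains "koala"

var_dump($withoutWord, $withWord);
```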
answered 2013-05-12T13:02:24.197

If you are just looking for sentences containing a simple (static) word, you don't need a regular expression at all.

$words = explode(' ', $sentence); # or preg_split, if you want to split on space, tab, hyphen, etc.
$counts = array_count_values($words); # map of word => number of occurrences
if (isset($counts['aardvark']) && $counts['aardvark'] == 2) {
  // found!
} else {
  // not interested
}
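
The snippet above checks a single $sentence. A minimal sketch of how it might be extended to a whole text follows, assuming sentences are separated by periods or line breaks as in the question. `sentencesWithExactCount` is a hypothetical helper name, and the type hints and `??` operator assume PHP 7 or later.

```php
<?php
// Hypothetical helper: returns the sentences in which $word occurs exactly $times times.
function sentencesWithExactCount(string $text, string $word, int $times): array
{
    // Split into sentences on periods or line breaks, dropping empty pieces.
    $sentences = preg_split('/[.\n]+/', $text, -1, PREG_SPLIT_NO_EMPTY);
    $found = [];
    foreach ($sentences as $sentence) {
        $sentence = trim($sentence);
        // Lowercase the sentence, then split on anything that is not a letter.
        $words  = preg_split('/[^a-z]+/', strtolower($sentence), -1, PREG_SPLIT_NO_EMPTY);
        $counts = array_count_values($words);
        // A missing key means the word does not occur at all.
        if (($counts[strtolower($word)] ?? 0) === $times) {
            $found[] = $sentence;
        }
    }
    return $found;
}

// Usage with the single-line sample from the question.
$text = 'aardvark is an animal with aardvark. '
      . 'aardvark is an animal with aardvark along with another aardvark. '
      . 'aardvark is an animal with an elephant that loves an aardvark that lives in downtown';

$result = sentencesWithExactCount($text, 'aardvark', 2);
print_r($result);
```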
answered 2013-05-12T10:04:16.203

If you really want to use a regular expression:

<?php

$data = 'aardvark aardvark aardvark aardvark
aardvark is an animal with aardvark
aardvark is an animal with aardvark along with another aardvark
aardvark is an animal with an elephant that loves an aardvark that lives in downtown';

preg_match_all("@(^|[\.\n])((?:(?!aardvark).)*(aardvark)(?:(?!aardvark).)*(aardvark)(?:(?!aardvark).)*)([\.\n]|$)@sU", ($data), $matches, PREG_SET_ORDER);

foreach($matches as $match)
    echo $match[2] . '<br />';
answered 2013-05-12T10:14:07.387