php - Matching exact phrases case-insensitively, including spaces

Question

Say I have the string "Hello I went to the store today" and an array of matches

$perfectMatches = array("i went","store today");

It should match both of them. (The array can get very large, so I would prefer to do this in a single preg_match.)

Edit: Got this working! Thanks!

preg_match_all("/\b(" . implode("|", $perfectMatches) . ")\b/i", $string, $match1);
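
A caveat with building the pattern this way: if a phrase contains a regex metacharacter or the `/` delimiter, the pattern breaks or matches something unintended. A minimal sketch of one way to guard against that with `preg_quote`; the helper name `buildPhrasePattern` is hypothetical and not part of the original post:

```php
<?php
// Hypothetical helper (not from the original post): builds one
// case-insensitive alternation from a list of literal phrases.
function buildPhrasePattern(array $phrases): string
{
    // Escape regex metacharacters and the "/" delimiter in each phrase.
    $quoted = array_map(function ($p) {
        return preg_quote($p, '/');
    }, $phrases);

    return '/\b(?:' . implode('|', $quoted) . ')\b/i';
}

$string = "Hello I went to the store today";
$perfectMatches = array("i went", "store today");

// $found[0] holds every phrase occurrence, in the casing used in $string.
preg_match_all(buildPhrasePattern($perfectMatches), $string, $found);
print_r($found[0]);
```

Note that `\b` only behaves as expected when each phrase starts and ends with a word character.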

I also need a separate regex that is hard to explain. Say I have an array

$array = array("birthday party","ice cream");//this can be very long

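Is it possible to write a regex that matches a string if "birthday" and "party" both appear anywhere in it?

So it should match "Hi, today is my birthday and I am going to a party"? And also cover "ice cream" in the same single preg_match?

Thanks

Edit: An example...

A user submits a description of an item, and I want to check it for spam. I know that most spam posts contain phrases like "personal check" or "hot deal", so I want to take a list of all such phrases and check the description against it. If the description contains any phrase from my list, it gets flagged as spam. This scenario is what the first regex is for.

The second regex is for the case where I know that some spam posts contain the words "lose", "weight" and "fast" somewhere, in no particular order, but with all three words present in the description. So if I take a list of word sets such as "lose weight fast" and "credit card required" and check the description against it, I can flag it as spam.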

score 1 · Accepted Answer

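It sounds like part 1 of your question is already solved, so this answer covers only part 2. As I understand it, you are trying to determine whether a given input message contains all of the words in a list, in any order.

This can be done with a regex and a single preg_match per message, but it is very inefficient if you have a large list of words. If N is the number of words you are searching for and M is the length of the message, the algorithm should be O(N*M). Notice that the regex contains two .* terms for each keyword. With lookahead assertions, the regex engine has to traverse the message once per keyword. Here is the sample code: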

<?php

// sample messages
$msg1 = "Lose all the weight all the weight you want.  It's fast and easy!";
$msg2 = 'Are you over weight? lose the pounds fast!';
$msg3 = 'Lose weight slowly by working really hard!';

// spam defining keywords (all required, but any order).
$keywords = array('lose', 'weight', 'fast');

//build the regex pattern using the array of keywords
// separator goes first: the legacy (array, separator) order was removed in PHP 8.0
$patt = '/(?=.*\b'. implode('\b.*)(?=.*\b', $keywords) . '\b.*)/is';

echo "The pattern is: '" .$patt. "'\n";
echo 'msg1 '. (preg_match($patt, $msg1) ? 'is' : 'is not') ." spam\n";
echo 'msg2 '. (preg_match($patt, $msg2) ? 'is' : 'is not') ." spam\n";
echo 'msg3 '. (preg_match($patt, $msg3) ? 'is' : 'is not') ." spam\n";
?>

The output is:

The pattern is: '/(?=.*\blose\b.*)(?=.*\bweight\b.*)(?=.*\bfast\b.*)/is'
msg1 is spam
msg2 is spam
msg3 is not spam
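
The question asks about a whole list of word sets ("birthday party", "ice cream"), not just one. The lookahead approach above could be extended to that case roughly as follows. This is only a sketch under assumptions: the helper name `buildGroupPattern` is hypothetical, and each list entry is assumed to be a space-separated set of words that must all appear.

```php
<?php
// Hypothetical helper: each entry in $groups is a space-separated set of
// words; a message matches when every word of at least one entry is present.
function buildGroupPattern(array $groups): string
{
    $alternatives = array();
    foreach ($groups as $group) {
        $lookaheads = '';
        foreach (preg_split('/\s+/', trim($group)) as $word) {
            // One lookahead per required word, in any order.
            $lookaheads .= '(?=.*\b' . preg_quote($word, '/') . '\b)';
        }
        $alternatives[] = $lookaheads;
    }

    // Anchoring at ^ makes the engine try the lookaheads from the start of
    // the message only, rather than again from every offset.
    return '/^(?:' . implode('|', $alternatives) . ')/is';
}

$patt = buildGroupPattern(array("birthday party", "ice cream"));

$msgA = "Hi, today is my birthday and I am going to a party";
$msgB = "I like cream cheese at a party";

echo 'msgA ' . (preg_match($patt, $msgA) ? 'matches' : 'does not match') . "\n";
echo 'msgB ' . (preg_match($patt, $msgB) ? 'matches' : 'does not match') . "\n";
```

The efficiency concern described above still applies: every word adds another scan of the message, so this is best kept to short lists.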

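The second solution looks more complicated because there is more code, but the regex is much simpler. It has no lookahead assertions and no .* terms. The preg_match function is called in a while loop, but that is not a big deal. Each message is traversed only once, and the complexity should be O(M). This could also be done with a single preg_match_all call, but you would then have to perform an array_search to get the final counts.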

<?php

// sample messages
$msg1 = "Lose all the weight all the weight you want.  It's fast and easy!";
$msg2 = 'Are you over weight? lose the pounds fast!';
$msg3 = 'Lose weight slowly by working really hard!';

// spam defining keywords (all required, but any order).
$keywords = array('lose', 'weight', 'fast');

//build the regex pattern using the array of keywords
// separator goes first: the legacy (array, separator) order was removed in PHP 8.0
$patt = '/(\b'. implode('\b|\b', $keywords) .'\b)/is';

echo "The pattern is: '" .$patt. "'\n";
echo 'msg1 '. (matchall($patt, $msg1, $keywords) ? 'is' : 'is not') ." spam\n";
echo 'msg2 '. (matchall($patt, $msg2, $keywords) ? 'is' : 'is not') ." spam\n";
echo 'msg3 '. (matchall($patt, $msg3, $keywords) ? 'is' : 'is not') ." spam\n";

function matchall($patt, $msg, $keywords)
{
  $offset = 0;
  $matches = array();
  $index = array_fill_keys($keywords, 0);
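  // preg_match already takes $matches by reference; call-time &$matches is a fatal error since PHP 5.4
  while( preg_match($patt, $msg, $matches, PREG_OFFSET_CAPTURE, $offset) ) {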
    $offset = $matches[1][1] + strlen($matches[1][0]);
    $index[strtolower($matches[1][0])] += 1;
  }
  return min($index);
}
?>

The output is:

The pattern is: '/(\blose\b|\bweight\b|\bfast\b)/is'
msg1 is spam
msg2 is spam
msg3 is not spam
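
For comparison, here is a rough sketch of the single preg_match_all variant mentioned above. It swaps the array_search step for array_diff on the lowercased matches, which is my own substitution rather than something from the original code, and the function name `containsAllKeywords` is hypothetical.

```php
<?php
// Hypothetical helper: true when every keyword occurs at least once in $msg.
function containsAllKeywords(array $keywords, string $msg): bool
{
    $quoted = array_map(function ($k) {
        return preg_quote($k, '/');
    }, $keywords);
    $patt = '/\b(?:' . implode('|', $quoted) . ')\b/i';

    // A single call collects every keyword occurrence in the message.
    preg_match_all($patt, $msg, $matches);

    // Lowercase the hits so "Lose" and "lose" count as the same keyword.
    $found = array_unique(array_map('strtolower', $matches[0]));

    // Spam only if no keyword is left unmatched.
    $missing = array_diff(array_map('strtolower', $keywords), $found);
    return count($missing) === 0;
}

$keywords = array('lose', 'weight', 'fast');
$msg2 = 'Are you over weight? lose the pounds fast!';
$msg3 = 'Lose weight slowly by working really hard!';

echo 'msg2 ' . (containsAllKeywords($keywords, $msg2) ? 'is' : 'is not') . " spam\n";
echo 'msg3 ' . (containsAllKeywords($keywords, $msg3) ? 'is' : 'is not') . " spam\n";
```

Unlike the while loop, this does not keep per-keyword counts; it only checks that each keyword showed up at least once, which is all the spam test needs.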
