php - Use regex to skip all characters until a specific sequence of letters is found, using negative lookahead

Question

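I'm fine with basic regular expressions, but I get a bit lost with positive/negative lookahead/lookbehind.

I'm trying to extract the id # from this:

[keyword stuff=otherstuff id=123 morestuff=stuff]

There could be an unlimited number of "stuff" attributes before or after the id. I've been using The Regex Coach to help debug what I've tried, but I'm no longer making any progress...

So far, I have this: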

\[keyword (?:id=([0-9]+))?[^\]]*\]

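It takes care of any extra attributes after the id, but I can't figure out how to ignore everything between keyword and id. I know I can't use [^id]*. I believe I need a negative lookahead like (?!id)*, but I guess that since it's zero-width, it doesn't move forward from there. This doesn't work either: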

\[keyword[A-z0-9 =]*(?!id)(?:id=([0-9]+))?[^\]]*\]

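I've been looking for examples but haven't found any. Or perhaps I have, but they were so far over my head that I didn't even realize what they were.

Help! Thanks.

EDIT: It must also match [keyword stuff=otherstuff], where id= doesn't exist at all, so the id # group has to be optional (one or zero occurrences). There are also other tags such as [otherkeywords id=32] that I don't want to match. The pattern needs to match multiple [keyword id=3] tags throughout the whole document using preg_match_all.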

score 2 · Accepted Answer

No lookahead/lookbehind needed:

/\[keyword(?:[^\]]*?\bid=([0-9]+))?[^\]]*?\]/

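I added the trailing '[^\]]*?\]' to check for a real tag end; it may be unnecessary.

EDIT: added \b before id, because otherwise it could match the guid= in [keyword you-dont-want-this-guid=123123-132123-123 id=123] and capture 123123 instead of 123.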

$ php -r 'preg_match_all("/\[keyword(?:[^\]]*?\bid=([0-9]+))?[^\]]*?\]/","[keyword stuff=otherstuff morestuff=stuff]",$matches);var_dump($matches);'
array(2) {
  [0]=>
  array(1) {
    [0]=>
    string(42) "[keyword stuff=otherstuff morestuff=stuff]"
  }
  [1]=>
  array(1) {
    [0]=>
    string(0) ""
  }
}
$ php -r 'var_dump(preg_match_all("/\[keyword(?:[^\]]*?\bid=([0-9]+))?[^\]]*?\]/","[keyword stuff=otherstuff id=123 morestuff=stuff]",$matches),$matches);'
int(1)
array(2) {
  [0]=>
  array(1) {
    [0]=>
    string(49) "[keyword stuff=otherstuff id=123 morestuff=stuff]"
  }
  [1]=>
  array(1) {
    [0]=>
    string(3) "123"
  }
}
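
As a rough sketch of how the same pattern behaves across a whole document, the snippet below runs it over several tags at once. The sample text in $doc is invented for illustration and is not from the question; it includes a tag without an id, a tag with a different keyword, and a tag with a guid= attribute.

```php
<?php
// Pattern from the answer above: the id group is optional,
// and \b keeps "guid=" from being mistaken for "id=".
$pattern = '/\[keyword(?:[^\]]*?\bid=([0-9]+))?[^\]]*?\]/';

// Hypothetical sample document (an assumption, made up for this example).
$doc = 'a [keyword stuff=x id=123 more=y] b [otherkeywords id=32] '
     . 'c [keyword stuff=x] d [keyword guid=99-88 id=7]';

preg_match_all($pattern, $doc, $matches);

// $matches[0] holds the full tags; $matches[1] holds the captured ids,
// with an empty string for a tag that has no id.
$ids = $matches[1];
var_dump($ids);
```

Here $ids ends up as '123', '' and '7': the [otherkeywords ...] tag is skipped entirely, and the tag without an id still matches but yields an empty capture.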

score 2 · Accepted Answer

You don't need lookahead/lookbehind.

Since the question is tagged PHP, use preg_match_all() and store the matches in $matches.

Here's how:

<?php

  // Store the string. It is single-quoted in case it contains backslashes
  // I didn't see.
$string = 'blah blah[keyword stuff=otherstuff id=123 morestuff=stuff]
           blah blah[otherkeyword stuff=otherstuff id=555 morestuff=stuff]
           blah blah[keyword stuff=otherstuff id=444 morestuff=stuff]';

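  // The pattern is '[keyword', then any run of characters other than ']',
  // then a space and 'id=' followed by digits.
  // The space before id is important, so you don't catch 'guid', etc.
  // If '[keyword' is always at the beginning of a line, you can use
  // '^\[keyword' (with the m modifier, so that ^ matches at every line start).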
$pattern = '/\[keyword[^\]]* id=([0-9]+)/';

  // Find every single $pattern in $string and store it in $matches
preg_match_all($pattern, $string, $matches);

  // The only tricky part is that each entire match is stored in
  // $matches[0][x], while the part of the match in the parentheses, which is
  // what you want, is stored in $matches[1][x]. The braces around the loop
  // body are optional, since it's only one statement.
foreach($matches[1] as $value)
{
    echo $value . "<br/>";
}
?>

Output:

123
444

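(555 is skipped, as it should be.)

P.S.

If there could be a tab before id, you can also use \b instead of a literal space. \b denotes a word boundary... in this case, the beginning of a word.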

$pattern = '/\[keyword[^\]]*\bid=([0-9]+)/';
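
To make the difference between the two variants concrete, here is a small comparison. The input string is made up for illustration (an assumption, not from the question): the second tag separates its attributes with a tab and the third has only a guid= attribute.

```php
<?php
// The two patterns from this answer: literal space vs. word boundary.
$spacePattern    = '/\[keyword[^\]]* id=([0-9]+)/';
$boundaryPattern = '/\[keyword[^\]]*\bid=([0-9]+)/';

// Hypothetical input; "\t" inside double quotes is a real tab character.
$string = "[keyword a=b id=1] [keyword a=b\tid=2] [keyword guid=3]";

preg_match_all($spacePattern, $string, $bySpace);
preg_match_all($boundaryPattern, $string, $byBoundary);

// The space version misses the tab-separated id; the \b version finds it.
// Neither one is fooled by guid=.
var_dump($bySpace[1], $byBoundary[1]);
```

The space version captures only 1, while the \b version captures 1 and 2. Note that neither pattern matches a tag that has no id at all, so if those tags need to be matched too, the id part has to be made optional.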

score 0 · Accepted Answer

I think this is what you were trying for:

\[keyword(?:\s+(?!id\b)[A-Za-z]+=[^\]\s]+)*(?:\s+id=([0-9]+))?[^\]]*\]

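(I'm assuming attribute names can contain only ASCII letters, while values can contain any non-whitespace character except ].)

(?:\s+(?!id\b)[A-Za-z]+=[^\]\s]+)* matches any number of attribute=value pairs (and the whitespace preceding them), as long as the attribute name isn't id. The \b (word boundary) is there in case there are attribute names that start with id, like idiocy. There's no need to put a \b before the attribute name this time, because you know that any name it matches will be preceded by whitespace. But, as you've learned, the lookahead approach is overkill in this case.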
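
A quick way to see the lookahead at work is to run the pattern over a tag that has an attribute starting with id. The sample tags below are invented for illustration (an assumption, not from the question).

```php
<?php
// Lookahead-based pattern from this answer.
$pattern = '/\[keyword(?:\s+(?!id\b)[A-Za-z]+=[^\]\s]+)*(?:\s+id=([0-9]+))?[^\]]*\]/';

// Hypothetical input: "idiocy" begins with "id" but is not the id attribute,
// and the second tag has no id at all.
$doc = '[keyword idiocy=yes stuff=other id=42 more=stuff] [keyword stuff=other]';

preg_match_all($pattern, $doc, $m);

// (?!id\b) lets "idiocy=yes" through as an ordinary attribute, because
// there is no word boundary right after its first two letters.
var_dump($m[1]);
```

The captures come out as '42' and '': idiocy=yes is consumed as an ordinary pair, the real id is captured, and the tag without an id still matches with an empty group.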

Now, about this:

[A-z0-9 =]

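That A-z is either a typo or a mistake. If you expected it to match all uppercase and lowercase letters, it does. But it also matches

'[', ']', '^', '_', '`' and '\'

...because their code points lie between the uppercase letters and the lowercase letters. The ASCII letters, that is.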
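
This is easy to check directly. The sketch below tests each of the six ASCII characters between 'Z' and 'a' against [A-z]; the variable names are just illustrative.

```php
<?php
// The six ASCII characters between 'Z' (0x5A) and 'a' (0x61).
$between = ['[', '\\', ']', '^', '_', '`'];

$matched = [];
foreach ($between as $ch) {
    // [A-z] is a range over code points 0x41..0x7A, so it covers these too.
    if (preg_match('/^[A-z]$/', $ch) === 1) {
        $matched[] = $ch;
    }
}

// All six are accepted by [A-z]; [A-Za-z] accepts none of them.
var_dump(count($matched));
var_dump(preg_match('/^[A-Za-z]$/', '_'));
```

All six characters match [A-z], so [A-Za-z] is the safer way to write "any ASCII letter".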
