php - What is the negation of this regular expression?

Question

I am trying to match the following:

This:

HIGH SCHOOL WRESTLING NOTEBOOK: A surge at Delaware Valley, team rankings shakeup and more.

From this:

<pre>
  <div class="sum">
    <div class="photo_gutter">
      <div class="photo">
        <a href="http://media.lehighvalleylive.com/brad-wilson/photo/jaryd-flank-b30e919c41bc86b2.jpg">
          <img src="http://media.lehighvalleylive.com/brad-wilson/photo/jaryd-flank-b30e919c41bc86b2.jpg" alt="" title="" width="200" border="0"/>
        </a>
      </div>
    </div>
  </div>
  HIGH SCHOOL WRESTLING NOTEBOOK: A surge at Delaware Valley, team rankings shakeup and more.
</pre>

What I have so far is /<.*>\s/i, but I need the opposite. Can anyone help me?

score 2 · Accepted Answer

Don't use regular expressions to parse HTML; use PHP's DOMDocument instead.
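
As a rough illustration of that suggestion, here is a minimal sketch. It assumes the markup from the question (shortened here, with placeholder `#` URLs) is held in a hypothetical `$html` string, and that the headline is the only non-whitespace text in the fragment.

```php
<?php
// Shortened stand-in for the question's markup; $html is a hypothetical variable.
$html = '<div class="sum"><div class="photo"><a href="#"><img src="#" alt=""/></a></div></div>'
      . ' HIGH SCHOOL WRESTLING NOTEBOOK: A surge at Delaware Valley, team rankings shakeup and more.';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// textContent joins every descendant text node and drops the tags themselves.
$headline = trim($doc->documentElement->textContent);

echo $headline, PHP_EOL;
```

If the real page has other text around the fragment, you would probably select a narrower node first (for example with `getElementsByTagName()`) rather than reading the whole document.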

Answered 2013-02-04T09:51:25.373

score 0 · Accepted Answer

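Using regular expressions to parse HTML is not recommended, but since this is a simple task (and is probably meant as a way to learn regex):

You have this: /<.*>\s/i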

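1 - The i modifier does nothing here, because your regex contains no characters whose case could differ. For example, /apple/i makes sense because you also want to find Apple. /\w+/i does nothing, because \w already includes both lowercase and uppercase characters.

2 - If you are parsing HTML, it is better not to assume or rely on any \s unless you are inside a tag.

3 - If you want to capture part of the match into a variable, you have to wrap it in ( and ). For example, running /(\w+) Apple/ against Red Apple gives you Red in $1, or in the matches array of the preg_match() function.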
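
The points about the i modifier and capture groups can be checked with a short sketch like the following. The variable names are only illustrative.

```php
<?php
// The i modifier only matters when the pattern contains literal letters.
$caseSensitive   = preg_match('/apple/', 'Apple');   // 0: no match
$caseInsensitive = preg_match('/apple/i', 'Apple');  // 1: match

// Parentheses capture: $matches[0] is the whole match, $matches[1] the first group.
preg_match('/(\w+) Apple/', 'Red Apple', $matches);

echo $matches[1], PHP_EOL; // Red
```

Note that the `$1` form is what you would use in a replacement string (for instance with `preg_replace()`), while `preg_match()` fills the array passed as its third argument.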

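Now, what I would do:

First, I would remove any \r\n or \n from the input string. By default . does not match a newline, so a regex like this works better on a single line of text. You can do this with str_replace().

If you want to get anything that is not inside <>: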

/>(.*?)</
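
Putting those two steps together might look like the sketch below. It assumes a shortened copy of the question's markup in a hypothetical `$html` variable; the filtering of whitespace-only captures is an addition of mine, since the pattern also matches the gaps between adjacent tags.

```php
<?php
// Shortened stand-in for the question's markup (placeholder # URLs).
$html = <<<'HTML'
<pre>
  <div class="sum">
    <a href="#"><img src="#" alt=""/></a>
  </div>
  HIGH SCHOOL WRESTLING NOTEBOOK: A surge at Delaware Valley, team rankings shakeup and more.
</pre>
HTML;

// Step 1: collapse the input to one line, because . does not match \n by default.
$oneLine = str_replace(["\r\n", "\n"], '', $html);

// Step 2: capture everything between a > and the next <.
preg_match_all('/>(.*?)</', $oneLine, $matches);

// Step 3 (my addition): trim the captures and drop the empty ones.
$texts = array_values(array_filter(array_map('trim', $matches[1]), 'strlen'));

print_r($texts);
```

With this input only the headline survives the filter. On a page with more text nodes, `$texts` would contain all of them, so you would still need to pick the one you want.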

If you want to get the text inside a certain tag, for example <div>this one</div>:

/<div>(.*?)<\/div>/

The ? character makes the .* match non-greedy, so it captures the fewest characters that still let the pattern match.
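
To see the difference the ? makes, here is a small sketch comparing the greedy and non-greedy forms of the div pattern on a made-up input string.

```php
<?php
// Made-up input with two sibling divs.
$input = '<div>a</div><div>b</div>';

// Greedy: .* runs to the last </div> it can reach.
preg_match('/<div>(.*)<\/div>/', $input, $greedy);

// Non-greedy: .*? stops at the first </div>.
preg_match('/<div>(.*?)<\/div>/', $input, $lazy);

echo $greedy[1], PHP_EOL; // a</div><div>b
echo $lazy[1], PHP_EOL;   // a
```

Keep in mind that this pattern only matches a bare `<div>`; a tag with attributes, such as `<div class="sum">` in the question, would need a different pattern.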

Hope it helped.
