php - PHP regex: get all URLs that contain a question mark

Question

I have this regular expression:

preg_match_all("/<a\s.*?href\s*=\s*['|\"](.*?)(?=#|\"|')/si", $data, $matches);

It finds all URLs and works fine, but how can I modify it to find only the URLs that contain a question mark?

Example:

<a href="http://site.com/index.php">0</a><a href="http://site.com/index.php?id=1">1</a><a href="http://site.com/calc/index.php?id=1&scheme=Venus">2</a><a href="http://site.com/catalogue/data.php">3</a>

For this input, preg_match_all should return:

http://site.com/index.php?id=1

http://site.com/calc/index.php?id=1&scheme=Venus

score 1 · Accepted Answer

preg_match_all("@<a\s*href\s*=[\'\"]([^\'\"]+\?[^\'\"]+)[\'\"]@si", $data, $matches);

Try this.

Answered 2013-06-15T06:53:17.650

score 0 · Accepted Answer

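Don't try to make everything happen in a single regular expression. Use your existing approach, then separately check whether each URL you get back contains a question mark.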
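
A minimal sketch of that two-step approach, assuming the sample HTML from the question is stored in $data. The variable name $withQuery is made up for illustration; the pattern is the one from the question, unchanged.

```php
<?php
// Sample input taken from the question.
$data = '<a href="http://site.com/index.php">0</a>'
      . '<a href="http://site.com/index.php?id=1">1</a>'
      . '<a href="http://site.com/calc/index.php?id=1&scheme=Venus">2</a>'
      . '<a href="http://site.com/catalogue/data.php">3</a>';

// Step 1: collect every href value with the original pattern.
preg_match_all("/<a\s.*?href\s*=\s*['|\"](.*?)(?=#|\"|')/si", $data, $matches);

// Step 2: keep only the captured URLs that contain a question mark.
// $withQuery is a hypothetical name, not part of the original code.
$withQuery = array_values(array_filter(
    $matches[1],
    function ($url) {
        return strpos($url, '?') !== false;
    }
));

print_r($withQuery);
```

Keeping the filter separate means the question-mark rule can change later without touching the pattern.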

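That said, don't use regular expressions to parse HTML. You cannot reliably parse HTML with regular expressions, and you will face sorrow and frustration. As soon as the HTML differs from what you expect, your code will break. See http://htmlparsing.com/php for examples of how to properly parse HTML with PHP modules that have already been written, tested, and debugged.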
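
As one possible illustration of the parser-based route, here is a sketch using PHP's bundled DOM extension (DOMDocument). It assumes that extension is enabled and that the sample HTML from the question is in $data. It is not necessarily the approach shown on the linked page, and $urls is a made-up name.

```php
<?php
// Sample input taken from the question.
$data = '<a href="http://site.com/index.php">0</a>'
      . '<a href="http://site.com/index.php?id=1">1</a>'
      . '<a href="http://site.com/calc/index.php?id=1&scheme=Venus">2</a>'
      . '<a href="http://site.com/catalogue/data.php">3</a>';

// Collect parser warnings internally instead of printing them;
// real-world HTML (including the bare "&" above) is rarely perfectly valid.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($data);
libxml_clear_errors();

// Walk every <a> element and keep the hrefs that contain a question mark.
$urls = [];
foreach ($doc->getElementsByTagName('a') as $link) {
    $href = $link->getAttribute('href');
    if (strpos($href, '?') !== false) {
        $urls[] = $href;
    }
}

print_r($urls);
```

Because the parser handles attribute order, quoting style, and extra whitespace, the only custom logic left is the strpos check.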

score 0 · Accepted Answer

Andy Lester has given you the answer the right way.

Here is your regular expression:

<a\s.*?href\s*=\s*['|\"](.*?\?.*?)(?=#|\"|')

As shown here:

http://rubular.com/r/LHi11VMMR9
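
One caveat if this pattern is run with the question's /si flags on the single-line sample: the lazy .*? before \? can match quote characters, so the first capture can run past the end of a URL that has no question mark and into the next link. The sketch below shows that behaviour next to a tightened variant that uses negated character classes. The variant and the names $loose and $tight are assumptions for illustration, not part of the answer.

```php
<?php
// Sample input taken from the question.
$data = '<a href="http://site.com/index.php">0</a>'
      . '<a href="http://site.com/index.php?id=1">1</a>'
      . '<a href="http://site.com/calc/index.php?id=1&scheme=Venus">2</a>'
      . '<a href="http://site.com/catalogue/data.php">3</a>';

// Pattern from this answer, wrapped with the /si flags used in the question.
preg_match_all("/<a\s.*?href\s*=\s*['|\"](.*?\?.*?)(?=#|\"|')/si", $data, $loose);

// Hypothetical tightened variant: the captured value may not contain
// quotes or "#", so a match cannot cross from one href into the next.
preg_match_all('/<a\s[^>]*?href\s*=\s*["\']([^"\'#]*\?[^"\'#]*)/i', $data, $tight);

print_r($loose[1]); // first entry spans two links on this input
print_r($tight[1]); // only the two URLs that contain "?"
```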
