php - Can't use OR (|) in a PHP regular expression

Question

I'm new here. I've run into a strange problem using regular expressions in PHP.

$result = "some very long long string with different kind of links";

$regex='/<.*?href.*?="(.*?net.*?)"/'; //this is the regex rule

preg_match_all($regex,$result,$parts);

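In this code I am trying to get the links out of the result string. But it only gives me the links that contain .net. I also want the links that contain .com. For that, I tried this code: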

    $regex='/<.*?href.*?="(.*?net|com.*?)"/';

But it returns nothing.

Sorry for my poor English.

Thanks in advance.

Update 1:

Now I am using this:

$regex='/<.*?href.*?="(.*?)"/';

This rule gets all the links from the string. But it is not perfect, because it also grabs other substrings such as "javascript".

score 3 · Accepted Answer

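The | character applies to everything inside the capture group, so (.*?net|com.*?) matches either .*?net or com.*?. I think what you want is (.*?(net|com).*?).

If you don't want the extra capture group, you can use (.*?(?:net|com).*?).

You could also use (.*?net.*?|.*?com.*?), but that is not recommended because of the unnecessary repetition.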
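To make the difference concrete, here is a minimal sketch that runs the question's pattern and the grouped version side by side. The sample $html string and the variable names $broken, $fixed, $brokenParts and $fixedParts are made up for illustration; the question's real input is not shown.

```php
<?php
// Hypothetical sample input (the question's actual $result is not shown).
$html = '<a href="http://example.net/page">A</a> '
      . '<a href="http://example.com/page">B</a> '
      . '<a href="javascript:void(0)">C</a>';

// Pattern from the question: the | splits the whole group into
// ".*?net" OR "com.*?", each of which must be followed directly by ".
$broken = '/<.*?href.*?="(.*?net|com.*?)"/';

// Grouped alternation: only "net" and "com" are alternatives.
$fixed = '/<.*?href.*?="(.*?(?:net|com).*?)"/';

preg_match_all($broken, $html, $brokenParts);
preg_match_all($fixed, $html, $fixedParts);

print_r($brokenParts[1]); // captured groups from the question's pattern
print_r($fixedParts[1]);  // captured groups from the grouped pattern
```

With this sample input the first pattern captures nothing, because no href value ends in net or starts with com, while the grouped pattern captures the .net and .com links and skips the javascript one.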

score 1 · Accepted Answer

Your regex is being interpreted as .*?net or com.*?. You want (.*?(net|com).*?).

Answered 2013-04-30T17:01:12.313

score 1 · Accepted Answer

Try this:

$regex='/<.*?href.*?="(.*?\.(?:net|com)\b.*?)"/i';

Or better:

$regex='/<a .*?href\s*+=\s*+"\K.*?\.(?:net|com)\b[^"]*+/i';
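As a rough illustration of what the second pattern does: \K discards everything matched so far, so the whole match ($parts[0]) holds just the URL and no capture group is needed. The \. and \b require a literal .net or .com, and the i flag makes the match case-insensitive. The sample $html below is invented for this sketch.

```php
<?php
// Hypothetical sample input, chosen to exercise spacing, case and a near miss.
$html = '<a class="x" href = "http://example.net/page">A</a> '
      . '<A HREF="http://EXAMPLE.COM/">B</A> '
      . '<a href="http://network.org/">C</a>';

// Pattern from the answer above, unchanged.
$regex = '/<a .*?href\s*+=\s*+"\K.*?\.(?:net|com)\b[^"]*+/i';

preg_match_all($regex, $html, $parts);

// $parts[0] holds the full matches, which start after the \K position.
print_r($parts[0]);
```

For this input the network.org link is skipped, since "net" there is not preceded by a dot. One caveat: the .*? after \K can run past a closing quote, so a non-matching link followed later by a matching one may yield an over-long match; replacing that .*? with [^"]*? would be stricter.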

score 0 · Accepted Answer

<.*?href

is a problem. This will match from the first < on the current line to the first href, regardless of whether they belong to the same tag.

Generally, it's unwise to try and parse HTML with regexes; if you absolutely insist on doing that, at least be a bit more specific (but still not perfect):

$regex='/<[^<>]*href[^<>=]*="(?:[^"]*(net|com)[^"]*)"/';
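If a regex is not a hard requirement, one possible alternative is to let PHP's DOM extension parse the markup and filter the href values afterwards. This is only a sketch under assumptions: it presumes the DOM extension is available, the sample $html is invented, and filtering on the host name's suffix is one interpretation of "links with .net or .com".

```php
<?php
// Hypothetical sample input.
$html = '<p><a href="http://example.net/page">A</a> '
      . '<a href="http://example.com/">B</a> '
      . '<a href="javascript:void(0)">C</a></p>';

// Collect parser warnings internally instead of printing them,
// since real-world HTML is rarely perfectly valid.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$links = [];
foreach ($doc->getElementsByTagName('a') as $anchor) {
    $href = $anchor->getAttribute('href');
    // parse_url() yields no host for values such as "javascript:void(0)".
    $host = parse_url($href, PHP_URL_HOST);
    if (is_string($host) && preg_match('/\.(?:net|com)$/i', $host)) {
        $links[] = $href;
    }
}

print_r($links);
```

Here the regex only has to test a host name, not find tag boundaries, which avoids the problem of < and href belonging to different tags.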
