regex - Regular expression: match after the colon in an H1 tag?

Question

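I hope it's okay to ask this. I searched Stack Overflow and found similar questions, but none of the solutions worked for me.

I have HTML like this: <h1>Beatles: A Hard Days Night</h1>. Now I want a regular expression that matches everything after the colon, so A Hard Days Night in this case.

Here is what I tried: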

$pattern = "/<h1>\:(.*)<\/h1>/";

But this only outputs an empty array.

score 4 · Accepted Answer

The following regular expression should match:

<h1>[^:]+:\s+([^<]+)

PowerShell test:

PS> '<h1>Beatles: A Hard Days Night</h1>' -match '<h1>[^:]+:\s+([^<]+)'; $Matches
True

Name                           Value
----                           -----
1                              A Hard Days Night
0                              <h1>Beatles: A Hard Days Night

A brief explanation:

<h1>    # match literal <h1>
[^:]+   # match everything *before* the colon (which in this case
        # shouldn't include a colon itself; if it does, then use .*)
:       # Literal colon
\s+     # One or more whitespace characters
([^<]+) # Put everything up to the next < into a capturing group.
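For anyone who wants to try the pattern outside PowerShell, here is a minimal sketch in Python (chosen only for illustration; the question itself uses PHP). The function names and the second sample heading are made up for this example. It also includes the question's original attempt for comparison: that pattern expects the colon to come directly after <h1>, which is why it finds nothing. The last line shows the `.*` variant mentioned in the comments above, which behaves differently when the heading contains more than one colon.

```python
import re

# Pattern from the answer: skip everything up to the first colon,
# then capture the text up to the next "<".
AFTER_FIRST_COLON = re.compile(r"<h1>[^:]+:\s+([^<]+)")

# Variant with .* in place of [^:]+ : the greedy .* runs to the LAST colon
# that is followed by whitespace.
AFTER_LAST_COLON = re.compile(r"<h1>.*:\s+([^<]+)")

# The question's original attempt: it requires the colon immediately
# after "<h1>", so it cannot match "<h1>Beatles: ...".
ORIGINAL_ATTEMPT = re.compile(r"<h1>\:(.*)</h1>")


def text_after_first_colon(html):
    """Return the heading text after the first colon, or None if no match."""
    match = AFTER_FIRST_COLON.search(html)
    return match.group(1) if match else None


def text_after_last_colon(html):
    """Return the heading text after the last colon, or None if no match."""
    match = AFTER_LAST_COLON.search(html)
    return match.group(1) if match else None


html = "<h1>Beatles: A Hard Days Night</h1>"
print(text_after_first_colon(html))    # A Hard Days Night
print(ORIGINAL_ATTEMPT.search(html))   # None

# Hypothetical heading with two colons, to show how the variants differ.
two_colons = "<h1>Vol. 1: Beatles: Help</h1>"
print(text_after_first_colon(two_colons))  # Beatles: Help
print(text_after_last_colon(two_colons))   # Help
```

In PHP the same pattern should work with preg_match, with the captured text in index 1 of the matches array.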
