html - Using the regex <(?<balise>.+)>(?(balise).*itemprop=.*)

Question

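Hello, I have some HTML and I am trying to use a regex to find every HTML tag that has an itemprop attribute.

I want to use a regex because I am not sure the HTML is well-formed.

I tried this regex: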

<(?<balise>.+)>(?(balise).*itemprop=.*)

I want to match a pattern inside my named group, but it does not work.

Can someone help me?

Sample text to parse:

<meta itemprop="currency" content="CDN" >
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="fr" xmlns:og="http://ogp.me/ns#" xmlns:fb="http://www.facebook.com/2008/fbml">
<head><span itemprop="name">My name</span>

I only need to extract the HTML tags that have an itemprop attribute.

score 1 · Accepted Answer

As pointed out in the comments, parsing HTML with a regex is far from ideal, and you should consider using a proper framework.
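For illustration only, here is a minimal sketch of the parser-based route, assuming Python and its standard-library `html.parser` module stand in for the "proper framework" (the answer does not name one). The class name `ItempropCollector` and the function `find_itemprop_tags` are hypothetical. `html.parser` does not require well-formed input, which is the concern raised in the question.

```python
from html.parser import HTMLParser


class ItempropCollector(HTMLParser):
    """Hypothetical helper: records every start tag that has an itemprop attribute."""

    def __init__(self):
        super().__init__()
        self.found = []  # list of (tag_name, itemprop_value) pairs

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names arrive lowercased.
        attr_map = dict(attrs)
        if "itemprop" in attr_map:
            self.found.append((tag, attr_map["itemprop"]))


def find_itemprop_tags(html):
    collector = ItempropCollector()
    collector.feed(html)
    collector.close()
    return collector.found


SAMPLE = '''<meta itemprop="currency" content="CDN" >
<html xmlns="http://www.w3.org/1999/xhtml" lang="fr">
<head><span itemprop="name">My name</span>'''

print(find_itemprop_tags(SAMPLE))
```

Self-closing tags such as `<meta ... />` are also reported, because the default `handle_startendtag` forwards to `handle_starttag`.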

However, if you insist on using a regex, you could try the following:

(?<=<)\s*([^\s>]+?)(?=\s[^>]*(?<=\s)itemprop="[^<]*?/?>)

It finds a string consisting of zero or more whitespace characters followed by one or more non-whitespace, non-> characters. The string must be preceded by a < and followed (in the given order) by: a whitespace character, zero or more non-> characters, itemprop=" preceded by whitespace, zero or more non-< characters, an optional /, and >.

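The regex above does not ensure that the tag's attribute quotes (' and ") are correctly balanced, nor does it fail if the tag contains illegal characters or syntax. It only asserts that the tag has an opening <, a name containing no whitespace, an itemprop attribute, and a closing >.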
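As a rough check of the description above, the pattern can be run against the sample from the question. This sketch assumes Python's `re` module (the question's `(?<balise>...)` syntax suggests a different engine, but the answer's pattern uses only constructs that `re` also supports). The names `ITEMPROP_TAG_NAME` and `itemprop_tag_names` are hypothetical.

```python
import re

# The pattern from the answer, unchanged. Group 1 captures the tag name.
ITEMPROP_TAG_NAME = re.compile(
    r'(?<=<)\s*([^\s>]+?)(?=\s[^>]*(?<=\s)itemprop="[^<]*?/?>)'
)

SAMPLE = '''<meta itemprop="currency" content="CDN" >
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="fr" xmlns:og="http://ogp.me/ns#" xmlns:fb="http://www.facebook.com/2008/fbml">
<head><span itemprop="name">My name</span>'''


def itemprop_tag_names(html):
    """Return the names of tags that the pattern considers to carry itemprop."""
    return [match.group(1) for match in ITEMPROP_TAG_NAME.finditer(html)]


print(itemprop_tag_names(SAMPLE))  # ['meta', 'span']
```

Note that the pattern only matches `itemprop="` with a double quote, so a tag written as `itemprop='name'` is skipped.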
