regex - Perl regex: find every class="" in a string and save the values in an array

Question

I am trying to extract the classes from a string taken from an HTML document. The string looks like this, for example:

<span class="bullet first">Some</span>Published <abbr class="published">Sometexthere</abbr></p>

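So what I want is to get all the classes in the string (bullet, first, published). The catch is that the string can contain any number of class="" attributes. So I guess there is no way to do this with a single regex, and I need a loop here?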

score 2 · Accepted Answer

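However you do it, this is a two-step process:

1. Extract the values of the class attributes ("bullet first", "published").
2. Extract the classes from those values ("bullet", "first", "published").

With XML::LibXML (which also provides an HTML parser):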

my @classes =
   map split(' ', $_->getValue()),          # Step 2
      $xpc->findnodes('*/@class', $node);   # Step 1

(Or maybe .//*/@class, depending on what you want.)
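For completeness, here is a minimal self-contained sketch of the same two steps. It assumes XML::LibXML is installed, parses the sample string from the question with load_html, and queries the whole document with //@class instead of the $xpc and $node variables used above, which are not defined in this answer.

```perl
use strict;
use warnings;
use XML::LibXML;

# Sample string from the question (note the stray </p>).
my $html = '<span class="bullet first">Some</span>Published <abbr class="published">Sometexthere</abbr></p>';

# Assumption: recover => 2 lets libxml2 parse the malformed fragment quietly.
my $doc = XML::LibXML->load_html(string => $html, recover => 2);

my @classes =
   map { split ' ', $_->getValue() }   # Step 2: split each value on whitespace
      $doc->findnodes('//@class');     # Step 1: every class attribute in the document

print "$_\n" for @classes;
```

Splitting on ' ' treats any run of whitespace as a separator, so values with several spaces between class names are handled as well.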

score 0 · Accepted Answer

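If you are sure the HTML does not contain awkward data such as <p> class="abc" </p>, you can loop over a regex with the global modifier: each iteration resumes matching where the previous match ended. For example: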

while ($_ =~ /class="(.*?)"/g) {
    #process class names here
    #class is in $1
}

For general use, however, an HTML parser is recommended, because this regex would treat the string <p> class="abc" </p> as containing the class abc.
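The false positive is easy to reproduce. The following sketch applies the same pattern to that string, where class="abc" is ordinary text rather than an attribute; the variable names are only illustrative.

```perl
use strict;
use warnings;

# Here class="abc" is plain text between the tags, not an attribute of <p>.
my $html = '<p> class="abc" </p>';

# In list context, a /g match returns every captured group.
my @found = $html =~ /class="(.*?)"/g;

# The regex cannot tell text from markup, so it still reports "abc".
print "matched: $_\n" for @found;
```

An HTML parser would see a <p> element with no attributes at all, which is why it is the safer choice for arbitrary input.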

score 0 · Accepted Answer

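I'm adding this to answer the part "So I guess there is no way to do this with a single regex, and I need a loop here?"

You have to use the g modifier on the regex: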

my $text = '<span class="bullet first">Some</span>Published <abbr class="published">Sometexthere</abbr></p>';
while($text =~ /class\s*=\s*"([^"]+)"/g) {
  print "class --> $1\n";
}

This is the result:

class --> bullet first
class --> published
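Since the question asks for the values in an array, it may help to note that an explicit loop is optional: in list context, a match with the g modifier returns all captured groups at once. A minimal sketch, with @values and @classes as illustrative names:

```perl
use strict;
use warnings;

my $text = '<span class="bullet first">Some</span>Published <abbr class="published">Sometexthere</abbr></p>';

# List context: m//g returns every capture, so no while loop is needed.
my @values = $text =~ /class\s*=\s*"([^"]*)"/g;     # attribute values

# Split each value on whitespace to get the individual class names.
my @classes = map { split ' ', $_ } @values;

print join(', ', @classes), "\n";   # prints "bullet, first, published"
```

Like the loop version, this only handles double-quoted attribute values and still shares the limitations of matching HTML with a regex.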
