regex - How do I get all matches of a regular expression in a string?

Question

How can I use curl to get the content of any HTML tag? For example, the following script tries to get the h1 content:

#!/usr/bin/perl  

use strict;  
use warnings;  

my $page = `curl www.yahoo.com`;  
print "Page: \n";  
sleep(5);  
#print "$page \n";  
if ($page =~ m/<h1\s*>(.*)<\/h1\s*>/ig){  
        print "$1 \n";  
}

I only get one match. How can I get all the matches?

score 2 · Accepted Answer

You can get all the matches like this:

my @matches = $page =~ /<h1\b[^>]*>(.*?)<\/h1>/ig;

print "@matches\n";

(Note, however, that yahoo.com has only one h1 tag.)
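The reason this works is context: in list context a match with the /g flag returns every captured group at once, whereas in scalar context (the `if` in the question) it only finds the next match. A minimal sketch, where the in-memory `$page` is a hypothetical stand-in for the curl output:

```perl
use strict;
use warnings;

# Hypothetical sample markup standing in for the curl output.
my $page = '<h1>First</h1><p>text</p><h1 class="x">Second</h1>';

# List context: with /g the match returns all captures in one go.
my @matches = $page =~ /<h1\b[^>]*>(.*?)<\/h1>/ig;

print scalar(@matches), " matches: @matches\n";   # prints "2 matches: First Second"
```

Since `.` does not match a newline by default, a heading that spans several lines would probably also need the /s flag.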

score 2 · Accepted Answer

Parsing HTML with regular expressions is a sin. Fortunately, there are plenty of parsers around. I especially like the Mojo suite:

use strict; use warnings;
use feature 'say';
use Mojo::UserAgent;  # load the user agent class explicitly

my $ua  = Mojo::UserAgent->new(max_redirects => 5);  # redirects defaults to zero
for my $h3 ($ua->get('www.stackoverflow.com')->res->dom('h3')->each) { # use CSS selectors
  say $h3->all_text;
}
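The same CSS-selector approach can be tried without any network access by feeding a string to Mojo::DOM directly. This is only a sketch: it assumes Mojolicious is installed, and the `$html` sample is hypothetical.

```perl
use strict;
use warnings;
use feature 'say';
use Mojo::DOM;   # part of the Mojolicious distribution (assumed installed)

# Hypothetical sample markup; in the answer above it comes from an HTTP response.
my $html = '<h1>First</h1><p>text</p><h1 class="x">Second <b>bold</b></h1>';

# find() takes a CSS selector and returns a collection of matching elements;
# map('all_text') calls all_text on each of them, including nested tags.
my @headings = Mojo::DOM->new($html)->find('h1')->map('all_text')->each;

say for @headings;
```

Unlike the regex, this keeps working when the tag carries attributes, contains nested elements, or spans several lines.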

score 1 · Accepted Answer

Use a while loop instead of if:

while ($page =~ m/<h1\s*>(.*?)<\/h1\s*>/ig) {  # non-greedy, so two h1 tags on one line are not merged
    print "$1 \n";
}
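In scalar context, each evaluation of a /g match resumes where the previous one stopped, which is why the loop visits every match in turn. A small sketch with a hypothetical in-memory `$page`; it uses a non-greedy `.*?` so that two headings on the same line are not captured as one:

```perl
use strict;
use warnings;

# Hypothetical sample markup standing in for the curl output.
my $page = '<h1>First</h1><h1>Second</h1>';

my @found;
while ($page =~ m/<h1\s*>(.*?)<\/h1\s*>/ig) {
    # $1 holds the capture of the current match;
    # pos($page) is the offset at which the next search will resume.
    push @found, $1;
    print "$1 (next search starts at offset ", pos($page), ")\n";
}

print "Found ", scalar(@found), " headings\n";
```

The loop ends when the match fails, at which point the search position is reset to the start of the string.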
