
The code below shows that the HTML::TreeBuilder method look_down cannot find the "section" element. Why?

use strict;
use warnings;
use HTML::TreeBuilder;

my $html =<<'END_HTML';
<html>
<head><title></title></head>
<body>
<div attrname="div">
<section attrname="section">
</section>
</div>
</body>
</html>
END_HTML

my $tree = HTML::TreeBuilder->new_from_content($html);

my @divs = $tree->look_down('attrname', 'div');
print "number of div elements found = ", scalar(@divs), "\n";

my @sections = $tree->look_down('attrname', 'section');
print "number of section elements found = ", scalar(@sections), "\n";

$tree->delete();

Output:

number of div elements found = 1
number of section elements found = 0

2 Answers

3
my @divs = $tree->look_down('attrname', 'div');
print "number of div elements found = ", scalar(@divs), "\n";

This finds one element because it matches the attribute attrname against the value div, which happens to be on the <div> tag.

my @sections = $tree->look_down('attrname', 'section');
print "number of section elements found = ", scalar(@sections), "\n";

This matches nothing because the tree contains no tag with an attribute named attrname whose value is section.

They should be:

my @divs = $tree->look_down(_tag => 'div');
...
my @sections = $tree->look_down(_tag => 'section');

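All of this is explained somewhat obscurely in the HTML::Element documentation for look_down. There is no clear explanation of what a "criterion" is, and you have to read the whole page to discover that _tag is the pseudo-attribute that refers to the tag name. Still, reading the whole page carefully will probably save you hours of frustration in the long run :-)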
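As a rough illustration of the kinds of criteria look_down accepts, here is a minimal sketch. The markup, the id and class values, and the variable names are all made up for this example and do not come from the question; only long-established tags are used so that the parser keeps every element.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Illustrative markup (an assumption, not the question's HTML).
my $html = <<'END_HTML';
<html>
<head><title>demo</title></head>
<body>
<div id="outer" class="box">
<p class="note">first</p>
<p class="warning">second</p>
</div>
</body>
</html>
END_HTML

my $tree = HTML::TreeBuilder->new_from_content($html);

# 1. Match on the tag name through the _tag pseudo-attribute.
my @paragraphs = $tree->look_down(_tag => 'p');

# 2. Match on a real attribute: name plus exact string value.
my @notes = $tree->look_down(class => 'note');

# 3. Several criteria must all hold; a value may also be a regex.
my @warnings = $tree->look_down(_tag => 'p', class => qr/^warn/);

# 4. A code reference is called with each candidate element as $_[0].
my @with_id = $tree->look_down(sub { defined $_[0]->attr('id') });

print "p elements            = ", scalar(@paragraphs), "\n";
print "class=note elements   = ", scalar(@notes),      "\n";
print "p with class =~ ^warn = ", scalar(@warnings),   "\n";
print "elements with an id   = ", scalar(@with_id),    "\n";

$tree->delete;
```

In list context look_down returns every matching element, which is why the counts are taken with scalar(@array); in scalar context it returns only the first match.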

answered 2019-07-16T23:30:51.530
2

This works for me:

my $tree = HTML::TreeBuilder->new;
$tree->ignore_unknown(0);  # <-- Include unknown elements in tree
$tree->parse($html);
$tree->eof;                # Signal end of input so the tree is finalized
my @divs = $tree->look_down('attrname', 'div');
my @sections = $tree->look_down('attrname', 'section');
print "number of div elements found = ", scalar(@divs), "\n";
print "number of section elements found = ", scalar(@sections), "\n";

Output:

number of div elements found = 1
number of section elements found = 1
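
To see the effect of the setting side by side, the following sketch parses the question's HTML twice, once per ignore_unknown value. The helper name count_sections is hypothetical and exists only for this example. The result for the default setting is assumed to depend on whether the installed HTML::Tagset recognises section, so no particular number is claimed for it here.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

my $html = <<'END_HTML';
<html>
<head><title></title></head>
<body>
<div attrname="div">
<section attrname="section">
</section>
</div>
</body>
</html>
END_HTML

# Hypothetical helper: parse $html with the given ignore_unknown value
# and return how many <section> elements end up in the tree.
sub count_sections {
    my ($source, $ignore_unknown) = @_;

    # new() instead of new_from_content(), so that the option can be
    # set before any parsing happens.
    my $tree = HTML::TreeBuilder->new;
    $tree->ignore_unknown($ignore_unknown);
    $tree->parse($source);
    $tree->eof;

    my @found = $tree->look_down(_tag => 'section');
    my $count = scalar @found;
    $tree->delete;
    return $count;
}

my $kept    = count_sections($html, 0);   # unknown elements kept
my $default = count_sections($html, 1);   # the documented default

print "ignore_unknown(0): section elements found = $kept\n";
print "ignore_unknown(1): section elements found = $default\n";
```

The option has to be set on an object created with new, because new_from_content parses its argument immediately and leaves no opportunity to change parser settings first.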
answered 2019-07-17T00:08:22.463