I'm using HTML::TreeBuilder to parse real-estate web pages, and I have the following code:

$values{"Pcity"} = $address->look_down("_tag" => "span", 
                   "itemprop" => "addressLocality")->as_text;
$values{"PState"} = $address->look_down("_tag" => "span", 
                   "itemprop" => "addressRegion")->as_text;

Some pages don't contain a city or a state, and the parser dies with this error:

Can't call method "as_text" on an undefined value

To fix it, I used the following approach:

$values{"Pcity"} = $address->look_down("_tag" => "span", 
                   "itemprop" => "addressLocality");
if(defined($values{"Pcity"}))
{
    $values{"Pcity"} = $values{"Pcity"}->as_text;
}
else
{
    $values{"Pcity"} = '';
}

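It works, but now I have 9 lines instead of 1. Since I have many places like this, the code will get considerably larger.

Is there any way to make this more compact?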

2 Answers

Assuming $address never contains more than one <span> with any given value of the itemprop attribute, you can write this

for my $span ( $address->look_down(_tag => 'span') ) {
   # attr() returns undef for spans without itemprop; default to '' to avoid warnings
   my $itemprop    = $span->attr('itemprop') // '';
   $values{Pcity}  = $span->as_text if $itemprop eq 'addressLocality';
   $values{PState} = $span->as_text if $itemprop eq 'addressRegion';
}

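But accessing the HTML tree becomes much simpler with HTML::TreeBuilder::XPath, which lets you use XPath expressions instead of the clumsy look_down. A solution using it looks like this, with the caveat that findvalue returns an empty string '' for a node that doesn't exist, rather than undef; but that should be workable for you, since it still evaluates as false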

use strict;
use warnings;

use HTML::TreeBuilder::XPath;

my $xp = HTML::TreeBuilder::XPath->new_from_file(*DATA);

my %values;

$values{Pcity}  = $xp->findvalue('//span[@itemprop="addressLocality"]');
$values{PState} = $xp->findvalue('//span[@itemprop="addressRegion"]');

use Data::Dump;
dd \%values;

__DATA__
<html>
<head>
  <title>Title</title>
</head>
<body>
  <span itemprop="addressLocality">My Locality</span>
  <span itemprop="addressRegion">My Region</span>
</body>
</html>

output

{ Pcity => "My Locality", PState => "My Region" }
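
The sample above only covers pages where both spans are present. As a minimal sketch of the missing-node case (the inline HTML string here is made up for illustration, and it assumes findvalue behaves as described above), a page with no addressRegion span could be checked like this:

```perl
use strict;
use warnings;

use HTML::TreeBuilder::XPath;

# Illustrative input: a page that has a locality but no region span
my $xp = HTML::TreeBuilder::XPath->new_from_content(
    '<html><body><span itemprop="addressLocality">My Locality</span></body></html>'
);

my %values;

$values{Pcity}  = $xp->findvalue('//span[@itemprop="addressLocality"]');
$values{PState} = $xp->findvalue('//span[@itemprop="addressRegion"]');  # no such span

# The missing value comes back as a defined, empty string, so nothing dies
print defined($values{PState}) ? "PState is defined\n" : "PState is undef\n";
print length($values{PState})  ? "PState is non-empty\n" : "PState is empty\n";
```

Because the result is already a plain string, no defined check or else branch is needed at the call site.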
answered 2014-09-06T22:51:00.783

This is shorter:

# Use a lexical instead of $a, which is a special package variable reserved for sort
my $node = $address->look_down("_tag" => "span", "itemprop" => "addressLocality");
$values{"Pcity"} = $node ? $node->as_text : '';
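
Since there are many such places, the same check could be wrapped in a small sub so each lookup stays on one line. This is only a sketch: itemprop_text is a hypothetical helper name, not part of HTML::TreeBuilder, and the inline HTML is made up for the demonstration.

```perl
use strict;
use warnings;

use HTML::TreeBuilder;

# Hypothetical helper (not part of any module): returns the text of the first
# <span> with the given itemprop under $root, or '' when there is no such span.
sub itemprop_text {
    my ($root, $itemprop) = @_;
    my $node = $root->look_down(_tag => 'span', itemprop => $itemprop);
    return defined $node ? $node->as_text : '';
}

# Illustrative input: the region span is deliberately absent
my $address = HTML::TreeBuilder->new_from_content(
    '<div><span itemprop="addressLocality">My Locality</span></div>'
);

my %values;
$values{Pcity}  = itemprop_text($address, 'addressLocality');
$values{PState} = itemprop_text($address, 'addressRegion');

printf "Pcity=[%s] PState=[%s]\n", $values{Pcity}, $values{PState};

$address->delete;   # release the parse tree
```

Each additional field then costs one line, and the fallback value lives in a single place if it ever needs to change.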
answered 2014-09-06T21:04:38.330