I have a conditional that should retrieve the text from a specific tag, but it never seems to evaluate to true. Any help?

#!/usr/bin/perl

use HTML::TreeBuilder;
use warnings;
use strict;
my $URL = "http://prospectus.ulster.ac.uk/modules/index/index/selCampus/JN/selProgramme/2132/hModuleCode/COM137";
my $tree = HTML::TreeBuilder->new_from_content($URL);  

if (my $div = $tree->look_down(_tag => "div ", class => "col col60 moduledetail")) {
 printf $div->as_text();
          print "test";
 open (FILE, '>mytest.txt');
 print FILE $div;
 close (FILE); 
}
      print $tree->look_down(_tag => "th", class => "moduleCode")->as_text();
 $tree->delete();

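Execution never enters the if block, and the print statement after it fails with an undefined-value error. I know the condition should be true, because these tags do exist in the page.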

<th class="moduleCode">COM137<small>CRN: 33413</small></th>

Thanks

1 Answer

You are calling HTML::TreeBuilder->new_from_content yet you are supplying a URL instead of content. You have to get the HTML before you can pass it to HTML::TreeBuilder.
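To see the difference, here is a minimal sketch that parses a URL string and a piece of HTML side by side. The URL and the markup are made-up stand-ins, not the real prospectus page, and nothing is fetched from the network.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical inputs for illustration only; nothing is downloaded here.
my $url  = 'http://example.com/page';
my $html = '<html><body><table><tr>'
         . '<th class="moduleCode">COM137</th>'
         . '</tr></table></body></html>';

# Parsing the URL string: the parser sees plain text, not markup,
# so there is no <th> element to find.
my $from_url = HTML::TreeBuilder->new_from_content($url);
my $missing  = $from_url->look_down(_tag => 'th', class => 'moduleCode');
print "URL string:   ", (defined $missing ? 'found' : 'not found'), "\n";

# Parsing real HTML: the element is found and its text can be read.
my $from_html = HTML::TreeBuilder->new_from_content($html);
my $found     = $from_html->look_down(_tag => 'th', class => 'moduleCode');
my $text      = defined $found ? $found->as_text : undef;
print "HTML content: ", (defined $text ? $text : 'not found'), "\n";
```

In scalar context look_down returns undef when nothing matches, which is why calling as_text directly on its result fails when the tree was built from the URL string.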

Perhaps the simplest way is to use LWP::Simple which imports a subroutine called get. This will read the data at the URL and return it as a string.

Even once the HTML is loaded, your conditional block will never be executed, because you have a trailing space in the tag name. You need "div" instead of "div ".
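The effect of the stray space can be checked against a small inline document. This is only a sketch: the markup below is a hypothetical stand-in for the real page. It also shows a regex criterion on class, which look_down accepts, as an alternative if you would rather not depend on the exact attribute string.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical markup standing in for the real page.
my $html = '<html><body>'
         . '<div class="col col60 moduledetail">Module text</div>'
         . '</body></html>';
my $tree = HTML::TreeBuilder->new_from_content($html);

# "div " (with a trailing space) is not the name of any element.
my $with_space = $tree->look_down(_tag => 'div ', class => 'col col60 moduledetail');

# The exact tag name matches; a string criterion must equal the whole attribute value.
my $exact = $tree->look_down(_tag => 'div', class => 'col col60 moduledetail');

# A regex criterion matches if the attribute value matches the pattern.
my $by_regex = $tree->look_down(_tag => 'div', class => qr/\bmoduledetail\b/);

my $exact_text = defined $exact ? $exact->as_text : undef;

print "'div ' : ", (defined $with_space ? 'found' : 'not found'), "\n";
print "'div'  : ", (defined $exact_text ? $exact_text : 'not found'), "\n";
print "regex  : ", (defined $by_regex ? 'found' : 'not found'), "\n";
```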

Also note the following:

  • You shouldn't output a single string by using printf with that string as a format specifier. Any % character in the string is treated as the start of a conversion, so it may generate missing argument warnings and fail to output the string properly.

  • You should ideally use lexical file handles and the three-argument form of open. You should also check the status of all open calls and respond accordingly.

  • Your scalar variable $div is a blessed hash reference, so printing it as it is will output something like HTML::Element=HASH(0xfffffff). You need to call its methods to extract the values you want to display.
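The three points above can be illustrated with core Perl alone. This is a sketch under stated assumptions: My::Element is a hypothetical stand-in for HTML::Element, the sample text is made up, and sprintf is used in place of printf only so that the result can be captured (both follow the same format rules).

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical stand-in for an HTML::Element-like object.
{
    package My::Element;
    sub new     { my ($class, $text) = @_; return bless { text => $text }, $class }
    sub as_text { return $_[0]{text} }
}

my $obj = My::Element->new("100% done");

# 1. Used as a format, "% d" is parsed as a conversion with no argument,
#    so the text is altered and a warning is raised.
my @warnings;
my $mangled = do {
    local $SIG{__WARN__} = sub { push @warnings, @_ };
    sprintf($obj->as_text);
};
print "as a format: $mangled\n";
print "plain print: ", $obj->as_text, "\n";

# 2. Lexical file handle, three-argument open, and a checked result.
my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/mytest.txt";
open my $out, '>', $path or die "Unable to open $path: $!";
print {$out} $obj->as_text, "\n";
close $out or die "Unable to close $path: $!";

open my $in, '<', $path or die "Unable to open $path: $!";
my $line = <$in>;
close $in;

# 3. Printing the object itself shows the class name and an address,
#    not its content; a method call is needed to get the text.
print "object itself: $obj\n";
print "via method:    ", $obj->as_text, "\n";
```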

With these errors corrected your code looks like this, although I haven't formatted the output as I can't tell what you want.

use strict;
use warnings;

use HTML::TreeBuilder;
use LWP::Simple;

my $url = "http://prospectus.ulster.ac.uk/modules/index/index/selCampus/JN/selProgramme/2132/hModuleCode/COM137";
my $html = get $url;
die "Unable to fetch $url\n" unless defined $html;  # get returns undef on failure
my $tree = HTML::TreeBuilder->new_from_content($html);  

if (my $div = $tree->look_down(_tag => "div", class => "col col60 moduledetail")) {
  print $div->as_text(), "\n";
  open my $fh, '>', 'mytest.txt' or die "Unable to open output file: $!";
  print $fh $div->as_text, "\n";
}

print $tree->look_down(_tag => "th", class => "moduleCode")->as_text, "\n";
Answered 2012-04-06T14:10:05.023