xml - How do I parse an XML web page in Perl?

Question

Hi. Currently I am able to parse an XML file if it has been saved from the web page into my local folder.

use strict;
use warnings;
use Data::Dumper;
use XML::Simple;

my $parser = new XML::Simple;
my $data = $parser->XMLin("config.xml");
print Dumper($data);

But if I try to parse it directly from the website, it does not work.

use strict;
use warnings;
use Data::Dumper;
use XML::Simple;

my $parser = new XML::Simple;
my $data = $parser->XMLin("http://website/computers/computers_main/config.xml");
print Dumper($data);

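It gives me the following error: "File does not exist: http://website/computers/computers_main/config.xml at test.pl line 12"

How do I parse multiple XML files from a web page? I have to fetch several XML files from the website and parse each of them. Can anyone help me?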

score 3 · Accepted Answer

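Read the documentation for XML::Simple. Note that the XMLin method can take a file name or file handle, a string of XML, or even an IO::Handle object. What it cannot take is a URL to be fetched over HTTP.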
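As a minimal sketch of the string case, the snippet below hands XMLin XML text directly instead of a file name. The sample document and its element names (config, name, port) are made up for illustration.

```perl
use strict;
use warnings;
use XML::Simple;

# Hypothetical sample document; any well-formed XML string is handled the same way.
my $xml = '<config><name>web01</name><port>8080</port></config>';

# An argument that contains markup is treated as XML text, not as a file name.
my $data = XML::Simple->new->XMLin($xml);

print "$data->{name} listens on port $data->{port}\n";
```

This is why downloading the page first and passing its content to XMLin works, while passing the URL itself is interpreted as a (non-existent) file name.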

Use the Perl module LWP::Simple to fetch the XML file you need, then pass its content to XMLin.

You will have to download and install LWP::Simple using cpan, just as you did earlier for XML::Simple.
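For the "multiple XML files" part of the question, one possible approach is to loop over a list of URLs and parse each download in turn. The sketch below is only an illustration: the helper name parse_xml_urls and its optional $fetch argument are hypothetical, and the URLs are taken from the command line rather than assumed.

```perl
use strict;
use warnings;
use Data::Dumper;
use XML::Simple;
use LWP::Simple qw(get);

# Fetch each URL and parse its body with XMLin.
# $fetch is an optional code ref (defaults to LWP::Simple's get) so the
# download step can be replaced, for example when no network is available.
sub parse_xml_urls {
    my ($urls, $fetch) = @_;
    $fetch ||= \&get;
    my $parser = XML::Simple->new;
    my %parsed;
    for my $url (@$urls) {
        my $content = $fetch->($url);
        unless (defined $content) {
            # get returns undef on failure; skip this URL and carry on
            warn "Unable to get $url\n";
            next;
        }
        $parsed{$url} = $parser->XMLin($content);
    }
    return \%parsed;    # hash ref: URL => parsed data structure
}

# URLs are supplied as command-line arguments.
print Dumper( parse_xml_urls(\@ARGV) ) if @ARGV;
```

Run as `perl test.pl URL1 URL2 ...`; URLs that cannot be fetched are reported with a warning and left out of the result.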

score 2 · Accepted Answer

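Super edit: this approach requires WWW::Mechanize, but it lets you log in to your website and then fetch the XML page. You will have to change a few things, as noted in the comments. Hope this helps.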

use strict;
use warnings;
use Data::Dumper;
use XML::Simple;
use WWW::Mechanize;
use HTTP::Cookies;

# Create a new instance of Mechanize
my $bot = WWW::Mechanize->new();
# Create a cookie jar for the login credentials
$bot->cookie_jar(
    HTTP::Cookies->new(
        file           => "cookies.txt",
        autosave       => 1,
        ignore_discard => 1,
    )
);
# Connect to the login page
my $response = $bot->get( 'http://www.thePageYouLoginTo.com' );
# Get the login form
$bot->form_number(1);
# Enter the login credentials.
# You're going to have to change the field names login and
# pass (on the left) to match the names used in the form you're logging
# into (found in the source of the website). Then you can put your
# respective credentials on the right.
$bot->field( login => 'thisIsWhereYourLoginInfoGoes' );
$bot->field( pass => 'thisIsWhereYourPasswordInfoGoes' );
$response = $bot->click();
# Get the xml page
$response = $bot->get( 'http://website/computers/computers_main/config.xml' );
my $content = $response->decoded_content();
# Parse the downloaded XML text
my $parser = XML::Simple->new;
my $data = $parser->XMLin($content);
print Dumper($data);

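Give this a try. It uses LWP::Simple as suggested above. It simply connects to the page, grabs that page's content (the XML file), and runs it through XMLin. Edit: added simple error checking on the get $url line. Edit 2: leaving this code here, since it should work if no login is required.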

use strict;
use warnings;
use Data::Dumper;
use XML::Simple;
use LWP::Simple;

my $parser = XML::Simple->new;

my $url = 'http://website/computers/computers_main/config.xml';
# get returns undef on failure, so stop with a message in that case
my $content = get($url) or die "Unable to get $url\n";
# $content holds the XML text, which XMLin accepts as a string
my $data = $parser->XMLin($content);

print Dumper($data);

score 1 · Accepted Answer

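If you don't have any specific reason to stick with XML::Simple, use another parser such as XML::Twig or XML::LibXML, which provide built-in support for parsing XML that is available over the web.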

Here is a simple example using XML::Twig:

use strict;
use warnings;
use XML::Twig;
use LWP::Simple;

my $url = 'http://website/computers/computers_main/config.xml';
my $twig= XML::Twig->new();
$twig->parse( LWP::Simple::get( $url ));

As already mentioned, XML::Simple has no such built-in feature.
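The built-in support referred to here is, as far as I know, XML::Twig's parseurl method and its safe_parseurl variant, which fetch the document through LWP themselves. A minimal sketch, assuming LWP is installed and the URL is passed on the command line:

```perl
use strict;
use warnings;
use XML::Twig;

my $url = shift @ARGV;    # URL supplied on the command line

if (defined $url) {
    my $twig = XML::Twig->new;
    # safe_parseurl downloads and parses the URL inside an eval;
    # it returns false and leaves the error in $@ if anything fails.
    if ( $twig->safe_parseurl($url) ) {
        print "Root element: ", $twig->root->tag, "\n";
    }
    else {
        warn "Could not parse $url: $@\n";
    }
}
```

Compared with calling LWP::Simple::get and parse separately, this keeps the fetch and the parse in one call and reports a failed download as an ordinary error value.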
