html - Extracting HTML from a URL with Perl

Question

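I want to extract the HTML of a TWiki page (I have its URL). What is the best way to do this?

Also, once I have extracted the HTML, I need to export it to a site hosted on Google Sites. Is that possible?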

score 2 · Accepted Answer

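A very simple way to fetch an HTML page is the LWP::Simple module. If you need a more complex navigation flow (logging in, following links, submitting forms), use WWW::Mechanize. After that, if you need to parse the HTML, @brian's solution works well.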
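A minimal sketch of the LWP::Simple approach follows. The `fetch_html` helper and the command-line usage are assumptions made for illustration, not part of the original answer; the only library call used is `get`, which returns undef when the request fails.

```perl
use strict;
use warnings;
use LWP::Simple qw(get);

# Hypothetical helper: fetch a page and return its HTML, or die on failure.
sub fetch_html {
    my ($url) = @_;
    my $html = get($url);    # LWP::Simple::get returns undef if the request fails
    die "Could not fetch $url\n" unless defined $html;
    return $html;
}

# Pass the URL of your TWiki page as the first command-line argument.
if (@ARGV) {
    print fetch_html($ARGV[0]);
}
```

LWP::Simple does not handle logins or cookies, so a TWiki page that requires authentication would probably need WWW::Mechanize or LWP::UserAgent instead.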

score 1 · Accepted Answer

It sounds like you want the HTML::Parser module from CPAN.

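use strict;
use warnings;
use HTML::Parser ();

# Minimal handlers (added so the example runs); they just print each tag.
sub start { my ($tagname, $attr) = @_; print "<$tagname>\n" }
sub end   { my ($tagname) = @_;        print "</$tagname>\n" }

# Create parser object
my $p = HTML::Parser->new( api_version => 3,
                           start_h => [\&start, "tagname, attr"],
                           end_h   => [\&end,   "tagname"],
                           marked_sections => 1,
                         );

# Parse directly from file; parse_file returns undef if the file cannot be opened
$p->parse_file("foo.html") or die "Cannot open foo.html: $!\n";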
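Since the question starts from a URL rather than a file, the parser can also be fed an HTML string with `parse` followed by `eof`. The sketch below is an illustration under assumptions: `extract_links` is a hypothetical helper, and collecting `href` values is only an example task, not something the original answer does.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Hypothetical helper: return the href of every <a> tag in an HTML string.
sub extract_links {
    my ($html) = @_;
    my @links;
    my $p = HTML::Parser->new(
        api_version => 3,
        start_h     => [
            sub {
                my ($tagname, $attr) = @_;
                # Tag and attribute names arrive lowercased.
                push @links, $attr->{href}
                    if $tagname eq 'a' && defined $attr->{href};
            },
            "tagname, attr",
        ],
    );
    $p->parse($html);    # feed the document as a string
    $p->eof;             # signal end of input so buffered data is flushed
    return @links;
}

print "$_\n" for extract_links('<p><a href="https://example.com/">x</a></p>');
```

The string passed to `extract_links` could be the HTML returned by LWP::Simple's `get` for the page URL.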
