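For example:
I load input from a .txt file like this:
Benjamin,Schuvlein,Germany,1912,M,White
I have some code (not posted here, for brevity) that reads this input and visits the following link: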
https://familysearch.org/pal:/MM9.1.1/K3BN-LLJ
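- I want to scrape several items from that page. In the code below I only scrape one.
- I also want the items in the output .txt to be separated by commas.
- And I want each output line to start with the corresponding input line.
I use the following packages in my code: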
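For the last two points, one way to assemble each output record is to chomp the input line, tidy each scraped value, and join everything with commas. The sketch below assumes the scraped values are already in a list; the helper name `build_output_line` is hypothetical, and replacing commas inside values with `;` is an arbitrary choice (a module such as Text::CSV would quote them instead).

```perl
use strict;
use warnings;

# Hypothetical helper: prefixes the original input line to the scraped
# values so that each record reads "input fields,scraped fields".
sub build_output_line {
    my ($input_line, @scraped) = @_;
    chomp $input_line;
    my @clean = map {
        my $v = defined $_ ? $_ : '';
        $v =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
        $v =~ s/,/;/g;           # a comma inside a value would shift the columns
        $v;
    } @scraped;
    return join(',', $input_line, @clean);
}

my $line = build_output_line("Benjamin,Schuvlein,Germany,1912,M,White\n",
                             ' Queens ', 'New York', 'Married');
print "$line\n";
```

Inside the real loop over the input file, something like `print $o build_output_line($in, @values), "\n";` would write one record per input line.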
use strict;
use warnings;
use WWW::Mechanize::Firefox;
use Data::Dumper;
use LWP::UserAgent;
use JSON;
use CGI qw/escape/;
use HTML::DOM;
Here is the relevant code:
my $ua = LWP::UserAgent->new;
open(my $o, '>', 'out2.txt') or die "Can't open output file: $!";
# Here is the url, although in practice, it is scraped itself using different code
my $url = 'https://familysearch.org/pal:/MM9.1.1/K3BN-LLJ';
print "My URL is <$url>\n";
my $request = HTTP::Request->new(GET => $url);
$request->push_header('Content-Type' => 'application/json');
my $response = $ua->request($request);
die "Error ".$response->code if !$response->is_success;
my $dom_tree = HTML::DOM->new;
$dom_tree->write($response->content);
$dom_tree->close;
# Text of the 11th <td> inside the first <table> on the page
my $str = $dom_tree->getElementsByTagName('table')->[0]->getElementsByTagName('td')->[10]->as_text();
print "$str\n";
print $o "$str\n";
close $o;
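Because `->[10]` depends on the exact position of a cell, a possibly sturdier way to grab several items is to read each table row as a label/value pair and then pick the labels you need in a fixed order. The HTML below is a made-up stand-in for `$response->content`, and the labels `Residence`, `State` and `Marital Status` are assumptions; the real record page may use different markup.

```perl
use strict;
use warnings;
use HTML::DOM;

# Hypothetical stand-in for $response->content; the real page layout may differ.
my $html = <<'HTML';
<table>
  <tr><td>Residence</td><td> Queens </td></tr>
  <tr><td>State</td><td>New York</td></tr>
  <tr><td>Marital Status</td><td>Married</td></tr>
</table>
HTML

my $dom_tree = HTML::DOM->new;
$dom_tree->write($html);
$dom_tree->close;

# Map each row's first cell (the label) to its second cell (the value),
# so lookups do not depend on a fixed index such as ->[10].
my %field;
for my $row ($dom_tree->getElementsByTagName('tr')) {
    my @td = $row->getElementsByTagName('td');
    next unless @td >= 2;
    my ($label, $value) = map {
        my $t = $_->as_text;
        $t =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
        $t;
    } @td[0, 1];
    $field{$label} = $value;
}

# Emit the wanted fields in a fixed order; absent ones become empty strings.
my @wanted  = ('Residence', 'State', 'Marital Status');
my $scraped = join ',', map { defined $field{$_} ? $field{$_} : '' } @wanted;
print "$scraped\n";
```

Keeping the empty strings for missing labels means every output line has the same number of columns, even when a record lacks some fields.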
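The desired output (from that link) would look something like this:
Benjamin,Schuvlein,Germany,1912,M,White,Queens,New York,Married,Same Place,Head,etc.
(How much of that output can actually be scraped?)
Any help on how to follow links from within a linked page would be greatly appreciated!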
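To follow links found on a fetched page, a minimal sketch is to collect the `href` of every `<a>` element and resolve it against the page's own URL with the URI module (which LWP already depends on). The sample markup and the link paths below are hypothetical.

```perl
use strict;
use warnings;
use HTML::DOM;
use URI;

# The page URL serves as the base for relative links; the markup is a made-up sample.
my $base = 'https://familysearch.org/pal:/MM9.1.1/K3BN-LLJ';
my $html = <<'HTML';
<p><a href="/search/record/1">Household member</a>
<a href="https://example.org/other">External</a>
<a name="anchor-only">No href</a></p>
HTML

my $dom_tree = HTML::DOM->new;
$dom_tree->write($html);
$dom_tree->close;

# Collect every non-empty href and make it absolute against the page URL.
my @links;
for my $anchor ($dom_tree->getElementsByTagName('a')) {
    my $href = $anchor->getAttribute('href');
    next unless defined $href && length $href;
    push @links, URI->new_abs($href, $base)->as_string;
}
print "$_\n" for @links;
```

Each resulting URL could then be fetched with `$ua->get($link_url)` and parsed the same way as the first page.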