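I am parsing a website that contains customer reviews. I want to collect each reviewer's name and the feedback he or she gave.
My problem is that only some of the reviews are shown on the first page. The next page is loaded by clicking a button, and the site responds via AJAX. How can I get the new reviews from the AJAX response into my Mechanize page object? I want to trigger the AJAX button as many times as possible so that I collect as many reviews as possible.
My code looks like this: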
require 'mechanize'
require 'nokogiri'

log_file = "log_file.txt"
www = "http://www.trustpilot.dk/review/www.fona.dk"

agent = Mechanize.new
page = agent.get(www)

# Select the review containers on the first page
reviews = page.search(".clear")

# The block form of File.open closes the log even if parsing raises
File.open(log_file, 'w') do |log|
  reviews.each do |r|
    log << "####################### NEW REVIEW #######################\n\n"
    # Reviewer name is the link text inside .profileinfo
    name = r.at_css(".profileinfo a").text.strip
    log << "Customer name: #{name}\n"
    # Rating is the content attribute of the ratingValue meta tag in this review
    rating = r.at_xpath(".//meta[@itemprop='ratingValue']/@content").to_s
    log << "Rating: #{rating}\n\n"
  end
end
FYI, the log file will look like this:
####################### NEW REVIEW #######################

Customer name: Hans-Oluf
Rating: 5

####################### NEW REVIEW #######################

Customer name: Jørgen
Rating: 3

####################### NEW REVIEW #######################

Customer name: Frederik
Rating: 4
The AJAX trigger should be in the following source code:
<div id="AjaxLoader_1" class="AjaxPager">
  <div class="AjaxPagerLinkWrapper">
    <a class="button AjaxPagerLink" href="http://www.trustpilot.dk/review/www.fona.dk?page=2">
      Vis flere anmeldelser
    </a>
  </div>
</div>
<script type="text/javascript">
  $(document).ready (function() {
    // Testing spilttest console.log("/domains/reviews?DID=767");
    // Get element right before this control
    var containerId = 'reviewContainer';
    var container = containerId == ''
      ? $('#AjaxLoader_1').prev()
      : $($.f('#{0}', containerId));
    var pager = new Pager(
      1,
      25,
      'nextPageLoaded',
      'AjaxLoader_1',
      '/domains/reviews?DID=767',
      'page',
      '',
      container);
  });
</script>
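One idea I could try, as a rough sketch only: the "Vis flere anmeldelser" link has a plain href ending in ?page=2, so the later pages might be reachable by requesting ?page=N directly instead of running the JavaScript. The code below assumes that the page parameter keeps working for higher page numbers and that a page without reviews marks the end, neither of which I have verified. The names page_url and each_review_page are hypothetical helpers, and fetch is any callable that turns a URL into a page object (with Mechanize, a lambda around agent.get).

```ruby
require 'uri'

# Return base_url with its "page" query parameter set to n,
# keeping any other query parameters that are already there.
def page_url(base_url, n, param = 'page')
  uri = URI.parse(base_url)
  query = URI.decode_www_form(uri.query.to_s).reject { |key, _| key == param }
  uri.query = URI.encode_www_form(query + [[param, n.to_s]])
  uri.to_s
end

# Fetch pages 1, 2, 3, ... and yield each page with its number.
# Stops after max_pages, or as soon as the block returns a falsy value
# (for example, when a page contains no reviews).
# `fetch` is any callable that maps a URL to a page object.
def each_review_page(base_url, fetch, max_pages: 50)
  (1..max_pages).each do |n|
    page = fetch.call(page_url(base_url, n))
    break unless yield(page, n)
  end
end

# Possible usage with Mechanize (assumption: untested against the live site):
#   agent = Mechanize.new
#   fetch = ->(url) { agent.get(url) }
#   each_review_page(www, fetch) do |page, n|
#     reviews = page.search(".clear")
#     # write each review to the log as in the code above
#     reviews.any?   # falsy result stops the loop
#   end
```

Passing the fetcher in as a callable keeps the paging logic separate from Mechanize. The same page_url helper would also build URLs for the '/domains/reviews?DID=767' endpoint named in the Pager call, if that turns out to be the address the button really requests.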