jquery - What is the easiest way to create a spider from manual jQuery code?

Question

I have what is best described as a jQuery "script":

//  Get the Textbooks URL
window.location = $($("li.dropDown").find("a")[0]).attr('href');

//  Fill in Department Data
var depts = $($(".deptSelectInput")[0]).next().children();
$($(".deptSelectInput")[0]).val($($(".deptSelectInput")[0]).next().children().text());
$($(".deptSelectInput")[0]).blur();

//  Fill in Course Data
var courses = $($(".courseSelectInput")[0]).next().children();
$($(".courseSelectInput")[0]).val($($(".courseSelectInput")[0]).next().children().text());
$($(".courseSelectInput")[0]).blur();

//  Fill in Section Data
var sections = $($(".sectionSelectInput")[0]).next().children();
$($(".sectionSelectInput")[0]).val($($(".sectionSelectInput")[0]).next().children().text());
$($(".sectionSelectInput")[0]).blur();


//  Submit the form, only if it's valid
if (($(".noTextBookCourseErrorMessage")[0].style.display) == "none") {
    formSubmission();
}


//  Extract all the ISBNs from the page
var regex = /\d+/g;
var isbn = $('li:contains("ISBN")').text().trim();
var isbns = [];

var tempIsbn = regex.exec(isbn);
while (tempIsbn) {
    isbns.push(parseInt(tempIsbn[0], 10));
    tempIsbn = regex.exec(isbn);
}

console.log(isbns);

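It does exactly what I need it to do.

When I open the developer tools in Chrome and run this script in three separate pieces (once to load the new URL, once to get the data and submit the form, and once to read from the new page), it returns exactly the data I want.

I'm very new to spidering and was wondering what the best way to automate this process would be. Basically, I need a script I can run that does what I just did by hand (split into the same three jQuery pieces).

I've looked into CasperJS and Mechanize, but I have never used either.

Any suggestions?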

score 1 · Accepted Answer

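In your case, the web context you are trying to scrape includes dynamic content driven by jQuery, so CasperJS is a good choice if you want to do this in JavaScript. You can use it to trigger events, add flow steps, and include functions that wait and validate after each Ajax call before moving on to the next step.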
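As a rough sketch of what "flow steps plus waiting" could look like for the three manual console runs in the question: the start URL below is a placeholder (the question does not give the real one), `buildFlow` is a hypothetical helper name, and the selectors are copied from the question's script. It assumes the target page already loads jQuery, so `$` is available inside `evaluate()`.

```javascript
// Hypothetical placeholder: the question does not give the real start URL.
var START_URL = 'http://example.com/';

// Queue the three manual console runs as CasperJS steps.
function buildFlow(casper) {
    // Step 1: load the start page, then follow the first drop-down link.
    casper.start(START_URL);
    casper.waitForSelector('li.dropDown a', function () {
        this.click('li.dropDown a');
    });

    // Step 2: wait until the form exists before filling it in.
    casper.waitForSelector('.deptSelectInput', function () {
        this.echo('Form loaded; fill it in and submit here.');
    });

    // Step 3: wait until the results list mentions an ISBN.
    // ':contains' is a jQuery selector, so the check runs inside the page.
    casper.waitFor(function () {
        return this.evaluate(function () {
            return $('li:contains("ISBN")').length > 0;
        });
    }, function () {
        this.echo('Results loaded; extract the ISBNs here.');
    });

    return casper;
}

// Only start the run under CasperJS (PhantomJS defines `phantom`).
if (typeof phantom !== 'undefined') {
    buildFlow(require('casper').create()).run();
}
```

Each `waitFor...` call blocks the following steps until its condition holds (or it times out), which replaces the manual pause between the three console pastes.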

Here is an example of how to use CasperJS with jQuery:
CasperJS and jQuery with chained selects for crawling a website

To execute JavaScript code from CasperJS, you must use the evaluate() method.

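The evaluate() method acts as a gate between the CasperJS environment and the page you have opened; every time you pass a closure to evaluate(), you are entering the page and executing code as if you were using the browser console.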
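To make the "gate" idea concrete, here is a minimal sketch of the ISBN step from the question. The helper names (`extractIsbns`, `readText`, `scrapeIsbns`) and the URL are illustrative assumptions, not part of CasperJS or the original script. It also assumes the page already has jQuery loaded, and it keeps the matches as strings rather than calling `parseInt` as the question does, so leading zeros are not lost.

```javascript
// Runs on the CasperJS side: pull every run of digits out of the text.
// Kept as strings (an assumption; the question converts with parseInt).
function extractIsbns(text) {
    return text.match(/\d+/g) || [];
}

// Runs inside the page. It must be self-contained: CasperJS serializes it,
// so it cannot see variables or functions from this script.
function readText(selector) {
    return $(selector).text().trim();
}

function scrapeIsbns(casper) {
    // Extra arguments are passed into the page; the return value comes back
    // out, provided it is serializable (a string here).
    var text = casper.evaluate(readText, 'li:contains("ISBN")');
    return extractIsbns(text || '');
}

// Only start the run under CasperJS (PhantomJS defines `phantom`).
if (typeof phantom !== 'undefined') {
    var casper = require('casper').create();
    // Placeholder URL: substitute the real results page.
    casper.start('http://example.com/', function () {
        this.echo(JSON.stringify(scrapeIsbns(this)));
    });
    casper.run();
}
```

Keeping the regex work outside `evaluate()` means only a plain string crosses the boundary, which makes the page-side function easy to paste into the browser console for checking.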
