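I have to deal with a site written in some ASP.Net framework that sends a large amount of POST data with every request (about 100 KB, of which roughly 95 KB never seems to change between requests and is apparently related to view state).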
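However, I could not find any approach that worked for me. I looked into intercepting XHR, and I even found someone dealing with what appears to be the same framework (judging by the selectors, at least) but with a simpler case, inspired by this question. I also found that, in the past, this could not be done with PhantomJS.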
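My problem is that clicking the button starts a chain of AJAX requests that ends with this huge POST form being submitted, and the server finally replies with "Content-Disposition: attachment".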
In the end, I found that the following approach works for me, even though it is inefficient in terms of network traffic:
// ...set everything up, until all that is left is to click the button...

var phantomData = null;
var phantomRequest = null;

// Recognize the form being submitted and keep a copy of the request.
casper.on('resource.requested', function (requestData, request) {
    requestData.headers.forEach(function (header) {
        // Header names are case-insensitive, so normalize before comparing.
        if (header.name.toLowerCase() === 'content-type' &&
                header.value === 'application/x-www-form-urlencoded') {
            phantomData = requestData;
            phantomRequest = request;
        }
    });
});

// Recognize when the request has FAILED because PhantomJS does not
// support straight downloading.
casper.on('resource.received', function (resource) {
    if (resource.stage !== 'end') {
        return;
    }
    resource.headers.forEach(function (header) {
        if (header.name.toLowerCase() !== 'content-disposition') {
            return;
        }
        if (phantomData) {
            // To do: get the file name from header.value
            casper.download(
                resource.url,
                'output.pdf',
                phantomData.method,
                phantomData.postData
            );
        } else {
            // Something went wrong.
        }
        // Possibly, remove listeners?
    });
});

// Now, click on the button and initiate the dance.
casper.click(pdfLinkSelector);
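The to-do in the `resource.received` listener (taking the file name from the header value) could be handled along these lines. This is only a sketch: `findHeader` and `fileNameFromDisposition` are hypothetical helper names, not part of CasperJS or PhantomJS, and the parser only understands the plain `filename=` parameter, not the RFC 5987 `filename*=` form.

```javascript
// Hypothetical helpers, not part of CasperJS or PhantomJS.

// Return the value of the first header whose name matches, ignoring case.
// `headers` is assumed to be an array of {name, value} objects.
function findHeader(headers, name) {
    var wanted = name.toLowerCase();
    for (var i = 0; i < headers.length; i++) {
        if (headers[i].name.toLowerCase() === wanted) {
            return headers[i].value;
        }
    }
    return null;
}

// Extract the file name from a Content-Disposition value such as
//   attachment; filename="report.pdf"
// Falls back to `fallback` when no usable name is present.
function fileNameFromDisposition(value, fallback) {
    var match = /filename\s*=\s*(?:"([^"]*)"|([^;\s]+))/i.exec(value || '');
    var name = match ? (match[1] || match[2] || '') : '';
    // Keep only the last path component of whatever the server sent.
    name = name.replace(/^.*[\\\/]/, '');
    return name || fallback;
}
```

In the listener, the hard-coded `"output.pdf"` could then become something like `fileNameFromDisposition(findHeader(resource.headers, 'Content-Disposition'), 'output.pdf')`.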
The download works flawlessly, even though I can see the file being requested (and sent) twice:
[debug] [phantom] Navigation requested: url=https://somesite/SomePage.aspx, type=FormSubmitted, willNavigate=true, isMainFrame=true
[debug] [application] GOT FORM, REQUEST DATA SAVED
[warning] [phantom] Loading resource failed with status=fail (HTTP 200): https://somesite/SomePage.aspx
[debug] [application] END STAGE REACHED, PHANTOMDATA PRESENT
[debug] [application] ATTEMPTING CASPERJS.DOWNLOAD
[debug] [remote] sendAJAX(): Using HTTP method: 'POST'
[debug] [phantom] Downloaded and saved resource in output.pdf
[debug] [application] TERMINATING SUCCESSFULLY
[debug] [phantom] Navigation requested: url=about:blank, type=Other, willNavigate=true, isMainFrame=true
[debug] [phantom] url changed to "about:blank"
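(Next, I will probably modify the script to try calling request.abort() from inside the resource.requested listener, set a semaphore, and call the downloader again. I will not be able to get the attachment file name that way, but that does not matter to me.)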
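A rough, untested sketch of that abort-and-retry idea follows. `installAbortingDownloader` and its `target` parameter are hypothetical names; the sketch assumes that the `request` object passed to the `resource.requested` listener exposes `abort()`, as PhantomJS network requests do, and that the CasperJS suite is still running so that a step added with `casper.then()` gets executed. The flag plays the role of the semaphore: `casper.download()` issues its own form-encoded request, which must not be aborted in turn.

```javascript
// Sketch only: installAbortingDownloader and `target` are hypothetical names.
function installAbortingDownloader(casper, target) {
    var downloading = false; // the "semaphore": true once we have taken over

    casper.on('resource.requested', function (requestData, request) {
        if (downloading) {
            return; // let through the request issued by casper.download()
        }
        var isForm = requestData.headers.some(function (header) {
            return header.name.toLowerCase() === 'content-type' &&
                header.value === 'application/x-www-form-urlencoded';
        });
        if (!isForm) {
            return;
        }
        downloading = true;
        request.abort(); // stop PhantomJS from fetching the attachment itself
        // Defer the download to a later step rather than running it
        // inside the network callback.
        casper.then(function () {
            casper.download(
                requestData.url,
                target,
                requestData.method,
                requestData.postData
            );
        });
    });
}
```

It would be installed before the click, for example `installAbortingDownloader(casper, 'output.pdf');`. Because the server's response is never received, the name in its Content-Disposition header is not available and the target name has to be chosen up front.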