I'm trying to use Node.js to scrape a website that requires logging in via POST. Once I'm logged in, I can access a separate page via GET.

The first problem is logging in. I've tried using request to POST the login information, but the response I get back does not appear to be logged in.

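var request = require('request');

// requesturl and lform (the login form fields) are defined elsewhere.
exports.getstats = function (req, res) {
    request.post({url: requesturl, form: lform}, function (err, response, body) {
        if (err) {
            res.writeHead(502, {"Content-Type": "text/plain"});
            return res.end(String(err));
        }
        // Forward the login response body unchanged.
        res.writeHead(200, {"Content-Type": "text/html"});
        res.end(body);
    });
};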

Here I'm just forwarding the page I get back, but the page I get back still shows the login form, and if I try to access another page it says I'm not logged in.

I think I need to maintain the client side session and cookie data, but I can find no resources to help me understand how to do that.


As a follow-up: I ended up using Zombie.js to get the functionality I needed.

3 Answers

You need to create a cookie jar and use the same jar for all related requests.

 var cookieJar = request.jar();
 request.post({url : requesturl, jar: cookieJar, form: lform}, ...

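In theory, this should let you scrape pages with GET as a logged-in user, but only once the login itself actually works. Judging by your description of the response to the login POST, the login may not be working yet, so the cookie jar won't help until you have fixed the problem in your login code.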
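
The complete flow might look roughly like the sketch below, assuming the request library: the login POST and the follow-up GET share a single jar. The function name `loginThenFetch` and its parameters are hypothetical, and `request` is passed in as an argument only so the flow can be tried out with a stub.

```javascript
// Sketch: log in with a POST, then GET a protected page using the same jar.
// `request` is the request module (or a stub with jar/post/get methods).
function loginThenFetch(request, loginUrl, pageUrl, form, callback) {
  // One jar for the whole session: cookies set by the login response
  // are stored here and sent again with the follow-up GET.
  var cookieJar = request.jar();

  request.post({url: loginUrl, jar: cookieJar, form: form}, function (err) {
    if (err) return callback(err);

    // Same jar, so the session cookie from the login travels with this GET.
    request.get({url: pageUrl, jar: cookieJar}, function (err, response, body) {
      if (err) return callback(err);
      callback(null, body);
    });
  });
}
```

In real use it could be called as `loginThenFetch(require('request'), requesturl, pageurl, lform, callback)`, where `pageurl` stands for whichever page you want to read after logging in.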

Answered 2013-11-12T18:31:16.253

request.jar() didn't work for me, so I'm using the headers from the login response to make a second request, like this:

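var request = require('request');

request.post({
    url: 'https://exampleurl.com/login',
    form: {"login":"xxxx", "password":"xxxx"}
}, function(error, response, body){
    if (error) throw error;

    // Turn the Set-Cookie response headers into one Cookie request header.
    var cookies = (response.headers['set-cookie'] || []).map(function (c) {
        return c.split(';')[0];
    }).join('; ');

    request.get({
        url:"https://exampleurl.com/logged",
        headers: {Cookie: cookies}
    },function(error, response, body){
        // The full html of the authenticated page
        console.log(body);
    });
});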

This approach actually works fine. =D
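
Only the cookies from the login response need to travel with the next request. A minimal sketch of that conversion, in plain JavaScript with no dependencies: the helper name `toCookieHeader` is hypothetical, and it keeps just the `name=value` part of each `Set-Cookie` line, ignoring attributes such as Path, Domain and Expires.

```javascript
// Build a Cookie request header value from a response's Set-Cookie headers.
// In Node.js, response.headers['set-cookie'] is an array of strings.
function toCookieHeader(responseHeaders) {
  var setCookie = responseHeaders['set-cookie'] || [];
  return setCookie
    .map(function (line) {
      // "sid=abc123; Path=/; HttpOnly" becomes "sid=abc123"
      return line.split(';')[0].trim();
    })
    .join('; ');
}
```

The result can be passed as `headers: {Cookie: toCookieHeader(response.headers)}` on the follow-up request. Unlike a cookie jar, this does not track expiry or scope, so it is best suited to short scripts against a single site.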

Answered 2014-08-21T22:45:37.847

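If enabled, request manages cookies between requests for you. From the request documentation:

Cookies are disabled by default (else, they would be used in subsequent requests). To enable cookies, set jar to true (either in defaults or options).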

const request = require('request').defaults({jar: true})
request('http://www.google.com', function () {
  request('http://images.google.com')
});

Answered 2020-04-09T21:59:39.750