
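I'm trying to use jsoup to log in to a site and then scrape information from it. I can log in successfully and create a Document from index.php, but I can't get any other pages on the site. I know I need to set a cookie after posting and then load it when I try to open another page on the site, but how do I do that? The following code logs me in and gets index.php: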

Document doc = Jsoup.connect("http://www.example.com/login.php")
               .data("username", "myUsername", 
                     "password", "myPassword")
               .post();

I know I could use Apache HttpClient to do this, but I don't want to.


6 Answers


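When you log in to the site, it is probably setting an authorised session cookie that needs to be sent on subsequent requests to maintain the session.

You can get the cookie like this: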

Connection.Response res = Jsoup.connect("http://www.example.com/login.php")
    .data("username", "myUsername", "password", "myPassword")
    .method(Method.POST)
    .execute();

Document doc = res.parse();
String sessionId = res.cookie("SESSIONID"); // you will need to check what the right cookie name is

Then send it on the next request, like this:

Document doc2 = Jsoup.connect("http://www.example.com/otherPage")
    .cookie("SESSIONID", sessionId)
    .get();
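
If you are not sure which cookie carries the session, one option is to print every cookie the login response returned and look for the likely name. The sketch below is only an illustration using the standard library: the map stands in for what `res.cookies()` returns, the cookie values are made up, and `findSessionCookieName` is a hypothetical helper based on a naming heuristic, not part of jsoup.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

class SessionCookieFinder {

    // Heuristic (an assumption, not a jsoup feature): treat the first cookie
    // whose name contains "sess", ignoring case, as the session cookie.
    static Optional<String> findSessionCookieName(Map<String, String> cookies) {
        for (String name : cookies.keySet()) {
            if (name.toLowerCase(Locale.ROOT).contains("sess")) {
                return Optional.of(name);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Stand-in for the map returned by res.cookies(); values are made up.
        Map<String, String> cookies = new LinkedHashMap<>();
        cookies.put("lang", "en");
        cookies.put("PHPSESSID", "abc123");

        // Print every cookie so the name can also be checked by eye.
        cookies.forEach((name, value) -> System.out.println(name + "=" + value));
        System.out.println("Likely session cookie: "
                + findSessionCookieName(cookies).orElse("(none found)"));
    }
}
```

The heuristic can miss sites that use an unusual cookie name, so checking the printed list by hand (or in the browser's developer tools) is still the safer route.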
answered 2011-06-25T09:16:18.840
//This will get you the response.
Response res = Jsoup
    .connect("loginPageUrl")
    .data("loginField", "login@login.com", "passField", "pass1234")
    .method(Method.POST)
    .execute();

//This will get you cookies
Map<String, String> loginCookies = res.cookies();

//And this is the easiest way I've found to remain in session
Document doc = Jsoup.connect("urlYouNeedToBeLoggedInToAccess")
      .cookies(loginCookies)
      .get();
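
For anyone wondering what passing the cookie map along amounts to on the wire: the cookies end up in a single `Cookie` request header of the form `name=value; name2=value2`. The following is a rough standard-library sketch of that formatting, not jsoup's actual implementation; `toCookieHeader` and the cookie values are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class CookieHeaderSketch {

    // Joins cookies into the "name=value; name2=value2" form used by the
    // HTTP Cookie request header.
    static String toCookieHeader(Map<String, String> cookies) {
        StringJoiner joiner = new StringJoiner("; ");
        for (Map.Entry<String, String> entry : cookies.entrySet()) {
            joiner.add(entry.getKey() + "=" + entry.getValue());
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // Stand-in for the map returned by res.cookies(); values are made up.
        Map<String, String> loginCookies = new LinkedHashMap<>();
        loginCookies.put("SESSIONID", "abc123");
        loginCookies.put("lang", "en");

        // prints "Cookie: SESSIONID=abc123; lang=en"
        System.out.println("Cookie: " + toCookieHeader(loginCookies));
    }
}
```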
answered 2012-05-10T11:53:21.990

Where the code is:

Document doc = Jsoup.connect("urlYouNeedToBeLoggedInToAccess").cookies().get();

I was having difficulties until I changed it to:

Document doc = Jsoup.connect("urlYouNeedToBeLoggedInToAccess").cookies(cookies).get();

Now it works perfectly.

answered 2012-12-29T01:14:31.580

Here is what you can try...

import java.io.IOException;
import org.jsoup.Connection;
import org.jsoup.Jsoup;

// Log in with a POST and keep the response so its cookies can be read later
Connection.Response res = null;
try {
    res = Jsoup
            .connect("http://www.example.com/login.php")
            .data("username", "your login id", "password", "your password")
            .method(Connection.Method.POST)
            .execute();
} catch (IOException e) {
    e.printStackTrace();
}

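Now save all the cookies and make a request to the other page you want.

// Store cookies (needs java.util.Map)
Map<String, String> cookies = res.cookies();

Make a request to another page.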

try {
    Document doc = Jsoup.connect("your-second-page-link").cookies(cookies).get();
}
catch(Exception e){
    e.printStackTrace();
}
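
If several pages set cookies along the way (for example the login form and then the login POST), the maps can be accumulated in one place and passed to each later request. This is only a sketch under that assumption: `CookieStoreSketch` is a hypothetical helper written with the standard library, and each map passed to `update` stands in for the result of a `res.cookies()` call.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

class CookieStoreSketch {
    private final Map<String, String> cookies = new LinkedHashMap<>();

    // Merge the cookies from one response; a later value replaces an
    // earlier one that has the same name.
    void update(Map<String, String> responseCookies) {
        cookies.putAll(responseCookies);
    }

    // Read-only copy that can be handed to the next request.
    Map<String, String> snapshot() {
        return Collections.unmodifiableMap(new LinkedHashMap<>(cookies));
    }

    public static void main(String[] args) {
        CookieStoreSketch store = new CookieStoreSketch();

        // Made-up cookies from a first response (e.g. the login form page).
        Map<String, String> first = new LinkedHashMap<>();
        first.put("SESSIONID", "anonymous");
        store.update(first);

        // Made-up cookies from the login POST; SESSIONID is replaced.
        Map<String, String> second = new LinkedHashMap<>();
        second.put("SESSIONID", "abc123");
        second.put("remember", "1");
        store.update(second);

        System.out.println(store.snapshot());
    }
}
```

The map from `snapshot()` has the same `Map<String, String>` shape that `.cookies(cookies)` takes in the request above.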

Ask if you need further help.

answered 2017-12-05T08:11:45.047
Connection.Response res = Jsoup.connect("http://www.example.com/login.php")
    .data("username", "myUsername")
    .data("password", "myPassword")
    .method(Connection.Method.POST)
    .execute();
//Connecting to the server with login details
Document doc = res.parse();
//This will give the redirected file
Map<String,String> cooki=res.cookies();
//This gives the cookies stored into cooki
Document docs= Jsoup.connect("http://www.example.com/otherPage")
    .cookies(cooki)
    .get();
//This gives the data of the required website
answered 2020-06-12T14:02:50.610

Why reconnect? If there are any cookies needed to avoid a 403 status, this is what I do.

Document doc = null;
int statusCode = -1;
String statusMessage = null;
String strHTML = null;

try {
    // connect one time.
    Connection con = Jsoup.connect(urlString);
    // get response.
    Connection.Response res = con.execute();
    // get cookies
    Map<String, String> loginCookies = res.cookies();

    // print cookie content and status message
    if (loginCookies != null) {
        for (Map.Entry<String, String> entry : loginCookies.entrySet()) {
            System.out.println(entry.getKey() + ":" + entry.getValue().toString() + "\n");
        }
    }

    statusCode = res.statusCode();
    statusMessage = res.statusMessage();
    System.out.print("Status CODE\n" + statusCode + "\n\n");
    System.out.print("Status Message\n" + statusMessage + "\n\n");

    // set login cookies to connection here
    con.cookies(loginCookies).userAgent("Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20100101 Firefox/23.0");

    // now do whatever you want, get document for example
    doc = con.get();
    // get HTML
    strHTML = doc.head().html();

} catch (org.jsoup.HttpStatusException hse) {
    hse.printStackTrace();
} catch (IOException ioe) {
    ioe.printStackTrace();
}
answered 2021-08-06T03:40:06.360