java - Download a file with HTTP GET in Java, passing cookies

Question

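I want to parse an HTML document from a URL in Java.

When I enter the URL in my browser (Chrome), it does not display the HTML page; it downloads it instead.

So the URL is the link behind a "Download" button on a web page. No problem so far. The URL is "https://www.shazam.com/myshazam/download-history", and if I paste it into the browser, the download works fine. But when I try to download it with Java, I get a 401 (Unauthorized) error.

I inspected the request in Chrome's network tools while loading the URL and noticed that my profile-data and registration cookies are sent with the HTTP GET request.

I have tried many different approaches, but none of them worked. So my question is: how do I reproduce this request in Java? How do I get (download) the HTML file and parse it?

Update:

This is what we have found so far (thanks to Andrew Regan):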

BasicCookieStore store = new BasicCookieStore();
store.addCookie( new BasicClientCookie("profile-data", "value") );  // profile-data
store.addCookie( new BasicClientCookie("registration", "value") );  // registration
Executor executor = Executor.newInstance();
String output = executor.use(store)
            .execute(Request.Get("https://www.shazam.com/myshazam/download-history"))
            .returnContent().asString();

The last line of code seems to cause a NullPointerException. The rest of the code seems to work fine for loading unprotected web pages.

score 3 · Accepted Answer

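I found the answer myself. Using HttpURLConnection, this approach can be used to "authenticate" against all kinds of services. I used Chrome's built-in network tools to get the cookie values of the GET request.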

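import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

HttpURLConnection con = (HttpURLConnection) new URL("https://www.shazam.com/myshazam/download-history").openConnection();
con.setRequestMethod("GET");
// Send both cookies in a single Cookie header, separated by "; "
con.addRequestProperty("Cookie", "registration=Cookie_Value_Here; profile-data=Cookie_Value_Here");
// try-with-resources closes the reader even if reading fails
try (BufferedReader in = new BufferedReader(new InputStreamReader(con.getInputStream()))) {
    String inputLine;
    while ((inputLine = in.readLine()) != null) {
        System.out.println(inputLine);
    }
}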
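The same idea can be wrapped in a small helper that builds the Cookie header from a map and checks the status code before reading the body, so a 401 surfaces as a clear error message. This is only a sketch: the class and method names (`CookieDownload`, `buildCookieHeader`, `download`) are made up for illustration, the cookie values are placeholders, and it assumes the response body is UTF-8 text.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper class, not part of any library.
final class CookieDownload {

    // Joins name/value pairs into one Cookie header value, e.g. "a=1; b=2".
    static String buildCookieHeader(Map<String, String> cookies) {
        return cookies.entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining("; "));
    }

    // Performs the GET and returns the body; throws if the status is not 200.
    static String download(String url, Map<String, String> cookies) throws IOException {
        HttpURLConnection con = (HttpURLConnection) URI.create(url).toURL().openConnection();
        con.setRequestMethod("GET");
        con.setRequestProperty("Cookie", buildCookieHeader(cookies));
        int status = con.getResponseCode();
        if (status != HttpURLConnection.HTTP_OK) {
            throw new IOException("Unexpected HTTP status " + status + " for " + url);
        }
        // Assumption: the body is UTF-8 encoded text.
        try (InputStream body = con.getInputStream();
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(body, StandardCharsets.UTF_8))) {
            return in.lines().collect(Collectors.joining("\n"));
        }
    }

    public static void main(String[] args) throws IOException {
        // LinkedHashMap keeps the cookies in insertion order.
        Map<String, String> cookies = new LinkedHashMap<>();
        cookies.put("registration", "Cookie_Value_Here"); // placeholder value
        cookies.put("profile-data", "Cookie_Value_Here"); // placeholder value
        System.out.println(
                download("https://www.shazam.com/myshazam/download-history", cookies));
    }
}
```

The returned string can then be handed to whatever HTML parser you use.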

score 0 · Accepted Answer

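So if you delete those cookies or use a private session, the browser should reproduce what you are seeing in your code.

I guess you need to visit "http://www.shazam.com/myshazam" first and log in.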
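If you go the "log in first" route with plain `HttpURLConnection`, one option is to install a `java.net.CookieManager` so that cookies set by an earlier response are sent again on later requests. The sketch below only shows the cookie handling. The login request itself is left out because the thread does not say what the login endpoint or form fields are. `SessionCookies`, `install`, and `seed` are hypothetical names, and the cookie value is a placeholder.

```java
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.HttpCookie;
import java.net.URI;
import java.util.List;

// Hypothetical helper class, not part of any library.
final class SessionCookies {

    // Creates a cookie manager and installs it JVM-wide, so that later
    // HttpURLConnection requests store Set-Cookie headers and send them back.
    static CookieManager install() {
        CookieManager manager = new CookieManager(null, CookiePolicy.ACCEPT_ORIGINAL_SERVER);
        CookieHandler.setDefault(manager);
        return manager;
    }

    // Optionally pre-loads a cookie copied from the browser for the given site.
    static void seed(CookieManager manager, URI site, String name, String value) {
        HttpCookie cookie = new HttpCookie(name, value);
        cookie.setPath("/");   // valid for every path on the site
        cookie.setVersion(0);  // plain "name=value" form
        manager.getCookieStore().add(site, cookie);
    }

    public static void main(String[] args) {
        CookieManager manager = install();
        URI site = URI.create("https://www.shazam.com/");
        seed(manager, site, "registration", "Cookie_Value_Here"); // placeholder value
        List<HttpCookie> stored = manager.getCookieStore().get(site);
        System.out.println(stored.size() + " cookie(s) stored for " + site.getHost());
    }
}
```

With the manager installed, a login request followed by the download request in the same JVM would share one cookie store, which is roughly what the browser does.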

score 0 · Accepted Answer

You could try adding the cookie values to the GET request using, for example, the HttpClient Fluent API:

CookieStore store = new BasicCookieStore();
store.addCookie( new BasicClientCookie(name, value) );  // profile-data
store.addCookie( new BasicClientCookie(name, value) );  // registration

Executor executor = Executor.newInstance();
String output = executor.cookieStore(store)
        .execute(Request.Get("https://www.shazam.com/myshazam/download-history"))
        .returnContent().asString();

For parsing, you could do something like this:

Element dom = Jsoup.parse(output);
for (Element element : dom.select("tr td")) {
    String eachCellValue = element.text();
    // Whatever
}

(You did not give any more detail than that.)
