java - HTTP response code 429 when reading HTML

Question

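In Java, I want to read and save all the HTML from a URL (Instagram), but I get error 429 (Too Many Requests). I think it is because I am trying to read more lines than the request limit allows.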

StringBuilder contentBuilder = new StringBuilder();
try {
    URL url = new URL("https://www.instagram.com/username");
    URLConnection con = url.openConnection();
    InputStream is =con.getInputStream();
    BufferedReader in = new BufferedReader(new InputStreamReader(is));
    String str;
    while ((str = in.readLine()) != null) {
        contentBuilder.append(str);
    }
    in.close();
} catch (IOException e) {
    log.warn("Could not connect", e);
}
String html = contentBuilder.toString();

The error looks like this:

Could not connect
java.io.IOException: Server returned HTTP response code: 429 for URL: https://www.instagram.com/username/

It also indicates that the error is raised by this line:

InputStream is =con.getInputStream();

Does anyone know why I am getting this error and/or how to fix it?

score 2 · Accepted Answer

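The problem may be caused by the connection not being closed/disconnected. Use try-with-resources for the input: it closes the reader automatically, even on an exception or an early return. You also construct an InputStreamReader that uses the default encoding of the machine running the application, but what you need is the charset of the URL's content. readLine returns each line without its line terminator (which is usually very useful), so append one yourself.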

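// Needs: java.io.*, java.net.HttpURLConnection, java.net.URL, java.nio.charset.StandardCharsets
StringBuilder contentBuilder = new StringBuilder();
try {
    URL url = new URL("https://www.instagram.com/username");
    // disconnect() is declared on HttpURLConnection, not URLConnection, hence the cast.
    HttpURLConnection con = (HttpURLConnection) url.openConnection();
    // try-with-resources closes the reader (and its underlying stream) automatically.
    try (BufferedReader in = new BufferedReader(
                new InputStreamReader(con.getInputStream(), StandardCharsets.UTF_8))) {
        String line;
        while ((line = in.readLine()) != null) {
            // readLine() strips the line terminator, so add one back.
            contentBuilder.append(line).append("\r\n");
        }
    } finally {
        con.disconnect();
    }
} catch (IOException e) {
    log.warn("Could not connect", e);
}
String html = contentBuilder.toString();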
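
If the server keeps answering 429 even with connections closed properly, it may simply be rate limiting the client. Here is a minimal sketch of one way to react to that: check the status code before reading the body, wait, and retry. It assumes the server may send a Retry-After header with a delay in seconds; an HTTP-date value is not parsed here and falls back to a doubling delay. The class name RetryingFetcher and the methods retryAfterSeconds and fetch are made up for this illustration, and nothing here is specific to Instagram.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the original answer.
class RetryingFetcher {

    /** Parses a Retry-After value given in seconds; returns the fallback if absent or not a number. */
    static long retryAfterSeconds(String header, long fallbackSeconds) {
        if (header == null) {
            return fallbackSeconds;
        }
        try {
            long seconds = Long.parseLong(header.trim());
            return seconds >= 0 ? seconds : fallbackSeconds;
        } catch (NumberFormatException e) {
            // Probably an HTTP-date; this sketch does not handle that form.
            return fallbackSeconds;
        }
    }

    /** Fetches the page, waiting and retrying while the server answers 429. */
    static String fetch(String address, int maxAttempts) throws IOException, InterruptedException {
        long fallbackSeconds = 1; // assumed starting delay, doubled after each 429
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            String retryAfter = null;
            HttpURLConnection con = (HttpURLConnection) new URL(address).openConnection();
            try {
                int status = con.getResponseCode();
                if (status == HttpURLConnection.HTTP_OK) {
                    StringBuilder content = new StringBuilder();
                    try (BufferedReader in = new BufferedReader(
                            new InputStreamReader(con.getInputStream(), StandardCharsets.UTF_8))) {
                        String line;
                        while ((line = in.readLine()) != null) {
                            content.append(line).append("\r\n");
                        }
                    }
                    return content.toString();
                }
                if (status != 429) {
                    throw new IOException("Unexpected HTTP status " + status);
                }
                retryAfter = con.getHeaderField("Retry-After");
            } finally {
                con.disconnect();
            }
            if (attempt < maxAttempts) {
                Thread.sleep(retryAfterSeconds(retryAfter, fallbackSeconds) * 1000L);
                fallbackSeconds *= 2;
            }
        }
        throw new IOException("Still rate limited after " + maxAttempts + " attempts");
    }
}
```

Usage would be something like String html = RetryingFetcher.fetch("https://www.instagram.com/username", 3); inside the existing try/catch. Waiting only helps if the limit is time-based; if the site blocks automated clients outright, retries will not change the outcome.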
