
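In Java, I want to read and save all of the HTML from a URL (Instagram), but I get error 429 (Too Many Requests). I think this is because I am trying to read more lines than the request limit allows.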

StringBuilder contentBuilder = new StringBuilder();
try {
    URL url = new URL("https://www.instagram.com/username");
    URLConnection con = url.openConnection();
    InputStream is =con.getInputStream();
    BufferedReader in = new BufferedReader(new InputStreamReader(is));
    String str;
    while ((str = in.readLine()) != null) {
        contentBuilder.append(str);
    }
    in.close();
} catch (IOException e) {
    log.warn("Could not connect", e);
}
String html = contentBuilder.toString();

This is the error:

Could not connect
java.io.IOException: Server returned HTTP response code: 429 for URL: https://www.instagram.com/username/

It also shows that the error occurs on this line:

InputStream is =con.getInputStream();

Does anyone know why I am getting this error and/or how to fix it?
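HTTP 429 means the server is rate-limiting the client. The status line arrives before any HTML, and `HttpURLConnection.getInputStream()` throws an `IOException` for any status of 400 or above, which is why the stack trace points at `getInputStream()`. A minimal sketch, assuming you want to wait and retry: check `getResponseCode()` first (it returns the code instead of throwing), honor a numeric `Retry-After` header if the server sends one, and otherwise back off exponentially. The class `RateLimitedFetch`, its methods, the 1-second base delay and the 60-second cap are hypothetical choices for illustration, not anything Instagram documents.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: fetches a page, waiting and retrying on HTTP 429.
class RateLimitedFetch {

    /** Delay before the next attempt: Retry-After (seconds) if numeric, else exponential backoff. */
    static long retryDelayMillis(String retryAfter, int attempt) {
        if (retryAfter != null) {
            try {
                return Math.max(0, Long.parseLong(retryAfter.trim())) * 1000L;
            } catch (NumberFormatException ignored) {
                // Retry-After may also be an HTTP-date; this sketch just falls back to backoff.
            }
        }
        // 1 s, 2 s, 4 s, ... capped at 60 s (arbitrary assumed values).
        return Math.min(60_000L, 1000L << Math.min(attempt, 6));
    }

    static String fetch(String address, int maxAttempts) throws IOException, InterruptedException {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            HttpURLConnection con = (HttpURLConnection) URI.create(address).toURL().openConnection();
            try {
                int status = con.getResponseCode(); // returns 4xx/5xx codes instead of throwing
                if (status == 429) {
                    if (attempt + 1 < maxAttempts) {
                        Thread.sleep(retryDelayMillis(con.getHeaderField("Retry-After"), attempt));
                    }
                    continue; // the finally block still disconnects
                }
                if (status >= 400) {
                    throw new IOException("HTTP " + status + " for " + address);
                }
                StringBuilder sb = new StringBuilder();
                try (BufferedReader in = new BufferedReader(
                        new InputStreamReader(con.getInputStream(), StandardCharsets.UTF_8))) {
                    String line;
                    while ((line = in.readLine()) != null) {
                        sb.append(line).append('\n');
                    }
                }
                return sb.toString();
            } finally {
                con.disconnect();
            }
        }
        throw new IOException("Still rate-limited after " + maxAttempts + " attempts: " + address);
    }
}
```

Retrying only spaces requests out; if the server keeps answering 429 for this client, no amount of waiting in the sketch will change that.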


1 Answer


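The problem may be caused by connections that are never closed or disconnected. Open the input with try-with-resources so that it is closed automatically, even when an exception is thrown or the method returns. You also construct an InputStreamReader that uses the default encoding of the machine running the application, whereas you need the charset of the URL's content. Finally, readLine returns each line without its line terminator (which is usually very useful), so append one yourself.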
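To use the content's own charset rather than a fixed name such as "UTF-8", one option is to parse the `charset` parameter from the `Content-Type` header (available via `URLConnection.getContentType()`). A minimal sketch, assuming UTF-8 as the fallback when the header is missing or names an unsupported charset; the class `ContentTypeCharset` and method `charsetOf` are hypothetical names. It ignores any `<meta charset>` declared inside the HTML itself.

```java
import java.nio.charset.Charset;
import java.util.Locale;

// Hypothetical helper: picks the charset declared in a Content-Type header value.
class ContentTypeCharset {

    /** Returns the charset named in e.g. "text/html; charset=ISO-8859-1", or the fallback. */
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String param : contentType.split(";")) {
            String p = param.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).trim();
                // Strip optional quotes, e.g. charset="utf-8".
                if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                    name = name.substring(1, name.length() - 1);
                }
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Covers both IllegalCharsetNameException and UnsupportedCharsetException.
                    return fallback;
                }
            }
        }
        return fallback;
    }
}
```

Usage would look like `new InputStreamReader(con.getInputStream(), ContentTypeCharset.charsetOf(con.getContentType(), StandardCharsets.UTF_8))`.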

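StringBuilder contentBuilder = new StringBuilder();
try {
    URL url = new URL("https://www.instagram.com/username");
    // disconnect() is declared on HttpURLConnection, not URLConnection, so cast.
    HttpURLConnection con = (HttpURLConnection) url.openConnection();
    // try-with-resources closes `in` (and the underlying stream) automatically.
    try (BufferedReader in = new BufferedReader(
                new InputStreamReader(con.getInputStream(), "UTF-8"))) {
        String line;
        while ((line = in.readLine()) != null) {
            // readLine strips the line terminator, so add one back.
            contentBuilder.append(line).append("\r\n");
        }
    } finally {
        con.disconnect(); // release the connection even if reading fails
    }
} catch (IOException e) {
    log.warn("Could not connect", e);
}
String html = contentBuilder.toString();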
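The loop above re-appends "\r\n" after every line, which normalizes all line endings to CRLF. If you would rather keep the content exactly as the server sent it, a minimal alternative is to read characters in chunks instead of lines. The class `ReadAll` and its 8192-character buffer size are illustrative assumptions; pass it the same `InputStreamReader` inside the try-with-resources.

```java
import java.io.IOException;
import java.io.Reader;

// Hypothetical helper: reads a whole Reader without altering line terminators.
class ReadAll {

    static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[8192];
        int n;
        // read() returns -1 at end of stream; otherwise the number of chars read.
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }
}
```

Whether to normalize or preserve line endings is a design choice; for saving HTML to disk either works.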
answered 2018-09-28T08:50:00.837