java - I can't get all the bytes from a website

Question

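I'm trying to read all the bytes from a website, but I don't think I'm getting all of them. I gave the byte array a very large length. I used the approach below, but it always throws an exception.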

Here is the code:

DataInputStream dis = new DataInputStream(s2.getInputStream());

byte[] bytes = new byte[900000];

// Read in the bytes
int offset = 0;
int numRead = 0;
while (offset < bytes.length
    && (numRead=dis.read(bytes, offset, bytes.length-offset)) >= 0) {
        offset += numRead;
}

// Ensure all the bytes have been read in
if (offset < bytes.length) {
    throw new IOException("Could not completely read website");
}
out.write(bytes);

Edited version:

ByteArrayOutputStream bais = new ByteArrayOutputStream();
InputStream is = null;
try {
    is = s2.getInputStream();
    byte[] byteChunk = new byte[4096]; // Or whatever size you want to read in at a time.
    int n;
    while ( (n = is.read(byteChunk)) > 0 ) {
        bais.write(byteChunk, 0, n);
    }
}
catch (IOException e) {
    System.err.printf ("Failed while reading bytes");
    e.printStackTrace ();
    // Perform any other exception handling that's appropriate.
}
finally {
    if (is != null) { is.close(); }
}
byte[] asd = bais.toByteArray();
out.write(asd);

score 3 · Accepted Answer

This is the problem:

if (offset < bytes.length)

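That check fails to fire only if the response is at least 900,000 bytes long. If the response is complete in fewer bytes than that, read() correctly returns -1 to indicate the end of the stream, the loop exits with offset still below bytes.length, and the exception is thrown even though everything was read.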

You should actually be throwing an exception if offset is equal to bytes.length, as that indicates you may have truncated the data :)
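The inverted check can be sketched as follows. This is only an illustration under stated assumptions: the class name BoundedRead, the method readAtMost, and the maxBytes parameter are hypothetical and not from the original post. The extra single-byte read() that distinguishes "exactly filled the buffer" from "more data remains" is also an addition beyond what the answer describes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical helper: reads a stream into a fixed-size buffer and
// fails only when the data would not fit, never when it ends early.
final class BoundedRead {

    static byte[] readAtMost(InputStream in, int maxBytes) throws IOException {
        byte[] bytes = new byte[maxBytes];
        int offset = 0;
        int numRead;
        // Same loop as in the question: stop when the buffer is full
        // or when read() reports end of stream with -1.
        while (offset < bytes.length
                && (numRead = in.read(bytes, offset, bytes.length - offset)) >= 0) {
            offset += numRead;
        }
        // A full buffer is the suspicious case. Probe for one more byte
        // (an assumption of this sketch) to see whether data was cut off.
        if (offset == bytes.length && in.read() != -1) {
            throw new IOException(
                    "Response exceeds " + maxBytes + " bytes; data would be truncated");
        }
        // Return only the bytes actually read, not the unused tail of the buffer.
        return Arrays.copyOf(bytes, offset);
    }
}
```

Note that the code in the question writes the whole 900,000-byte array to out; trimming to offset, as above, avoids emitting the unused zero bytes at the end.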

It's not clear where you got the value of 900,000 from, mind you...

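If you want to stick with the raw stream, I'd suggest using Guava's ByteStreams.toByteArray method to read all the data. Alternatively, you can keep looping, reading into a smaller buffer and writing it to a ByteArrayOutputStream on each iteration.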
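A minimal sketch of the looping alternative, using only the JDK. The class name StreamReads and the method readFully are hypothetical names chosen for illustration, and the 4096-byte chunk size is an arbitrary assumption.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper: drains a stream of unknown length into a byte array.
final class StreamReads {

    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096]; // arbitrary chunk size
        int n;
        // read() returns -1 only at end of stream, so test for -1
        // rather than "> 0".
        while ((n = in.read(chunk)) != -1) {
            // Write only the n bytes that this read() call filled in.
            out.write(chunk, 0, n);
        }
        return out.toByteArray();
    }
}
```

No size has to be guessed up front, because the ByteArrayOutputStream grows as needed. With Guava on the classpath, ByteStreams.toByteArray(in) does the same job, and on Java 9 or later InputStream.readAllBytes() is another option.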

score 1 · Accepted Answer

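I realize this doesn't answer your specific question. However, I really wouldn't hand-code this sort of thing when libraries such as HttpClient exist and have already been debugged, profiled, and so on.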

For example, here is how to do it with its fluent interface:

Request.Get("http://targethost/homepage").execute().returnContent();

If you are fetching and scraping HTML, JSoup is an alternative.
