java - Java - Char buffer issue

Question

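I have a 1.99 GB character file. I want to extract millions of subsequences from it at random, for example from position 90 to 190, 10 to 110, 50000 to 50100, and so on (each 100 characters long).

I usually do it like this: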

    FileChannel channel = new RandomAccessFile(file , "r").getChannel();
    ByteBuffer buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
    Charset chars = Charset.forName("ISO-8859-1");
    CharBuffer cbuf = chars.decode(buffer);
    String sub = cbuf.subSequence(0, 100).toString();

    System.out.println(sub);

But for the 1.99 GB file, the code above fails with this error:

java.lang.IllegalArgumentException
        at java.nio.CharBuffer.allocate(CharBuffer.java:328)
        at java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:792)
        at java.nio.charset.Charset.decode(Charset.java:791)

So I used the following code instead:

    FileChannel channel = new RandomAccessFile(file , "r").getChannel();
    CharBuffer cbuf = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size()).asCharBuffer();
    String sub = cbuf.subSequence(0, 100).toString();

    System.out.println(sub);

It does not raise the error above, but it prints:

ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹ä¹

It should be "011111000000........".

Can anyone help me understand why this happens and how to fix it?

score 2 · Accepted Answer

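I'm only guessing, but I think Charset.decode(ByteBuffer) fails when it tries to allocate a huge CharBuffer for you behind the scenes. Again, this is just a hunch, but the decode method only decodes the bytes between the buffer's current position and its limit, so you could do something like this: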

    ByteBuffer buffer = ...
    Charset charset = ...

    buffer.position(0);
    buffer.limit(100);

    System.out.println(charset.decode(buffer));

The CharBuffer returned by the decode method will then have a capacity (in characters) of 100.
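To take a window at an arbitrary offset rather than at 0, the same idea can be wrapped in a small helper. This is only a sketch under a few assumptions: the class name `SliceDecode`, the helper `slice`, and the temporary demo file are made up for illustration, and the file is assumed to be ISO-8859-1, where one byte is one character, so byte offsets equal character offsets. It works on a duplicate of the buffer so the position and limit of the mapped buffer itself are never changed.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class SliceDecode {

    // Hypothetical helper: decodes `length` bytes starting at byte offset `start`.
    // Assumes a single-byte charset (ISO-8859-1), so offsets are also char offsets.
    static String slice(ByteBuffer source, int start, int length) {
        // duplicate() shares the content but has its own position and limit
        ByteBuffer window = source.duplicate();
        // set the limit before the position so position <= limit always holds
        window.limit(start + length);
        window.position(start);
        // only the bytes between position and limit are decoded
        return StandardCharsets.ISO_8859_1.decode(window).toString();
    }

    public static void main(String[] args) throws IOException {
        // small stand-in for the large file from the question
        Path file = Files.createTempFile("slice-demo", ".txt");
        file.toFile().deleteOnExit();
        Files.write(file, "0111110000001111".getBytes(StandardCharsets.ISO_8859_1));

        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r");
             FileChannel channel = raf.getChannel()) {
            ByteBuffer buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            System.out.println(slice(buffer, 0, 6)); // prints 011111
            System.out.println(slice(buffer, 6, 6)); // prints 000000
        }
    }
}
```

Each call allocates a CharBuffer only as large as the requested window, so the size of the whole file no longer matters for decoding.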

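(As a side note, I think your second attempt printed the wrong output because no specific charset was used to decode the bytes into your CharBuffer: asCharBuffer() treats every pair of bytes as one 16-bit char instead of decoding them.)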
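The difference between reinterpreting and decoding can be seen on four bytes. This is an illustrative sketch, not code from the question; the class name `AsCharBufferDemo` and both method names are made up. It relies on a ByteBuffer's default byte order being big-endian.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class AsCharBufferDemo {

    // Views the bytes as 16-bit chars: two bytes become one char, no charset involved.
    static String reinterpret(byte[] bytes) {
        return ByteBuffer.wrap(bytes).asCharBuffer().toString();
    }

    // Decodes the bytes with ISO-8859-1: one byte becomes one char.
    static String decode(byte[] bytes) {
        return StandardCharsets.ISO_8859_1.decode(ByteBuffer.wrap(bytes)).toString();
    }

    public static void main(String[] args) {
        // the text "0111" is stored as the bytes 0x30 0x31 0x31 0x31
        byte[] bytes = "0111".getBytes(StandardCharsets.ISO_8859_1);

        String wrong = reinterpret(bytes);
        System.out.println(wrong.length()); // prints 2
        System.out.printf("U+%04X U+%04X%n",
                (int) wrong.charAt(0), (int) wrong.charAt(1)); // prints U+3031 U+3131

        System.out.println(decode(bytes)); // prints 0111
    }
}
```

Four ASCII digits collapse into two unrelated characters in the first case, which matches the kind of garbage seen in the question.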
