java - Fastest way to read a huge number of ints from a binary file

Question

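I use Java 1.5 on an embedded Linux device and want to read a binary file containing 2 MB of int values. (Currently 4 bytes per value, big-endian, but the format is mine to choose.)

Using DataInputStream via BufferedInputStream (calling dis.readInt()), these 500 000 calls take 17 s, whereas reading the file into one big byte buffer takes 5 s.

How can I read that file faster into one huge int[]?

The reading process should not use more than 512 kb of additional memory.

The code below, which uses nio, is not faster than the readInt() approach from java.io.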

    // assume I already know that there are 500 000 ints to read:
    int numInts = 500000;
    // here I want the result into
    int[] result = new int[numInts];
    int cnt = 0;

    RandomAccessFile aFile = new RandomAccessFile("filename", "r");
    FileChannel inChannel = aFile.getChannel();

    ByteBuffer buf = ByteBuffer.allocate(512 * 1024);

    int bytesRead = inChannel.read(buf); //read into buffer.

    while (bytesRead != -1 && cnt < numInts) {

      buf.flip();  //make buffer ready for get()

      // a read may stop in the middle of an int, so only consume whole 4-byte values
      while (buf.remaining() >= 4 && cnt < numInts) {
          // probably slow here since called 500 000 times
          result[cnt] = buf.getInt();
          cnt++;
      }

      buf.compact(); //keep any leftover bytes and make buffer ready for writing
      bytesRead = inChannel.read(buf);
    }


    inChannel.close();
    aFile.close();

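Update: evaluation of the answers:

On the PC, the memory map with the IntBuffer approach was the fastest in my setup.
On the embedded device, which has no JIT, java.io DataInputStream.readInt() was a bit faster (17 s, versus 20 s for the memory map with IntBuffer).

Final conclusion: a significant speed-up is easier to achieve through an algorithmic change. (Smaller file for initialization.)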

score 4 · Accepted Answer

I don't know if this will be any faster than what Alexander provided, but you could try mapping the file.

    try (FileInputStream stream = new FileInputStream(filename)) {
        FileChannel inChannel = stream.getChannel();

        ByteBuffer buffer = inChannel.map(FileChannel.MapMode.READ_ONLY, 0, inChannel.size());
        int[] result = new int[500000];

        buffer.order( ByteOrder.BIG_ENDIAN );
        IntBuffer intBuffer = buffer.asIntBuffer( );
        intBuffer.get(result);
    }
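Since the question targets Java 1.5, where try-with-resources is not available, the same mapping idea can be sketched with a plain try/finally. This is only an illustration under assumptions: the class name `MappedIntReader` and the method `readInts` are made up for this example, and the int count is derived from the file size instead of being hard-coded.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.ByteOrder;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

// Hypothetical helper, not part of the original answer.
class MappedIntReader {

    // Maps the whole file read-only and bulk-copies its content into an int[].
    static int[] readInts(File file) throws IOException {
        FileInputStream stream = new FileInputStream(file);
        try {
            FileChannel channel = stream.getChannel();
            long size = channel.size();
            if (size % 4 != 0 || size > Integer.MAX_VALUE) {
                throw new IOException("unexpected file size: " + size);
            }
            MappedByteBuffer mapped =
                    channel.map(FileChannel.MapMode.READ_ONLY, 0, size);
            // Big-endian is the default order; set explicitly to document the format.
            mapped.order(ByteOrder.BIG_ENDIAN);

            int[] result = new int[(int) (size / 4)];
            // One bulk transfer instead of one getInt() call per value.
            mapped.asIntBuffer().get(result);
            return result;
        } finally {
            // Closing the stream also closes its channel.
            stream.close();
        }
    }
}
```

A call such as `int[] values = MappedIntReader.readInts(new File("filename"));` would then replace the per-int loop. Whether this is faster depends on the platform, as the update in the question shows.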

score 3 · Accepted Answer

You can use IntBuffer from the nio package -> http://docs.oracle.com/javase/6/docs/api/java/nio/IntBuffer.html

    int[] intArray = new int[ 500000 ];

    IntBuffer intBuffer = IntBuffer.wrap( intArray );

    ...

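Fill the buffer by making calls to inChannel.read(intBuffer).

Once the buffer is full, your intArray will contain the 500000 integers.

EDIT

After realizing that Channels only support ByteBuffer, here is a revised approach: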

    // assume I already know that there are 500 000 ints to read:
    int numInts = 500000;
    // here I want the result into
    int[] result = new int[numInts];

    // 4 bytes per int, direct buffer
    ByteBuffer buf = ByteBuffer.allocateDirect( numInts * 4 );

    // BIG_ENDIAN byte order
    buf.order( ByteOrder.BIG_ENDIAN );

    // Fill in the buffer
    while ( buf.hasRemaining( ) )
    {
       // Per EJP's suggestion check EOF condition
       if( inChannel.read( buf ) == -1 )
       {
           // Hit EOF
           throw new EOFException( );
       }
    }

    buf.flip( );

    // Create IntBuffer view
    IntBuffer intBuffer = buf.asIntBuffer( );

    // result will now contain all ints read from file
    intBuffer.get( result );
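The snippet above allocates a buffer as large as the whole file (about 2 MB), which exceeds the 512 kb budget stated in the question. One possible variation, sketched here under assumptions, reuses a smaller buffer and bulk-copies each chunk through an IntBuffer view. The class name `ChunkedIntReader`, the method `readInts` and its `chunkBytes` parameter are invented for this example.

```java
import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.IntBuffer;
import java.nio.channels.FileChannel;

// Hypothetical helper, not part of the original answer.
class ChunkedIntReader {

    // Reads numInts big-endian ints using a reusable buffer of chunkBytes bytes.
    static int[] readInts(File file, int numInts, int chunkBytes) throws IOException {
        if (chunkBytes < 4) {
            throw new IllegalArgumentException("chunkBytes must be at least 4");
        }
        int[] result = new int[numInts];
        ByteBuffer buf = ByteBuffer.allocate(chunkBytes);
        buf.order(ByteOrder.BIG_ENDIAN);

        FileInputStream in = new FileInputStream(file);
        try {
            FileChannel channel = in.getChannel();
            int cnt = 0;
            while (cnt < numInts) {
                if (channel.read(buf) == -1) {
                    throw new EOFException("file ended after " + cnt + " ints");
                }
                buf.flip();
                // The view covers only the whole ints currently in the buffer.
                IntBuffer ints = buf.asIntBuffer();
                int n = Math.min(ints.remaining(), numInts - cnt);
                ints.get(result, cnt, n); // bulk copy instead of n getInt() calls
                cnt += n;
                // The view has its own position, so advance the byte buffer manually.
                buf.position(buf.position() + n * 4);
                // Keep a trailing partial int (if any) for the next read.
                buf.compact();
            }
            return result;
        } finally {
            in.close(); // also closes the channel
        }
    }
}
```

With the numbers from the question, a call might look like `ChunkedIntReader.readInts(new File("filename"), 500000, 512 * 1024)`. No speed claim is implied, particularly on a device without JIT.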

score 2 · Accepted Answer

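I ran a fairly careful experiment using serialization/deserialization, DataInputStream versus ObjectInputStream, both based on ByteArrayInputStream to avoid I/O effects. For a million ints, readObject took about 20 ms and readInt about 116 ms. The serialization overhead on a million-int array was 27 bytes. This was on a 2013-era MacBook Pro.

Having said that, object serialization is sort of evil, and you have to have written the data out with a Java program.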
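For reference, the serialization round trip described above could look roughly like the following sketch. It is not the code used for the measurements, and the class name `IntArraySerialization` and its two methods are made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

// Hypothetical helper, not the benchmark code from the answer.
class IntArraySerialization {

    // Writes the whole array as a single serialized object.
    static byte[] serialize(int[] values) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(bytes);
        out.writeObject(values);
        out.close(); // flushes buffered data into the byte array stream
        return bytes.toByteArray();
    }

    // Reads the array back with a single readObject() call.
    static int[] deserialize(byte[] data) throws IOException, ClassNotFoundException {
        ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data));
        try {
            return (int[]) in.readObject();
        } finally {
            in.close();
        }
    }
}
```

In a real setup the byte array streams would be replaced by file streams, and the file would have to be produced by a Java program, as noted above.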
