java - Fastest way to read/write an array from/to a file?

Question

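I know there are several similar threads here and elsewhere on the web, but I seem to be doing something wrong. My task is simple: write (and later read back) a large array of integers (int[], ArrayList, or whatever you think is best) to a file, as fast as possible. My array holds about 4.5 million integers, and the current times (in milliseconds) are, for example:

Generating trie: 14851.13071
Generating array: 2237.4661619999997
Saving array: 89250.167617
Loading array: 114908.08185799999

This is unacceptable; I expect these times should be much lower. What am I doing wrong? I don't need the fastest method on earth, but my goal is to bring saving and loading down to roughly 5 to 15 seconds (faster is welcome but not required).

My current code: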

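// Imports used by this fragment (dawg, Node and DawgTester are my own classes)
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.logging.Level;
import java.util.logging.Logger;

long start = System.nanoTime();

Node trie = dawg.generateTrie("dict.txt");
long afterGeneratingTrie = System.nanoTime();
ArrayList<Integer> array = dawg.generateArray(trie);
long afterGeneratingArray = System.nanoTime();

// try-with-resources closes (and therefore flushes) the stream
try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream("test.txt")))
{
    out.writeObject(array);
}
catch (IOException e)
{
    Logger.getLogger(DawgTester.class.getName()).log(Level.SEVERE, null, e);
}
long afterSavingArray = System.nanoTime();

ArrayList<Integer> read = new ArrayList<Integer>();
try (ObjectInputStream in = new ObjectInputStream(new FileInputStream("test.txt")))
{
    @SuppressWarnings("unchecked")
    ArrayList<Integer> loaded = (ArrayList<Integer>) in.readObject();
    read = loaded;
}
catch (IOException | ClassNotFoundException e)
{
    Logger.getLogger(DawgTester.class.getName()).log(Level.SEVERE, null, e);
}
long afterLoadingArray = System.nanoTime();

// nanoseconds * 0.000001 = milliseconds
System.out.println("Generating trie: " + 0.000001 * (afterGeneratingTrie - start));
System.out.println("Generating array: " + 0.000001 * (afterGeneratingArray - afterGeneratingTrie));
System.out.println("Saving array: " + 0.000001 * (afterSavingArray - afterGeneratingArray));
System.out.println("Loading array: " + 0.000001 * (afterLoadingArray - afterSavingArray));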

score 3 · Accepted Answer

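Don't use Java serialization. It is very powerful and robust, but not particularly fast (or compact). Use a plain DataOutputStream and call writeInt(). (Make sure to put a BufferedOutputStream between the DataOutputStream and the FileOutputStream.)

If you want to pre-size the array when reading it back, write the array length as your first int.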
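A minimal sketch of that approach, assuming the data is already in an int[]; the class and method names (IntArrayFile, write, read) are hypothetical, not from the answer. The file stores the length as the first int, followed by the elements, so the reader can allocate the array exactly once:

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

// Hypothetical helper class illustrating the DataOutputStream + BufferedOutputStream advice.
final class IntArrayFile {
    private IntArrayFile() {}

    // Writes the length first, then every element as a 4-byte int.
    static void write(String fileName, int[] data) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(fileName)))) {
            out.writeInt(data.length);
            for (int value : data) {
                out.writeInt(value);
            }
        } // close() flushes the BufferedOutputStream before closing the file
    }

    // Reads the length header, pre-sizes the array, then fills it.
    static int[] read(String fileName) throws IOException {
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(fileName)))) {
            int length = in.readInt();
            int[] data = new int[length];
            for (int i = 0; i < length; i++) {
                data[i] = in.readInt(); // throws EOFException if the file is truncated
            }
            return data;
        }
    }
}
```

DataOutputStream.writeInt writes each value as four bytes, high byte first, so a file holding n values takes 4 * (n + 1) bytes; for about 4.5 million ints that is roughly 18 MB. Because readInt throws EOFException at end of file, a truncated file fails loudly instead of silently producing a short array.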

score 0 · Accepted Answer

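Something like the following is probably a fairly fast option. If you are concerned about reducing overhead, you should also use an actual int[] array instead of an ArrayList<Integer>.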
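If generateArray keeps returning an ArrayList<Integer>, a one-time conversion before writing is straightforward. In the list, each element is a separate boxed Integer object referenced from the list's backing array, whereas an int[] stores the 4-byte values contiguously. A minimal sketch; IntLists, toIntArray and toList are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class: converts between the question's ArrayList<Integer> and int[].
final class IntLists {
    private IntLists() {}

    // Unboxes every element; a null element would throw NullPointerException.
    static int[] toIntArray(List<Integer> list) {
        int[] result = new int[list.size()];
        int i = 0;
        for (Integer value : list) {
            result[i++] = value; // auto-unboxing
        }
        return result;
    }

    // Boxes every element back into a pre-sized ArrayList.
    static ArrayList<Integer> toList(int[] array) {
        ArrayList<Integer> result = new ArrayList<>(array.length);
        for (int value : array) {
            result.add(value); // auto-boxing
        }
        return result;
    }
}
```

On Java 8 and later the first conversion can also be written as `list.stream().mapToInt(Integer::intValue).toArray()`; the explicit loop is shown only to make the unboxing step visible.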

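import java.io.EOFException;
import java.nio.ByteBuffer;
import java.nio.IntBuffer;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Output file; writing to "dict.txt" would overwrite the dictionary the trie is built from
final Path path = Paths.get("test.txt");
...
// Assumes generateArray has been changed to return an int[]
final int[] rsl = dawg.generateArray(trie);
final ByteBuffer buf = ByteBuffer.allocateDirect(rsl.length << 2); // 4 bytes per int

// The int view shares buf's contents but has its own position, so buf still spans every byte
final IntBuffer buf_i = buf.asIntBuffer();
buf_i.put(rsl);
try (final WritableByteChannel out = Files.newByteChannel(path,
    StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
  do {
    out.write(buf);
  } while (buf.hasRemaining());
}

buf.clear();
try (final ReadableByteChannel in = Files.newByteChannel(path,
    StandardOpenOption.READ)) {
  do {
    // read() returns -1 at end of file; without this check a short file loops forever
    if (in.read(buf) < 0) {
      throw new EOFException("file is shorter than expected");
    }
  } while (buf.hasRemaining());
}
buf_i.clear();
buf_i.get(rsl);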
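The snippet above reuses rsl when reading, so it assumes the caller already knows the element count. A self-contained sketch that instead derives the count from the file size, using FileChannel; IntChannelFile is a hypothetical name and no header is written. It keeps ByteBuffer's default big-endian order, so each value is laid out exactly as DataOutputStream.writeInt would write it:

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: raw big-endian ints, element count = file size / 4.
final class IntChannelFile {
    private IntChannelFile() {}

    static void write(Path path, int[] data) throws IOException {
        ByteBuffer buf = ByteBuffer.allocateDirect(data.length * Integer.BYTES);
        buf.asIntBuffer().put(data); // fills buf's bytes; buf's own position stays 0
        try (FileChannel out = FileChannel.open(path, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            while (buf.hasRemaining()) {
                out.write(buf);
            }
        }
    }

    static int[] read(Path path) throws IOException {
        try (FileChannel in = FileChannel.open(path, StandardOpenOption.READ)) {
            long size = in.size();
            if (size % Integer.BYTES != 0 || size > Integer.MAX_VALUE) {
                throw new IOException("unexpected file size: " + size);
            }
            ByteBuffer buf = ByteBuffer.allocateDirect((int) size);
            while (buf.hasRemaining()) {
                if (in.read(buf) < 0) {
                    throw new EOFException("file shrank while reading");
                }
            }
            buf.flip(); // position 0, limit = bytes read
            int[] data = new int[(int) size / Integer.BYTES];
            buf.asIntBuffer().get(data);
            return data;
        }
    }
}
```

Calling `order(ByteOrder.nativeOrder())` on both buffers may avoid byte swapping on little-endian machines, at the cost of a file layout that depends on the platform.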
