java - Fastest way to read file data into an array (Java)

Question

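Take a look at the following link:

http://snippetsofjosh.wordpress.com/tag/advantages-and-disadvantages-of-arraylist/

This is one of the reasons why I have always preferred arrays over (Array)Lists. Still, it got me thinking about memory management and speed.

So I arrived at the following question:

What is the best way to store data from a file when you don't know the size of the file (/number of entries), where "best" is defined as "least computation time"?

Below I present 3 different methods, and I would like to know which of them is best and why. To keep the question clear, assume that I must end up with an array. Also assume that each line in the .txt file holds exactly one entry (/one string). Finally, to limit the scope of the question, I will restrict it to Java.

Suppose we want to retrieve the following information from a file named words.txt: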

Hello
I 
am
a
test 
file

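Method 1 - Double and dangerous

import java.io.File;
import java.util.Scanner;

File read = new File("words.txt");
Scanner in = new Scanner(read);

int counter = 0;

// First pass: only count the lines
while (in.hasNextLine())
{
    in.nextLine();
    counter++;
}
in.close();

String[] data = new String[counter];

// Second pass: reopen the file and fill the array
// (dangerous: if the file grows between the passes, data[i] overflows)
in = new Scanner(read);

int i = 0;

while (in.hasNextLine())
{
    data[i] = in.nextLine();
    i++;
}
in.close();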

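Method 2 - Clear but redundant

import java.io.File;
import java.util.ArrayList;
import java.util.Scanner;

File read = new File("words.txt");
Scanner in = new Scanner(read);

// Grows automatically, so the line count is not needed up front
ArrayList<String> temporary = new ArrayList<String>();

while (in.hasNextLine())
{
    temporary.add(in.nextLine());
}
in.close();

// Copy the list into an array of exactly the right size
String[] data = new String[temporary.size()];

for (int i = 0; i < temporary.size(); i++)
{
    data[i] = temporary.get(i);
}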

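Method 3 - Short and rigid

import java.io.File;
import java.io.FileReader;

File read = new File("words.txt");
FileReader reader = new FileReader(read);

// File.length() counts bytes, so it is only an upper bound on the number of chars
char[] chars = new char[(int) read.length()];

// A single read() may return fewer chars than requested, so loop until EOF or full
int total = 0;
int n;
while (total < chars.length && (n = reader.read(chars, total, chars.length - total)) != -1)
{
    total += n;
}
reader.close();

// Only the chars actually read belong to the content
String content = new String(chars, 0, total);

// Split on \n or \r\n rather than the platform separator, which may not match the file
String[] data = content.split("\\r?\\n");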

If you have another (or even better) method, please share it below. Also, feel free to adjust my code if necessary.

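Answer:

The fastest way to store the data in an array is the following:

import java.io.File;
import java.util.ArrayList;
import java.util.Scanner;

File read = new File("words.txt");
Scanner in = new Scanner(read);

ArrayList<String> temporary = new ArrayList<String>();

while (in.hasNextLine()) {
    temporary.add(in.nextLine());
}
in.close();

// toArray copies the list into a correctly sized array in one call
String[] data = temporary.toArray(new String[temporary.size()]);

For Java 7+: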

Path loc = Paths.get(URI.create("file:///Users/joe/FileTest.txt"));
List<String> lines = Files.readAllLines(loc, Charset.defaultCharset());
String[] array = lines.toArray(new String[lines.size()]);

score 3 · Accepted Answer

I assume "best" here means "faster".

I would use method 2, but create the array with the method provided by the Collection interface:

String[] array = temporary.toArray(new String[temporary.size()]);

Or even simpler (Java 7+):

List<String> lines = Files.readAllLines(file, charset);
String[] array = lines.toArray(new String[lines.size()]);
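On Java 8 and later, the intermediate List can be skipped by streaming the lines straight into an array. The sketch below is not part of the original answer; the class and method names are hypothetical, and it assumes the file is UTF-8, which is the charset Files.lines(Path) uses.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

final class StreamLines {
    // Hypothetical helper: read every line of a UTF-8 file into a String[]
    static String[] readLines(Path path) throws IOException {
        // Files.lines keeps the file open until the stream is closed,
        // hence try-with-resources
        try (Stream<String> lines = Files.lines(path)) {
            return lines.toArray(String[]::new);
        }
    }
}
```

Files.lines splits lines the same way BufferedReader.readLine does, so a trailing newline does not produce an extra empty entry.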

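Other methods:

Method 1 reads the file twice, which is unlikely to be more efficient than resizing the ArrayList.
I'm not sure whether method 3 is faster.

Update:

For completeness, I ran a microbenchmark with method2 modified as above, plus an additional method (method4) that reads all the bytes at once, creates a string and splits it on new lines. Results (in microseconds):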

Benchmark   Mean (µs)
method1       126.178
method2        59.679
method3        76.622
method4        75.293

Edit:

With a larger 3 MB file, LesMiserables.txt, the results are consistent:

Benchmark      Mean 
method1     608649.322
method2      34167.101
method3      63410.496
method4      65552.79
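The benchmark code for method4 was not posted. Going only by the description above (read all bytes at once, create a string, split on new lines), a minimal reconstruction might look like the sketch below. It is not the answerer's actual code; the class name is hypothetical and UTF-8 input is assumed.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class ReadAllThenSplit {
    // Hypothetical reconstruction of "method4"
    static String[] readLines(Path path) throws IOException {
        // One bulk read of the whole file
        byte[] bytes = Files.readAllBytes(path);
        // Decode once; UTF-8 is an assumption about the file
        String content = new String(bytes, StandardCharsets.UTF_8);
        // Accept both \n and \r\n; split drops trailing empty strings
        return content.split("\\r?\\n");
    }
}
```

One quirk of this approach: an empty file yields a one-element array containing the empty string, because String.split returns the whole input when nothing matches.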

score 3 · Accepted Answer

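A good comparison, with all the source code, is given here: java_tip_how_read_files_quickly

Summary:

For the best Java read performance, there are four things to remember:

Minimize I/O operations by reading an array at a time, not a byte at a time. An 8 KB array is a good size.
Minimize method calls by getting data an array at a time, not a byte at a time. Use array indexing to get at the bytes in the array.
Minimize thread synchronization locks if you don't need thread safety. Either make fewer method calls to a thread-safe class, or use a non-thread-safe class such as FileChannel and MappedByteBuffer.
Minimize data copying between the JVM/OS, internal buffers, and application arrays. Use FileChannel with memory mapping, or a direct or wrapped-array ByteBuffer.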
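As a rough illustration of the first two points (read an array at a time, then index into the array), the sketch below reads 8 KB chunks of chars and cuts lines by scanning each chunk with an index. It is not the code from the linked article; the class and method names are hypothetical, UTF-8 is assumed, and it does not use FileChannel or memory mapping.

```java
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

final class ChunkedLineReader {
    // 8 KB chunk, following the "8 KB array" suggestion
    private static final int CHUNK = 8 * 1024;

    static String[] readLines(Path path) throws IOException {
        List<String> lines = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        char[] buf = new char[CHUNK];
        try (Reader in = new InputStreamReader(Files.newInputStream(path), StandardCharsets.UTF_8)) {
            int n;
            // One read() call fills up to 8 KB, instead of one call per character
            while ((n = in.read(buf, 0, buf.length)) != -1) {
                int start = 0;
                // Scan the chunk by index and cut a line at each '\n'
                for (int i = 0; i < n; i++) {
                    if (buf[i] == '\n') {
                        current.append(buf, start, i - start);
                        lines.add(stripCr(current));
                        current.setLength(0);
                        start = i + 1;
                    }
                }
                // Carry the unterminated tail of this chunk over to the next one
                current.append(buf, start, n - start);
            }
        }
        // Last line without a trailing newline
        if (current.length() > 0) {
            lines.add(stripCr(current));
        }
        return lines.toArray(new String[0]);
    }

    // Drop a trailing '\r' so Windows line endings are handled too
    private static String stripCr(StringBuilder sb) {
        int len = sb.length();
        if (len > 0 && sb.charAt(len - 1) == '\r') {
            len--;
        }
        return sb.substring(0, len);
    }
}
```

Because the unfinished tail of each chunk is carried over, a line (or a "\r\n" pair) that straddles two 8 KB chunks is still reassembled correctly.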

Hope that helps.

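Edit

I would do it like this:

import java.io.File;
import java.util.LinkedList;
import java.util.List;
import java.util.Scanner;

File read = new File("words.txt");
Scanner in = new Scanner(read);
List<String> temporary = new LinkedList<String>();

while (in.hasNextLine()) {
    temporary.add(in.nextLine());
}
in.close();

String[] data = temporary.toArray(new String[temporary.size()]);

The main difference is that the data is read only once (unlike the other 2 methods), adding to a linked list is very cheap, and no extra operations (such as splitting) are needed on the lines. Don't use an ArrayList here.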

score 2 · Accepted Answer

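If you are reading data from a file, the bottleneck will be the file-reading (I/O) stage; in almost all cases, the time spent processing the data is negligible. So do the correct and safe thing: first make it right, then make it fast.

If you don't know the size of the file, you need some kind of dynamically expanding data structure, which is exactly what ArrayList is. Code you write yourself is not going to be more efficient or more correct than such an important part of the Java API. So just use ArrayList: option 2.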
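For illustration only, the "dynamically expanding" idea reduces to a few lines: keep a backing array and, when it is full, copy it into a larger one. Growing geometrically keeps the total copying cost linear in the number of lines, which is why method 2 does not pay a per-line resizing penalty. The class below is hypothetical; OpenJDK's ArrayList grows by roughly 1.5× rather than doubling, and the advice above is to use ArrayList rather than write this yourself.

```java
import java.util.Arrays;

// Hypothetical class for illustration; in real code, use java.util.ArrayList
final class GrowableStringArray {
    private String[] items = new String[10];
    private int size = 0;

    void add(String s) {
        if (size == items.length) {
            // Double the capacity so the copying cost stays O(n) amortized
            items = Arrays.copyOf(items, items.length * 2);
        }
        items[size++] = s;
    }

    String[] toArray() {
        // Trim to the exact number of stored elements
        return Arrays.copyOf(items, size);
    }
}
```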

score 1 · Accepted Answer

I would use Guava:

import com.google.common.io.Files; // Guava's Files, not java.nio.file.Files

File file = new File("words.txt");
List<String> lines = Files.readLines(file, Charset.defaultCharset());
// If it really has to be an array:
String[] array = lines.toArray(new String[0]);

score 1 · Accepted Answer

// yourFile is a java.nio.file.Path; charset is e.g. StandardCharsets.UTF_8
List<String> lines = Files.readAllLines(yourFile, charset);
String[] arr = lines.toArray(new String[lines.size()]);
