7

I often use the Scanner class to read files because it is so convenient.

      String inputFileName;
      Scanner fileScanner;

      inputFileName = "input.txt";
      fileScanner = new Scanner (new File(inputFileName));

My question is: does the statement above load the entire file into memory at once? Or do subsequent calls on fileScanner, such as

      fileScanner.nextLine();

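read from the file (that is, from external storage rather than from memory)? I ask because I am worried about what happens if the file is too large to be read into memory all at once. Thanks.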

4 Answers

16

If you read the source code, you can answer the question yourself.

The implementation of the Scanner constructor in question shows:

    public Scanner(File source) throws FileNotFoundException {
        this((ReadableByteChannel)(new FileInputStream(source).getChannel()));
    }

The latter is wrapped in a Reader:

    private static Readable makeReadable(ReadableByteChannel source, CharsetDecoder dec) {
        return Channels.newReader(source, dec, -1);
    }

and is read using a buffer size of

    private static final int BUFFER_SIZE = 1024; // change to 1024;

as you can see in the final constructor of the construction chain:

    private Scanner(Readable source, Pattern pattern) {
        assert source != null : "source should not be null";
        assert pattern != null : "pattern should not be null";
        this.source = source;
        delimPattern = pattern;
        buf = CharBuffer.allocate(BUFFER_SIZE);
        buf.limit(0);
        matcher = delimPattern.matcher(buf);
        matcher.useTransparentBounds(true);
        matcher.useAnchoringBounds(false);
        useLocale(Locale.getDefault(Locale.Category.FORMAT));
    }

So it seems that the Scanner does not read the whole file at once.
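In practice this means a file can be processed line by line without holding all of it in memory. Here is a minimal sketch, not taken from the JDK source: the class name ScannerLineCount, the method countLines, and the temporary sample file are made up for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Scanner;

// Hypothetical helper class, for illustration only.
class ScannerLineCount {
    // Pulls lines from the Scanner one at a time and discards them,
    // so only the Scanner's buffer and the current line are kept around.
    static long countLines(File file) throws IOException {
        long lines = 0;
        try (Scanner fileScanner = new Scanner(file)) {
            while (fileScanner.hasNextLine()) {
                fileScanner.nextLine();
                lines++;
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Sample data in a temporary file so the sketch runs anywhere.
        Path sample = Files.createTempFile("scanner-demo", ".txt");
        Files.write(sample, Arrays.asList("first", "second", "third"));
        System.out.println(countLines(sample.toFile())); // prints 3
        Files.delete(sample);
    }
}
```

The same loop applies to the input.txt from the question: pass new File("input.txt") to countLines.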

answered 2012-04-26T15:32:57.597
2

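From reading the code, it appears to load about 1 KB (a 1024-character buffer) at a time by default. For long lines of text, the buffer may grow (up to the size of the longest line you have).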
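One way to see that a line longer than the initial buffer is still read in full is the small experiment below. It is only a sketch: the class name LongLineDemo, the method firstLineLength, and the 5000-character line are assumptions chosen for illustration, and it shows the observable result rather than the buffer's internal size.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Collections;
import java.util.Scanner;

// Hypothetical demo class, for illustration only.
class LongLineDemo {
    // Reads the first line of the file with a Scanner and returns its length,
    // or -1 if the file has no lines.
    static int firstLineLength(Path path) throws IOException {
        try (Scanner scanner = new Scanner(path.toFile())) {
            return scanner.hasNextLine() ? scanner.nextLine().length() : -1;
        }
    }

    public static void main(String[] args) throws IOException {
        // One line of 5000 characters, well beyond the 1024-character initial buffer.
        char[] chars = new char[5000];
        Arrays.fill(chars, 'x');
        Path sample = Files.createTempFile("long-line", ".txt");
        Files.write(sample, Collections.singletonList(new String(chars)));
        System.out.println(firstLineLength(sample)); // prints 5000
        Files.delete(sample);
    }
}
```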

answered 2012-04-26T15:32:26.933
1

In ACM contests, fast input reading is very important. In Java, we found that something like the following is much faster:

    FileInputStream inputStream = new FileInputStream("input.txt");
    InputStreamReader streamReader = new InputStreamReader(inputStream, "UTF-8");
    BufferedReader in = new BufferedReader(streamReader);
    Map<String, Integer> map = new HashMap<String, Integer>();
    int trees = 0;
    for (String s; (s = in.readLine()) != null; trees++) {
        Integer n = map.get(s);
        if (n != null) {
            map.put(s, n + 1);
        } else {
            map.put(s, 1);
        }
    }

In this case, the file contains tree names:

Red Alder
Ash
Aspen
Basswood
Ash
Beech
Yellow Birch
Ash
Cherry
Cottonwood
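For reference, the same counting loop can be written a little more compactly with try-with-resources and Map.merge. This is a sketch of an alternative, not the code we used in the contest; the class name TreeCounter and the method countNames are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class, for illustration only.
class TreeCounter {
    // Counts how often each line (tree name) occurs, reading one line at a time.
    static Map<String, Integer> countNames(Path path) throws IOException {
        Map<String, Integer> counts = new HashMap<>();
        try (BufferedReader in = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            for (String s; (s = in.readLine()) != null; ) {
                counts.merge(s, 1, Integer::sum); // insert 1, or add 1 to the old count
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // The sample tree names from above, written to a temporary file.
        Path sample = Files.createTempFile("trees", ".txt");
        Files.write(sample, Arrays.asList("Red Alder", "Ash", "Aspen", "Basswood", "Ash",
                "Beech", "Yellow Birch", "Ash", "Cherry", "Cottonwood"));
        System.out.println(countNames(sample).get("Ash")); // prints 3
        Files.delete(sample);
    }
}
```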

You can use StringTokenizer to extract any part of a line that you want.
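As a rough sketch of that idea, the following splits one line into whitespace-separated tokens. The class name TokenizeLine and the method tokens are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical helper class, for illustration only.
class TokenizeLine {
    // Splits a line on StringTokenizer's default delimiters (whitespace).
    static List<String> tokens(String line) {
        List<String> result = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            result.add(tokenizer.nextToken());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokens("Yellow Birch")); // prints [Yellow, Birch]
    }
}
```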

We ran into errors when using Scanner for large files: it read 100 lines from a file with 10000 lines! The API documentation says:

A scanner can read text from any object which implements the Readable interface. If an invocation of the underlying readable's Readable.read(java.nio.CharBuffer) method throws an IOException then the scanner assumes that the end of the input has been reached. The most recent IOException thrown by the underlying readable can be retrieved via the ioException() method.
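Because the scanner treats an IOException as the end of the input, a read loop can stop early without any visible error. One possible safeguard, sketched below, is to check ioException() after the loop. The class name ScannerIoCheck and the method countLinesChecked are made up for illustration, and this is not a confirmed explanation of the errors we saw.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Scanner;

// Hypothetical helper class, for illustration only.
class ScannerIoCheck {
    // Counts lines, then rethrows any IOException the Scanner swallowed,
    // so a truncated read is not mistaken for a normal end of file.
    static long countLinesChecked(File file) throws IOException {
        long lines = 0;
        try (Scanner scanner = new Scanner(file)) {
            while (scanner.hasNextLine()) {
                scanner.nextLine();
                lines++;
            }
            IOException failure = scanner.ioException();
            if (failure != null) {
                throw failure;
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Sample data in a temporary file so the sketch runs anywhere.
        Path sample = Files.createTempFile("io-check", ".txt");
        Files.write(sample, Arrays.asList("Ash", "Beech"));
        System.out.println(countLinesChecked(sample.toFile())); // prints 2
        Files.delete(sample);
    }
}
```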

Good luck!

answered 2012-04-26T15:53:07.183
0

For large files, it is better to use something like a BufferedReader with a FileReader. A basic example can be found here.
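A minimal sketch of that combination, which is not the linked example: the class name BufferedLineCount and the method countLines are made up for illustration. Note that FileReader decodes with the default charset, so a file in another encoding would need an InputStreamReader with an explicit charset instead.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper class, for illustration only.
class BufferedLineCount {
    // Reads the file one line at a time through a BufferedReader;
    // each line becomes eligible for garbage collection once processed.
    static long countLines(String fileName) throws IOException {
        long lines = 0;
        try (BufferedReader reader = new BufferedReader(new FileReader(fileName))) {
            while (reader.readLine() != null) {
                lines++;
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Sample data in a temporary file so the sketch runs anywhere.
        Path sample = Files.createTempFile("buffered", ".txt");
        Files.write(sample, Arrays.asList("a", "b", "c", "d"));
        System.out.println(countLines(sample.toString())); // prints 4
        Files.delete(sample);
    }
}
```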

answered 2012-04-26T15:26:32.447