3

I'm new to Java. In my current project I need to read and write very large text files (1 GB - 5 GB). At first I used the BufferedReader and BufferedWriter classes:

public static String read(String dir) {
    BufferedReader br;
    String result = "", line;
    try {
        br = new BufferedReader(new InputStreamReader(new FileInputStream(dir), "UTF-8"));
        while ((line = br.readLine()) != null) {
            result += line + "\n";
        }
    } catch (IOException ex) {
        //do something
    }
    return result;
}

public static void write(String dir, String text) {
    BufferedWriter bw;
    try {
        bw = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(dir), "UTF-8"));
        bw.write("");
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) != '\n') {
                bw.append(text.charAt(i));
            } else {
                bw.newLine();
            }
        }
        bw.flush();
    } catch (IOException ex) {
        //do something
    }
}

These classes work well, but not for huge files.

Then I used MappedByteBuffer for the read() method (I don't know how to write a file using this class):

public static String read(String dir) {
    FileChannel fc;
    String s = "";
    try {
        fc = new RandomAccessFile(dir, "r").getChannel();
        MappedByteBuffer buffer = fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());
        buffer.load();
        buffer.force();
        for (int i = 0; i < buffer.limit(); i++) {
            s += (char) buffer.get();
        } //I know the problem is here
        buffer.clear();
        fc.close();
    } catch (IOException e) {
        //do something
    }
    return s;
}

But it still can't read large files (over 30-40 MB); even Notepad is faster than my app :))

Another problem is that I don't know how to change the encoding in the second approach (for example "UTF-8", "ANSI", ...).

So, what is the best way to read and write very large files? Any ideas?

4 Answers

2
result += line + "\n";

This line tries to hold the entire file contents in memory. Try processing each line as you read it instead, like this:

while ((line = br.readLine()) != null) {
    processLine(line); // this may write it to another file
}
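To make the loop above concrete, here is a minimal sketch of a line-at-a-time reader. The class name `LineStreamer`, the method `forEachLine`, and the use of a `Consumer<String>` in place of `processLine` are my own assumptions for illustration, not something from the question. The charset is passed in explicitly, which is also one way to control the encoding.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Consumer;

public class LineStreamer {
    // Reads the file one line at a time and hands each line to processLine.
    // Only the current line (plus the reader's internal buffer) is held in
    // memory, so the file size does not matter.
    // Returns the number of lines read.
    public static long forEachLine(Path file, Charset charset,
                                   Consumer<String> processLine) throws IOException {
        long count = 0;
        // try-with-resources closes the reader even if an exception is thrown
        try (BufferedReader br = Files.newBufferedReader(file, charset)) {
            String line;
            while ((line = br.readLine()) != null) {
                processLine.accept(line);
                count++;
            }
        }
        return count;
    }
}
```

A call might look like `LineStreamer.forEachLine(Paths.get("big.txt"), StandardCharsets.UTF_8, line -> { /* handle line */ });`, where `big.txt` is a placeholder path.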
answered 2014-03-18T14:18:06.297
1

At the very least, I'd suggest changing

result += line + "\n";

to use a StringBuilder:

resultBldr.append(line).append("\n");

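This avoids creating a new String object on every line, and an ever-larger one at that!

Also, you should definitely write the output to the file line by line. Don't accumulate all of the text and then output it.

In other words, in this situation it is not advisable to fully separate your read and write functions.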
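As a rough sketch of what merging read and write could look like: each line is written out as soon as it is read, so nothing accumulates. The names `LineCopier` and `copyLines` are hypothetical. Taking separate input and output charsets is my own addition, since the question also asks about changing the encoding. If "ANSI" on your system means Windows-1252, something like `Charset.forName("windows-1252")` might be the charset to pass, but that is an assumption about your environment.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

public class LineCopier {
    // Copies a text file line by line, decoding with inCharset and
    // encoding with outCharset. Memory use stays roughly constant
    // regardless of file size.
    public static void copyLines(Path in, Charset inCharset,
                                 Path out, Charset outCharset) throws IOException {
        try (BufferedReader br = Files.newBufferedReader(in, inCharset);
             BufferedWriter bw = Files.newBufferedWriter(out, outCharset)) {
            String line;
            while ((line = br.readLine()) != null) {
                // Any per-line transformation would go here.
                bw.write(line);
                // '\n' keeps the output identical across platforms;
                // bw.newLine() would use the platform line separator instead.
                bw.write('\n');
            }
        }
    }
}
```

Note that `readLine()` strips the line terminator, so a final line without a trailing newline gains one in the output.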

answered 2014-03-18T14:13:31.707
0

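Keep in mind that every String concatenation creates a new String. If you read a 40 MB file character by character and concatenate each one, read() ends up creating about 40,000,000 Strings in total.

Try using a StringBuffer (or the unsynchronized StringBuilder) instead of String; it is the recommended choice for this case.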
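For the MappedByteBuffer version specifically, one way to avoid the per-character concatenation altogether is to let a `Charset` decode the whole buffer in one call. This is only a sketch under some assumptions: the class name `MappedReader` is made up, the result is still a single String, so it only suits files that fit comfortably in the heap, and `FileChannel.map` cannot map more than `Integer.MAX_VALUE` bytes at once, so it does not cover the 5 GB case. It does address the encoding question, because `(char) buffer.get()` treats every byte as one character and breaks multi-byte encodings such as UTF-8.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedReader {
    // Maps the file read-only and decodes all bytes with the given charset.
    // Assumes the file is smaller than 2 GB and fits in memory as a String.
    public static String read(Path file, Charset charset) throws IOException {
        try (FileChannel fc = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buffer =
                    fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());
            // decode() converts the bytes in bulk, handling multi-byte
            // sequences correctly, instead of casting each byte to a char.
            return charset.decode(buffer).toString();
        }
    }
}
```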

answered 2014-03-18T14:13:13.563
0

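Reading a huge file in the 1 GB - 5 GB range all at once is always a bad idea. It carries a large performance overhead and your application will slow down.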

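It is better to split the huge file into smaller chunks and read it chunk by chunk. I think the code you wrote will work fine once you start reading the file in smaller chunks.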
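Here is a minimal sketch of chunked reading, assuming the goal is to pass text through from one file to another. `ChunkCopier`, `copyInChunks`, and the `chunkSize` parameter are names I made up for illustration. Only one chunk of characters is in memory at a time.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

public class ChunkCopier {
    // Copies a text file in fixed-size character chunks.
    // Returns the total number of characters copied.
    public static long copyInChunks(Path in, Path out, Charset charset,
                                    int chunkSize) throws IOException {
        char[] chunk = new char[chunkSize];
        long total = 0;
        try (Reader r = Files.newBufferedReader(in, charset);
             Writer w = Files.newBufferedWriter(out, charset)) {
            int n;
            // read() returns how many chars were actually placed in the
            // array, or -1 at end of file; a chunk may be only partly filled.
            while ((n = r.read(chunk)) != -1) {
                // Process chunk[0..n) here before writing, if needed.
                w.write(chunk, 0, n);
                total += n;
            }
        }
        return total;
    }
}
```

A chunk size of a few thousand characters (for example 8192) is a common starting point, though the best value depends on your workload.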

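Have you heard of HDFS, Solr indexing, or the Apache Hadoop framework, which are designed specifically for handling massive amounts of data? You might want to take a look at them.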

answered 2014-03-18T14:19:30.720