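I am developing a program that reads a text file and produces a report. For each line of the file, the report contains the line number, its "status", and the first few characters of the line. It works well for files up to 100 MB.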

However, when I run the program on an input file larger than 1.5 GB that contains more than 100,000 lines, I get the following error:

> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>     at java.util.Arrays.copyOfRange(Unknown Source)
>     at java.lang.String.<init>(Unknown Source)
>     at java.lang.StringBuffer.toString(Unknown Source)
>     at java.io.BufferedReader.readLine(Unknown Source)
>     at java.io.BufferedReader.readLine(Unknown Source)
>     at org.apache.commons.io.IOUtils.readLines(IOUtils.java:771)
>     at org.apache.commons.io.IOUtils.readLines(IOUtils.java:723)
>     at org.apache.commons.io.IOUtils.readLines(IOUtils.java:745)
>     at org.apache.commons.io.FileUtils.readLines(FileUtils.java:1512)
>     at org.apache.commons.io.FileUtils.readLines(FileUtils.java:1528)
>     at org.apache.commons.io.ReadFileToListSample.main(ReadFileToListSample.java:43)

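I increased the VM arguments to -Xms128m -Xmx1600m (in the Eclipse run configuration), but that did not help. Experts on the OTN forum advised me to read some books and improve my program's performance. Could anyone help me improve it? Thank you.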

Code:

import org.apache.commons.io.FileUtils;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.FileReader;
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.PrintStream;
import java.util.List;

public class ReadFileToList {

    public static void main(String[] args) throws FileNotFoundException
    {
        File file_out = new File ("D:\\Docs\\test_out.txt");
        FileOutputStream fos = new FileOutputStream(file_out);
        PrintStream ps = new PrintStream (fos);
        System.setOut (ps);

        // Create a file object
        File file = new File("D:\\Docs\\test_in.txt");

        FileReader fr = null;
        LineNumberReader lnr = null;

        try {
            // Here we read a file, sample.txt, using FileUtils
            // class of commons-io. Using FileUtils.readLines()
            // we can read file content line by line and return
            // the result as a List of string.

            List<String> contents = FileUtils.readLines(file);
            //
            // Iterate the result to print each line of the file.

            fr = new FileReader(file);
            lnr = new LineNumberReader(fr);

            for (String line : contents)
            {
                String begin_line = line.substring(0, 38); // return 38 chars from the string
                String begin_line_without_null = begin_line.replace("\u0000", " ");
                String begin_line_without_null_spaces = begin_line_without_null.replaceAll(" +", " ");

                int stringlenght = line.length();
                line = lnr.readLine();
                int line_num = lnr.getLineNumber();

                String status;

                // some correct length for if
                int c_u_length_f = 12;
                int c_ea_length_f = 13;
                int c_a_length_f = 2130;
                int c_u_length_e = 3430;
                int c_ea_length_e = 1331;
                int c_a_length_e = 442;
                int h_ext = 6;
                int t_ext = 6;

                if ( stringlenght == c_u_length_f ||
                        stringlenght == c_ea_length_f ||
                        stringlenght == c_a_length_f ||
                        stringlenght == c_u_length_e ||
                        stringlenght == c_ea_length_e ||
                        stringlenght == c_a_length_e ||
                        stringlenght == h_ext ||
                        stringlenght == t_ext)
                    status = "ok";
                else status = "fail";

                System.out.println(+ line_num + stringlenght + status + begin_line_without_null_spaces);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

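The experts at OTN also said that the program opens the input and reads it twice. Maybe there is a mistake in the for statement? But I can't find it. Thank you.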

1 Answer

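You declare variables inside the loop and do a lot of unnecessary work, including reading the file twice, which also hurts performance. FileUtils.readLines loads the entire file into a List in memory, and that is where your stack trace shows the heap running out. You can use a LineNumberReader to get both the line number and the text, one line at a time, and reuse the line variable (declared outside the loop). Here is a shortened version that does what you need. You will need to complete the validLength method to check all the values, since I only included the first few tests.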

import java.io.*;

public class TestFile {

    // Decides whether a line length is valid; kept outside the method that does the reading.
    private static String validLength(int length) {
        if (length == 12 || length == 13 || length == 2130) // you can finish it
            return "ok";
        return "fail";
    }

    public static void main(String[] args) {
        // try-with-resources closes both files even if an exception is thrown
        try (LineNumberReader lnr = new LineNumberReader(new FileReader(args[0]));
             BufferedWriter out = new BufferedWriter(new FileWriter(args[1]))) {
            String line;
            int length;
            while (null != (line = lnr.readLine())) {
                length = line.length();
                // take at most 38 characters; valid lines can be shorter than that
                line = line.substring(0, Math.min(38, length));
                line = line.replace("\u0000", " ");
                // collapse runs of spaces, as replaceAll(" +", " ") does in the question
                line = line.replaceAll(" +", " ");
                // separate the fields explicitly; int + int would otherwise be added arithmetically
                out.write(lnr.getLineNumber() + " " + length + " " + validLength(length) + " " + line);
                out.newLine();
            }
        }
        catch (Exception e) {
            e.printStackTrace();
        }
    }
}

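Run it as java TestFile D:\Docs\test_in.txt D:\Docs\test_out.txt, or replace args[0] and args[1] with the file names if you want to hard-code them. Make sure the second argument is a different file from the first, because the output file is overwritten.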
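
One possible way to finish the length check is sketched below. It assumes the valid lengths are exactly the ones listed in the question's code (12, 13, 2130, 3430, 1331, 442 and 6). The class name ReportLines and the format helper are hypothetical names used only for this illustration, and the space-separated output layout is an assumption, since the question does not say how the fields should be delimited.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper class, not part of the original program.
class ReportLines {

    // Valid lengths copied from the question's code (h_ext and t_ext are both 6).
    private static final Set<Integer> VALID_LENGTHS =
            new HashSet<>(Arrays.asList(12, 13, 2130, 3430, 1331, 442, 6));

    // Returns "ok" when the length is one of the known valid lengths, otherwise "fail".
    static String validLength(int length) {
        return VALID_LENGTHS.contains(length) ? "ok" : "fail";
    }

    // Builds one report line: line number, original length, status, cleaned prefix.
    // The single-space separator is an assumed layout.
    static String format(int lineNumber, String line) {
        String prefix = line.substring(0, Math.min(38, line.length()))
                .replace("\u0000", " ")   // NUL characters become spaces
                .replaceAll(" +", " ");   // runs of spaces collapse to one
        return lineNumber + " " + line.length() + " " + validLength(line.length()) + " " + prefix;
    }

    public static void main(String[] args) {
        System.out.println(format(1, "HEADER")); // prints "1 6 ok HEADER"
    }
}
```

Inside the reading loop, the write call could then become out.write(ReportLines.format(lnr.getLineNumber(), line)). Keeping the lengths in a Set means a new valid length is a one-value change instead of another clause in the if condition.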

Answered 2012-03-26T16:28:27.227