
I have a very large CSV file over which I need to run some select-style queries (computing the average, etc.). I cannot process it the normal way, reading it in one pass as below, because I run out of memory.

The following code works well on a short CSV file but not on a huge one. I would appreciate it if you could edit this code so it works for a large CSV file.

import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class Mu {
    public void Computemu() {
        String filename = "testdata.csv";
        File file = new File(filename);
        try {
            Scanner inputstream = new Scanner(file); // Scanner reads tokens as strings
            // String data = inputstream.next(); // ignore the first line (header)
            double sum = 0;
            double numberOfRating = 0;

            while (inputstream.hasNext()) {
                String data = inputstream.next(); // intended to read a whole line
                String[] values = data.split(";"); // values are separated by ';'
                double rating = Double.parseDouble(values[2].replaceAll("\"", "")); // strip quotes and parse
                if (rating > 0) { // do not consider implicit ratings
                    sum += rating;
                    numberOfRating++;
                }
            }
            inputstream.close();
            System.out.println("Mu is " + (sum / numberOfRating));
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }
}

2 Answers


You never call useDelimiter, so if the file contains no whitespace (the default delimiter), next() has to load the entire file into a single string.

That is what causes the OutOfMemoryError.

If you want to use Scanner, set the delimiter as needed.

But a CSV library (like csvfile) would probably be more effective.
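The delimiter point can be sketched as follows: switching from hasNext()/next() (token-based) to hasNextLine()/nextLine() makes Scanner consume one line at a time, so memory use stays constant regardless of file size. This is a minimal sketch, not the asker's exact program; the class name MuStreaming and the sample rows written to testdata.csv are illustrative.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.PrintWriter;
import java.util.Scanner;

public class MuStreaming {
    // Stream the file line by line; only one line is held in memory at a time.
    static double mu(File file) throws FileNotFoundException {
        double sum = 0;
        long count = 0;
        try (Scanner in = new Scanner(file)) {
            while (in.hasNextLine()) {
                String[] values = in.nextLine().split(";");       // fields separated by ';'
                double rating = Double.parseDouble(values[2].replaceAll("\"", ""));
                if (rating > 0) {                                 // skip implicit ratings
                    sum += rating;
                    count++;
                }
            }
        }
        return sum / count;
    }

    public static void main(String[] args) throws Exception {
        // Write a small sample file so the sketch is self-contained
        File file = new File("testdata.csv");
        try (PrintWriter pw = new PrintWriter(file)) {
            pw.println("\"u1\";\"i1\";\"4\"");
            pw.println("\"u2\";\"i2\";\"0\"");
            pw.println("\"u3\";\"i3\";\"5\"");
        }
        System.out.println("Mu is " + mu(file)); // prints "Mu is 4.5"
    }
}
```

The key difference from the question's code is the loop condition: nextLine() advances past a fixed line separator instead of scanning for whitespace, so it never needs to buffer the whole file as one token.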

Answered 2012-07-03T19:35:02.753

For this use case, I suggest using Apache Commons FileUtil. This may not be exactly what you were looking for in your question, but using FileUtil is preferable to reimplementing it yourself.

In particular, look at the lineIterator method.
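FileUtils.lineIterator (from Apache Commons IO) hands back the file one line at a time without loading it all into memory. If adding the dependency is not an option, the same streaming contract is available in the plain JDK with BufferedReader; here is a minimal sketch under that assumption (class name MuLineIterator and the sample data are illustrative, not from the question):

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.PrintWriter;

public class MuLineIterator {
    // Same contract as FileUtils.lineIterator: one line in memory at a time.
    static double mu(File file) throws IOException {
        double sum = 0;
        long count = 0;
        try (BufferedReader reader = new BufferedReader(new FileReader(file))) {
            String line;
            while ((line = reader.readLine()) != null) {
                double rating = Double.parseDouble(line.split(";")[2].replaceAll("\"", ""));
                if (rating > 0) {            // skip implicit ratings
                    sum += rating;
                    count++;
                }
            }
        }
        return sum / count;
    }

    public static void main(String[] args) throws IOException {
        // Write a small sample file so the sketch is self-contained
        File file = new File("testdata.csv");
        try (PrintWriter pw = new PrintWriter(file)) {
            pw.println("\"u1\";\"i1\";\"3\"");
            pw.println("\"u2\";\"i2\";\"5\"");
        }
        System.out.println("Mu is " + mu(file)); // prints "Mu is 4.0"
    }
}
```

With Commons IO on the classpath, the loop body stays the same; you would obtain the lines via LineIterator it = FileUtils.lineIterator(file, "UTF-8") and iterate with it.hasNext()/it.nextLine(), closing the iterator when done.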

Answered 2012-07-03T19:29:58.870