
I have a text file in this format:

int | string | string | string |
int | string | string | string |
int | string | string | string |
.
.
.

The file is about 80 MB. I have to read it and, after some evaluation, add its contents to the database.

I read one line at a time and, based on some conditions, insert it into the database. But this code is very slow: it has been running for more than a day with no result yet!

What can I do to make it faster?

I know there should be some way to read the whole file at once.

BTW, I'm using MySQL.

Help me out guys!

Here is my code:

import java.io.BufferedReader;
import java.io.FileReader;
import java.sql.Connection;
import java.util.Scanner;

public void fill_names_db() throws Exception {

    MySQLAccess dao = new MySQLAccess();
    Connection connect = dao.newConnection();

    // try-with-resources closes the reader even if a line fails to parse
    try (BufferedReader in = new BufferedReader(new FileReader("C:\\Users\\havij\\Downloads\\taxdump\\names.dmp"))) {

        String s;
        // readLine() returns null at end of file; ready() only says whether a read would block
        while ((s = in.readLine()) != null) {

            // The regex "\t|\t" means "tab OR tab", so every "|" comes back as its own
            // token and is skipped by the extra next() calls below
            Scanner stringScanner = new Scanner(s).useDelimiter("\t|\t");

            String tax_id = stringScanner.next();
            stringScanner.next();
            String name_txt = stringScanner.next();
            stringScanner.next();
            String unique_name = stringScanner.next();
            stringScanner.next();
            String name_class = stringScanner.next();
            stringScanner.close();

            if (name_class.equals("scientific name")) {
                dao.insertToDB(connect, "id_to_name", tax_id, name_txt);
            }

            // Braces bind the else to the hasKey check, as the original indentation intended;
            // without them it attached to the inner if
            if (dao.hasKey(connect, "name_to_id", name_txt)) {
                if (!unique_name.isEmpty()) {
                    dao.insertToDB(connect, "name_to_id", unique_name, tax_id, name_txt, unique_name, name_class);
                }
            } else if (!name_txt.isEmpty()) {
                dao.insertToDB(connect, "name_to_id", name_txt, tax_id, name_txt, unique_name, name_class);
            }
        }
    } finally {
        dao.close(connect);
    }
    System.out.println("done");
}

3 Answers


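The file is already formatted for use with MySQL's LOAD DATA INFILE statement. You can read about it here: http://dev.mysql.com/doc/refman/4.1/en/load-data.html

You just need to use '|' as your field delimiter and \n as your line delimiter.

Don't forget the LOCAL keyword, since the file is probably on the SQL client's file system.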
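If the load still has to be started from Java, the statement can be sent over the existing JDBC connection. The following is a minimal sketch, not a definitive implementation: the staging table `names_raw` and the column list are hypothetical, and the terminators are parameters because the question's Scanner delimiter suggests the file may actually use tab-pipe-tab separators (in that case pass `"\\t|\\t"` and `"\\t|\\n"` instead). With MySQL Connector/J, LOCAL loads typically also need `allowLoadLocalInfile=true` on the connection and `local_infile` enabled on the server.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

class LoadDataSketch {

    // Builds a LOAD DATA LOCAL INFILE statement. The terminators are passed in
    // already escaped for MySQL (the Java string "\\n" becomes \n in the SQL text).
    // Assumes the path contains no single quotes.
    static String buildLoadData(String path, String table,
                                String fieldTerminator, String lineTerminator) {
        // MySQL treats backslash as an escape character inside string literals,
        // so forward slashes are the simplest way to write a Windows path.
        String sqlPath = path.replace('\\', '/');
        return "LOAD DATA LOCAL INFILE '" + sqlPath + "'"
                + " INTO TABLE " + table
                + " FIELDS TERMINATED BY '" + fieldTerminator + "'"
                + " LINES TERMINATED BY '" + lineTerminator + "'"
                // Hypothetical column names, taken from the variables in the question
                + " (tax_id, name_txt, unique_name, name_class)";
    }

    // Runs the statement on an already open connection.
    static void load(Connection connection, String sql) throws SQLException {
        try (Statement statement = connection.createStatement()) {
            statement.execute(sql);
        }
    }
}
```

Following the delimiters suggested above, a call could look like `LoadDataSketch.load(connect, LoadDataSketch.buildLoadData("C:\\Users\\havij\\Downloads\\taxdump\\names.dmp", "names_raw", "|", "\\n"))`. The conditional logic from the question would then run as SQL against the staging table rather than row by row from Java.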

answered 2013-07-15T19:55:45.673

In this case you don't want the overhead of Java. You want to use what is called LOAD DATA INFILE.

From this post:

mysql> create table t2 (a varchar(20), b varchar(20), c varchar(20));
Query OK, 0 rows affected (0.01 sec)

mysql> load data infile '/tmp/data.csv' into table t2 fields terminated by ','   
       enclosed by '"' lines terminated by '\n' (a, b, c);


answered 2013-07-15T19:56:47.990

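As others have pointed out, LOAD DATA INFILE will accomplish your task much more easily. If you insist on doing this in Java, try the BufferedReader constructor that lets you specify the buffer size, for example: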

// specify 128K buffer, default is 8K
// You can try larger values, it really depends on your disk I/O
BufferedReader in = new BufferedReader(new FileReader("C:\\Users\\havij\\Downloads\\taxdump\\names.dmp"), 128 * 1024); 

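Note another issue: FileReader, which you are using, decodes the file with the platform default charset, so it may corrupt your data if the file contains UTF-8 characters. It is better to use InputStreamReader and specify the character set the file uses.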
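A minimal sketch of that suggestion, assuming the file is UTF-8 encoded (the class and method names here are illustrative, not from the question):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class Utf8ReaderSketch {

    // Wraps a raw byte stream in a reader that decodes UTF-8 explicitly
    // and uses a 128K buffer instead of the 8K default.
    static BufferedReader openUtf8(InputStream raw) {
        return new BufferedReader(
                new InputStreamReader(raw, StandardCharsets.UTF_8), 128 * 1024);
    }

    // Collects every line for demonstration; a real loader would process
    // each line as it is read instead of keeping them all in memory.
    static List<String> readLines(InputStream raw) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = openUtf8(raw)) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }
}
```

For the file in the question, the stream could come from `java.nio.file.Files.newInputStream(...)` or `new FileInputStream(...)`. If the file turns out to use a different encoding, swap `StandardCharsets.UTF_8` for the matching charset.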

answered 2013-07-15T20:20:08.370