How do I speed up a Java application?

I am developing a Java application that parses Cobol files line by line, extracts the necessary data, and populates a DB2 database with it.

When there are many files to parse, the application takes more than 24 hours to finish, which is unacceptable.

So I moved some of the table population into a separate thread to speed things up, e.g.:

ArrayList list = (ArrayList)vList.clone();
ThreadPopulator populator = new ThreadPopulator(connection, list, srcMbr);
Thread thread = new Thread(populator);
thread.run();
return;


The ThreadPopulator class implements the Runnable interface, and its run method is:

public void run()
{
    try
    {
        synchronized (this)
        {
           int len = Utils.length(list);
           for (int i = 0; i < len; i++)
           {
              .....
              stmt.addBatch();
              if ((i + 1) % 5000 == 0)
                  stmt.executeBatch(); // Execute every 5000 items.
           }
        }
    }
    catch (Throwable e)
    {
        e.printStackTrace();
    }
    finally
    {
        if (list != null)
            list.clear();
    }
}

Note: the clone is needed so that the next thread does not wipe out the entries.

Is my approach correct?

Please suggest what approach I should take to speed up my application when processing thousands of Cobol files.

2 Answers

You first need to determine what it is spending most of its time doing. This requires measuring the CPU and possibly memory usage. Is it the parsing, which uses CPU, or the database, which uses IO?

Without measuring where your performance bottleneck is, you can't make an informed decision about what needs to be improved.
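As a first, crude measurement before reaching for a profiler, you could compare wall-clock time with CPU time for each phase. The sketch below is only an illustration: `PhaseProbe`, `Sample`, `measure` and `pause` are hypothetical names, and the two phases in `main` are stand-ins for the real parsing and database code. A CPU share close to 1 suggests the phase is CPU-bound; a share close to 0 suggests it is mostly waiting on something (disk, network, the database).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

/** Hypothetical helper: compares wall-clock time with CPU time for one phase of work. */
final class PhaseProbe {

    /** Wall-clock and CPU time consumed by one phase on the calling thread. */
    static final class Sample {
        final long wallNanos;
        final long cpuNanos; // -1 if the JVM cannot measure per-thread CPU time

        Sample(long wallNanos, long cpuNanos) {
            this.wallNanos = wallNanos;
            this.cpuNanos = cpuNanos;
        }

        /** Fraction of the wall-clock time that the thread actually spent on a CPU. */
        double cpuShare() {
            return wallNanos == 0 ? 0.0 : (double) cpuNanos / wallNanos;
        }
    }

    /** Runs the work on the calling thread and records how long it took. */
    static Sample measure(Runnable work) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        boolean cpuSupported = threads.isCurrentThreadCpuTimeSupported();
        long cpuStart = cpuSupported ? threads.getCurrentThreadCpuTime() : 0L;
        long wallStart = System.nanoTime();
        work.run();
        long wall = System.nanoTime() - wallStart;
        long cpu = cpuSupported ? threads.getCurrentThreadCpuTime() - cpuStart : -1L;
        return new Sample(wall, cpu);
    }

    /** Sleeps without forcing callers to handle InterruptedException. */
    static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        // Stand-in for parsing: pure computation, so CPU time tracks wall time.
        Sample parse = measure(() -> {
            long sum = 0;
            for (int i = 0; i < 50_000_000; i++) sum += i % 7;
            if (sum < 0) System.out.println(sum); // keeps the loop from being optimized away
        });
        // Stand-in for a database round trip: the thread mostly waits.
        Sample db = measure(() -> pause(200));

        System.out.printf("parse: %d ms wall, cpu share %.2f%n", parse.wallNanos / 1_000_000, parse.cpuShare());
        System.out.printf("db:    %d ms wall, cpu share %.2f%n", db.wallNanos / 1_000_000, db.cpuShare());
    }
}
```

This only measures the calling thread, so work handed off to other threads has to be measured where it runs.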

From my experience, I would suspect the database first. You have batch sizes of 5000, which should be enough. How much CPU is it using while the program is running, e.g. is one CPU always busy?

Note: A simple text parser can read about 40-100 MB/s. To keep it busy for 24 hours you would need many TB of data to load, so the parsing itself sounds unlikely to be the cause.

Actually, I first need to rewrite each file in the proper format, then read those lines and extract the necessary data; source lines are even read 2-3 times for a single file (this is part of the logic). When I run the application on 4000K files, it runs for 24 hours.

4 million files is going to be a performance problem. Even a trivial file open takes about 8 ms on a fast HDD, and if you open each file 2-3 times it will take about 30 hours in total. (I assume your disk cache saves you a few hours.) The only ways to make it faster are to:

  • use fewer files. 4 million is an insane number to open multiple times. Opening them just once each will take about 10 hours (never mind doing something with them).
  • use a faster drive, e.g. an SSD can do this in about 1/100th of the time. An HDD can perform up to 120 IOPS, a cheap SSD can do 40,000 IOPS and a good one 230,000 IOPS. The latter could open 4 million files in ~17 seconds (4,000,000 / 230,000, at one I/O per open), which is faster than 10 hours. ;)
  • read each file only once. It will still be slow, but it will be 2-3x faster.
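The arithmetic behind these estimates can be checked with a few lines of Java. The 8 ms per open and the IOPS figures are the ones quoted above; the class and method names are hypothetical, and "one I/O per file open" is a simplifying assumption of this sketch, not a measured value.

```java
/** Hypothetical back-of-the-envelope calculator for the cost of just opening files. */
final class FileOpenEstimate {

    /** Hours spent only on opening files, assuming a fixed cost per open. */
    static double hoursToOpen(long files, int passes, double millisPerOpen) {
        return files * passes * millisPerOpen / 1000.0 / 3600.0;
    }

    /** Seconds spent opening files on a device with the given IOPS, assuming one I/O per open. */
    static double secondsAtIops(long files, int passes, double iops) {
        return files * passes / iops;
    }

    public static void main(String[] args) {
        long files = 4_000_000L;

        // One pass at 8 ms per open: roughly 8.9 hours.
        System.out.printf("HDD, 1 pass:   %.1f hours%n", hoursToOpen(files, 1, 8.0));
        // Three passes at 8 ms per open: roughly 26.7 hours.
        System.out.printf("HDD, 3 passes: %.1f hours%n", hoursToOpen(files, 3, 8.0));
        // The same single pass on devices with the IOPS figures quoted above.
        System.out.printf("40,000 IOPS:   %.0f seconds%n", secondsAtIops(files, 1, 40_000.0));
        System.out.printf("230,000 IOPS:  %.0f seconds%n", secondsAtIops(files, 1, 230_000.0));
    }
}
```

Real runs will differ because of the disk cache, file sizes and the time spent actually reading, but the order of magnitude shows why the number of file opens dominates.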

Note: using more threads won't make your hard drives go faster.
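To illustrate the single-pass suggestion, here is a minimal sketch that opens a source once and hands every line to several extractors, instead of re-reading the file for each kind of data. Everything here is hypothetical: `SinglePass`, `readOnce`, the two example extractors, and the choice of charset, which you would have to match to your actual files.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper: reads a source once and feeds each line to every extractor. */
final class SinglePass {

    /** Opens the file a single time and returns the number of lines read. */
    static int readOnce(Path file, Charset charset, List<Consumer<String>> extractors) {
        try (BufferedReader reader = Files.newBufferedReader(file, charset)) {
            return readOnce(reader, extractors);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Consumes the reader line by line; every extractor sees every line exactly once. */
    static int readOnce(BufferedReader reader, List<Consumer<String>> extractors) {
        int lines = 0;
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                for (Consumer<String> extractor : extractors) {
                    extractor.accept(line);
                }
                lines++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // In-memory stand-in for a Cobol source file, so the example runs without any files.
        String source = "       IDENTIFICATION DIVISION.\n"
                      + "      * a comment line\n"
                      + "       PROCEDURE DIVISION.\n";

        List<String> divisions = new ArrayList<>();
        int[] commentLines = {0};

        // Two illustrative extractors that previously might have needed two separate passes.
        Consumer<String> divisionExtractor = line -> {
            if (line.contains("DIVISION")) divisions.add(line.trim());
        };
        Consumer<String> commentCounter = line -> {
            // In fixed-format Cobol, '*' in column 7 marks a comment line.
            if (line.length() > 6 && line.charAt(6) == '*') commentLines[0]++;
        };

        int lines = readOnce(new BufferedReader(new StringReader(source)),
                             List.of(divisionExtractor, commentCounter));
        System.out.println(lines + " lines, divisions=" + divisions + ", comments=" + commentLines[0]);
    }
}
```

If a later step really needs the reformatted text, keeping it in memory for the duration of one file avoids writing it out and opening it again.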

answered 2013-01-24T14:03:37.067
You are calling

thread.run();

instead of

thread.start();

This means you are not actually running the code in a separate thread...
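A small, self-contained way to see the difference is to record which thread executes the task. The class and method names below are hypothetical, and the thread name "populator" is only chosen for the demonstration.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical demo: shows which thread executes a Runnable under run() versus start(). */
final class RunVersusStart {

    /** Returns the name of the thread that actually executed the task. */
    static String executingThread(boolean useStart) {
        AtomicReference<String> name = new AtomicReference<>();
        Thread worker = new Thread(() -> name.set(Thread.currentThread().getName()), "populator");

        if (useStart) {
            worker.start(); // creates a new thread, which then calls run()
            try {
                worker.join(); // wait for it so the recorded name is visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        } else {
            worker.run(); // ordinary method call: executes on the caller's thread
        }
        return name.get();
    }

    public static void main(String[] args) {
        System.out.println("run()   executed on: " + executingThread(false));
        System.out.println("start() executed on: " + executingThread(true));
    }
}
```

With `run()` the task reports the caller's thread (for example `main`), while with `start()` it reports `populator`.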

Apart from that, I'd like to second @Peter's answer.

answered 2013-01-24T14:08:37.670