
My Java program uses java.util.concurrent.Executor to run multiple threads. Each thread starts a Runnable that reads a comma-delimited text file from the C: drive, loops through the lines, and splits and parses the text into floats. The parsed data is then stored in:

static Vector
static ConcurrentSkipListMap

My PC runs Windows 7 64-bit on an Intel Core i7 with six * 2 cores and 24 GB of RAM. The program runs for about 2 minutes and finishes all 1700 files, but CPU usage stays around 10% to 15%, no matter how many threads I assign using:

Executor executor=Executors.newFixedThreadPool(50);

Executors.newFixedThreadPool(500) gives neither better CPU usage nor a shorter run time. There is no network traffic, everything is on the local C: drive, and there is enough RAM for more threads, yet I get an "OutOfMemoryError" when I increase the thread count to 1000.

Why don't more threads translate into higher CPU usage and less processing time?

Edit: My drive is a 200 GB SSD.

Edit: Finally found the problem. Each thread writes its results to a log file shared by all threads. The more times I ran the app, the larger the log file grew and the slower it got, and because the file is shared, every thread was slowed down. After I stopped writing to the log file, all tasks finish in 10 seconds!


3 Answers

4

The OutOfMemoryError is probably coming from Java's own limits on its memory usage. Try using some of the arguments here to increase the maximum memory.

For speed, Adam Bliss starts with a good suggestion. If this is the same file over and over, then I imagine having multiple threads try to read it at the same time could result in a lot of contention over locks on the file. More threads would even mean more contention, which could even result in worse overall performance. So avoid that and simply load the file once if it's possible. Even if it's a large file, you have 24 GB of RAM. You can hold quite a large file, but you may need to increase the JVM's allowed memory to allow the whole file to be loaded.

If there are multiple files being used, then consider this fact: your disk can only read one file at a time. So having multiple threads trying to use the disk all at the same time probably won't be too effective if the threads aren't spending much time processing. Since you have so little CPU usage, it could be that the thread loads part of the file, then runs very quickly on the part that got buffered, and then spends a lot of time waiting for the rest of the file to load. If you're loading the file over and over, that could even still apply.

In short: Disk IO probably is your culprit. You need to work to reduce it so that the threads aren't contending for file content so much.

Edit:

After further consideration, it's more likely a synchronization issue. Threads are probably getting held up trying to add to the result list. If access is frequent, this will result in huge amounts of contention for the lock on that object. Consider having each thread save its results in a local list (like ArrayList, which is not thread-safe), and then copying the values into the final, shared list in chunks to reduce contention.
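As a rough illustration of the local-list idea, here is a minimal sketch. The class name `BatchedResults`, the `parseTask` and `runAll` helpers, and the in-memory sample lines are all hypothetical; the shared `Vector` only stands in for the static Vector mentioned in the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Vector;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class BatchedResults {
    // Shared, synchronized result list (stand-in for the question's static Vector).
    static final Vector<Float> SHARED = new Vector<>();

    // Hypothetical task: parse one file's lines into a task-local list,
    // then publish everything to the shared list in a single call.
    static Runnable parseTask(final List<String> lines) {
        return () -> {
            // Not thread-safe, but only this task ever touches it.
            List<Float> local = new ArrayList<>();
            for (String line : lines) {
                for (String field : line.split(",")) {
                    local.add(Float.parseFloat(field.trim()));
                }
            }
            // One lock acquisition per task instead of one per parsed value.
            SHARED.addAll(local);
        };
    }

    // Runs one task per "file" on a fixed pool and waits for completion.
    static void runAll(List<List<String>> files, int threads) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (List<String> lines : files) {
            pool.execute(parseTask(lines));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }

    public static void main(String[] args) throws InterruptedException {
        // Sample data assumed for illustration; real code would read the files.
        List<List<String>> files = Arrays.asList(
                Arrays.asList("1.0,2.0", "3.0"),
                Arrays.asList("4.5, 5.5"));
        runAll(files, 2);
        System.out.println(SHARED.size() + " values collected");
    }
}
```

Whether this helps depends on how often the shared collection is actually hit; if each task already adds only a few values, the difference may be negligible.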

answered 2013-07-24T03:35:59.623
1

You're probably being limited by IO, not CPU.

Can you reduce the number of times you open the file to read it? Maybe open it once, read all the lines, keep them in memory, and then iterate on that.
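Something along these lines might work as a starting point. It is only a sketch: the class name `ReadOnce`, the helper names, and the temporary sample file are assumptions made for illustration, and it assumes the file fits comfortably in memory.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ReadOnce {
    // Reads the whole file in one pass; the file is opened and closed once.
    static List<String> loadLines(Path file) throws IOException {
        return Files.readAllLines(file, StandardCharsets.UTF_8);
    }

    // Parses lines that are already in memory; no disk access happens here.
    static List<Float> parse(List<String> lines) {
        List<Float> values = new ArrayList<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            for (String field : line.split(",")) {
                values.add(Float.parseFloat(field.trim()));
            }
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample file, created only so the sketch runs on its own.
        Path sample = Files.createTempFile("sample", ".csv");
        Files.write(sample, Arrays.asList("1.5,2.5", "3.0"), StandardCharsets.UTF_8);

        List<String> lines = loadLines(sample); // single read from disk
        List<Float> values = parse(lines);      // iterate in memory as often as needed
        System.out.println(values);

        Files.delete(sample);
    }
}
```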

Otherwise, you'll have to look at getting a faster hard drive. SSDs can be quite speedy.

answered 2013-07-24T03:14:42.760
1

Is it possible that your threads are somehow being given low priority on the system? In that case, increasing the number of threads wouldn't increase CPU usage, since the amount of CPU time allotted to your program may be throttled somewhere else.

Are there any configuration files or initialization steps where something like this could occur?

answered 2013-07-24T03:15:28.407