2

I'm trying to perform a once-through read of a large file (~4GB) using Java 5.0 x64 (on Windows XP).

Initially the file read rate is very fast, but gradually the throughput slows down substantially, and my machine seems very unresponsive as time goes on.

I've used ProcessExplorer to monitor the File I/O statistics, and it looks like the process initially reads 500MB/sec, but this rate gradually drops to around 20MB/sec.

Any ideas on the best way to maintain file I/O rates, especially when reading large files using Java?

Here's some test code that shows the "interval time" continuing to increase. Just pass MultiFileReader a file that's at least 500MB.

import java.io.File;
import java.io.RandomAccessFile;

public class MultiFileReader {

public static void main(String[] args) throws Exception {
    MultiFileReader mfr = new MultiFileReader();
    mfr.go(new File(args[0]));
}

public void go(final File file) throws Exception {
    RandomAccessFile raf = new RandomAccessFile(file, "r");
    long fileLength = raf.length();
    System.out.println("fileLen: " + fileLength);
    raf.close();

    long startTime = System.currentTimeMillis();
    doChunk(0, file, 0, fileLength);
    System.out.println((System.currentTimeMillis() - startTime) + " ms");
}

public void doChunk(int threadNum, File file, long start, long end) throws Exception {
    System.out.println("Starting partition " + start + " to " + end);
    RandomAccessFile raf = new RandomAccessFile(file, "r");
    raf.seek(start);

    long cur = start;
    byte buf[] = new byte[1000];
    int lastPercentPrinted = 0;
    long intervalStartTime = System.currentTimeMillis();
    while (true) {
        int numRead = raf.read(buf);
        if (numRead == -1) {
            break;
        }
        cur += numRead;
        if (cur >= end) {
            break;
        }

        int percentDone = (int)(100.0 * (cur - start) / (end - start));
        if (percentDone % 5 == 0) {
            if (lastPercentPrinted != percentDone) {
                lastPercentPrinted = percentDone;
                System.out.println("Thread" + threadNum + " Percent done: " + percentDone + " Interval time: " + (System.currentTimeMillis() - intervalStartTime));
                intervalStartTime = System.currentTimeMillis();
            }
        }
    }
    raf.close();
}
}

Thanks!

4

5 Answers

10

I very much doubt that you're really getting 500MB per second from your disk. Chances are the data is cached by the operating system - and that the 20MB per second is what happens when it really hits the disk.

This will quite possibly be visible in the disk section of the Vista Resource Manager - and a low-tech way to tell is to listen to the disk drive :)
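
One rough way to check the caching explanation is to read the same file twice and compare the measured rates. The sketch below is only an illustration: the class name ThroughputProbe, its method names, and the 64 KB buffer are assumptions made for this example, not anything from the question. It sticks to APIs that exist in Java 5.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

// Hypothetical helper for this example: reads a file twice and prints MB/sec per pass.
class ThroughputProbe {

    // Converts bytes and elapsed nanoseconds to MB/sec (1 MB = 1,000,000 bytes).
    static double megabytesPerSecond(long bytes, long elapsedNanos) {
        if (elapsedNanos <= 0) {
            return 0.0; // timer resolution too coarse to report a rate
        }
        return (bytes / 1e6) / (elapsedNanos / 1e9);
    }

    // Reads the whole file once, sequentially, and returns the number of bytes read.
    static long readOnce(File file) throws IOException {
        FileInputStream in = new FileInputStream(file);
        try {
            byte[] buf = new byte[64 * 1024]; // arbitrary size chosen for the sketch
            long total = 0;
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n;
            }
            return total;
        } finally {
            in.close();
        }
    }

    public static void main(String[] args) throws IOException {
        File file = new File(args[0]);
        for (int pass = 1; pass <= 2; pass++) {
            long start = System.nanoTime();
            long bytes = readOnce(file);
            long elapsed = System.nanoTime() - start;
            System.out.println("pass " + pass + ": " + bytes + " bytes, "
                    + megabytesPerSecond(bytes, elapsed) + " MB/sec");
        }
    }
}
```

If the file fits in RAM, the second pass will usually report a much higher rate than a cold first pass, which suggests the cache rather than the disk is serving the data. With a file larger than available memory, both passes should settle near the real disk rate.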

Answered 2008-12-04T21:37:27.110
1

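The Java garbage collector could be a bottleneck here.

I would make the buffer larger and private to the class, so that it is reused instead of being allocated on every call to doChunk().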

public class MultiFileReader {

   private byte buf[] = new byte[256*1024];

   ...

}
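
To make the suggestion concrete, here is a minimal sketch of a reader that keeps one 256 KB buffer as a field and reuses it for every range it reads. The class name ReusableBufferReader and the method readRange are made up for this example. Whether garbage collection is really the bottleneck is not established here; independently of that, a larger buffer reduces the number of read calls, since a 1000-byte buffer needs roughly four million calls to get through a 4GB file.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical class for this example: one buffer per reader, reused across calls.
class ReusableBufferReader {

    // Allocated once per instance instead of once per readRange() call.
    private final byte[] buf = new byte[256 * 1024];

    // Reads the bytes in [start, end) and returns how many were actually read.
    long readRange(File file, long start, long end) throws IOException {
        RandomAccessFile raf = new RandomAccessFile(file, "r");
        try {
            raf.seek(start);
            long remaining = end - start;
            long total = 0;
            while (remaining > 0) {
                // Never ask for more than the range still needs.
                int want = (int) Math.min(buf.length, remaining);
                int n = raf.read(buf, 0, want);
                if (n == -1) {
                    break; // end of file reached before 'end'
                }
                total += n;
                remaining -= n;
            }
            return total;
        } finally {
            raf.close();
        }
    }

    public static void main(String[] args) throws IOException {
        File file = new File(args[0]);
        long read = new ReusableBufferReader().readRange(file, 0, file.length());
        System.out.println("read " + read + " bytes");
    }
}
```

Because the buffer is instance state, a single instance should not be shared between threads; each thread would need its own reader.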
Answered 2008-12-05T15:23:10.047
1

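Depending on your specific hardware and whatever else is going on, you may have to work fairly hard to get much more than 20MB/sec.

I think perhaps you don't realize just how far off the scale 500MB/sec is...

What rate are you hoping for, and have you checked that your particular drive is even theoretically capable of it?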

Answered 2008-12-04T21:58:55.880
0

You could use JConsole to monitor your app, including memory usage. The 500 MB/sec sounds too good to be true.

Some more information about the implementation and VM arguments used would be helpful.
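
As a complement to watching JConsole, the same memory and garbage-collection counters can be read from inside the program through the java.lang.management API that ships with Java 5. This is only a sketch: the class name GcSnapshot and its methods are invented for this example, and the place where the file read would run is left as a comment.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.util.List;

// Hypothetical helper for this example: prints heap usage and accumulated GC time.
class GcSnapshot {

    // Sums the collection time (ms) reported by all collectors.
    // A collector may report -1 when the value is undefined, so those are skipped.
    static long totalGcMillis() {
        long total = 0;
        List<GarbageCollectorMXBean> beans = ManagementFactory.getGarbageCollectorMXBeans();
        for (GarbageCollectorMXBean bean : beans) {
            long t = bean.getCollectionTime();
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    // Returns the number of heap bytes currently in use.
    static long heapUsedBytes() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return heap.getUsed();
    }

    static void report(String label) {
        System.out.println(label + ": heap used = " + heapUsedBytes()
                + " bytes, total GC time = " + totalGcMillis() + " ms");
    }

    public static void main(String[] args) {
        report("before");
        // ... run the file read here ...
        report("after");
    }
}
```

If the total GC time barely moves while the read slows down, the garbage collector is probably not the cause of the slowdown.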

Answered 2008-12-04T21:43:06.600
0

Check this:

static void read3() throws IOException {

    // read from the file with buffering
    // and with direct access to the buffer
    // (MyTimer, TESTFILE and cnt3 are not defined in this excerpt)

    MyTimer mt = new MyTimer();
    FileInputStream fis = new FileInputStream(TESTFILE);
    cnt3 = 0;
    final int BUFSIZE = 1024;
    byte buf[] = new byte[BUFSIZE];
    int len;
    while ((len = fis.read(buf)) != -1) {
        for (int i = 0; i < len; i++) {
            if (buf[i] == 'A') {
                cnt3++;
            }
        }
    }
    fis.close();
    System.out.println("read3 time = " + mt.getElapsed());
}

From http://java.sun.com/developer/JDCTechTips/2002/tt0305.html

The best buffer size may depend on the operating system. Yours may be too small.
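
Since the best size is hard to predict, one option is simply to time the same read with several buffer sizes. The following is a rough sketch rather than a proper benchmark: the class name BufferSizeTrial and the list of sizes are arbitrary choices made for this example.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

// Hypothetical helper for this example: times a full sequential read per buffer size.
class BufferSizeTrial {

    // Reads the whole file with the given buffer size and returns the byte count.
    static long countBytes(File file, int bufSize) throws IOException {
        FileInputStream in = new FileInputStream(file);
        try {
            byte[] buf = new byte[bufSize];
            long total = 0;
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n;
            }
            return total;
        } finally {
            in.close();
        }
    }

    public static void main(String[] args) throws IOException {
        File file = new File(args[0]);
        // Candidate sizes picked for illustration only.
        int[] sizes = { 1024, 8 * 1024, 64 * 1024, 256 * 1024, 1024 * 1024 };
        for (int i = 0; i < sizes.length; i++) {
            long start = System.nanoTime();
            long bytes = countBytes(file, sizes[i]);
            long elapsedMs = (System.nanoTime() - start) / 1000000L;
            System.out.println(sizes[i] + "-byte buffer: " + bytes
                    + " bytes in " + elapsedMs + " ms");
        }
    }
}
```

Results need careful reading: after the first pass the operating system may serve later passes from its file cache, so the comparison is more meaningful with a file larger than RAM or with repeated runs in a different order.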

Answered 2008-12-05T14:55:39.477