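I assumed there was no difference between an input stream object read from a local file and one read from a network source (Amazon S3 in this case), so I'm hoping someone can enlighten me.
These programs were run on a VM running CentOS 6.3. The test file in both cases is 10MB.
Local file code: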
InputStream is = new FileInputStream("/home/anyuser/test.jpg");
int read = 0;
int buf_size = 1024 * 1024 * 2;
byte[] buf = new byte[buf_size];
ByteArrayOutputStream baos = new ByteArrayOutputStream(buf_size);
long t3 = System.currentTimeMillis();
int i = 0;
while ((read = is.read(buf)) != -1) {
    baos.write(buf, 0, read);
    System.out.println("reading for the " + i + "th time");
    i++;
}
long t4 = System.currentTimeMillis();
System.out.println("Time to read = " + (t4-t3) + "ms");
The output of this code is shown below. It reads 5 times, which makes sense because the read buffer is 2MB and the file is 10MB.
reading for the 0th time
reading for the 1th time
reading for the 2th time
reading for the 3th time
reading for the 4th time
Time to read = 103ms
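Now we run the same code with the same 10MB test file, except this time the source is Amazon S3. We don't start reading until we have finished getting the stream from S3. This time, however, the read loop runs well over a thousand times when it should only need 5 reads.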
InputStream is;
long t1 = System.currentTimeMillis();
is = getS3().getFileFromBucket(S3Path,input);
long t2 = System.currentTimeMillis();
System.out.print("Time to get file " + input + " from S3: ");
System.out.println((t2-t1) + "ms");
int read = 0;
int buf_size = 1024*1024*2;
byte[] buf = new byte[buf_size];
ByteArrayOutputStream baos = new ByteArrayOutputStream(buf_size);
long t3 = System.currentTimeMillis();
int i = 0;
while ((read = is.read(buf)) != -1) {
    baos.write(buf, 0, read);
    if ((i % 100) == 0)
        System.out.println("reading for the " + i + "th time");
    i++;
}
long t4 = System.currentTimeMillis();
System.out.println("Time to read = " + (t4-t3) + "ms");
The output is as follows:
Time to get file test.jpg from S3: 2456ms
reading for the 0th time
reading for the 100th time
reading for the 200th time
reading for the 300th time
reading for the 400th time
reading for the 500th time
reading for the 600th time
reading for the 700th time
reading for the 800th time
reading for the 900th time
reading for the 1000th time
reading for the 1100th time
reading for the 1200th time
reading for the 1300th time
reading for the 1400th time
Time to read = 14471ms
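The time it takes to read the stream varies from run to run. Sometimes it takes 60 seconds, sometimes 15 seconds. It never gets faster than 15 seconds. On every test run, the read loop still iterates 1400+ times, even though I think it should only be 5 times, as in the local file example.
Is this how an input stream works when the source is over the network, even though we have already finished getting the file from the network source? Thanks in advance for your help.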
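For reference, the difference in loop counts can be reproduced without S3. The documented contract of `InputStream.read(byte[])` is that it reads *up to* `buf.length` bytes and returns how many it actually stored, so a single call may return far less than the 2MB buffer. The sketch below is only an illustration under stated assumptions: `ChunkedStream` is a hypothetical stand-in for a network-backed stream, and the 7KB chunk size is an arbitrary value picked to land near the 1400+ iterations above, not a claim about how the S3 client behaves. `readFully` is a hypothetical helper that keeps calling `read` until the buffer is full or the stream ends.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class ShortReadDemo {

    /** Hypothetical stand-in for a network stream: returns at most maxChunk bytes per read. */
    static class ChunkedStream extends InputStream {
        private final InputStream in;
        private final int maxChunk;

        ChunkedStream(InputStream in, int maxChunk) {
            this.in = in;
            this.maxChunk = maxChunk;
        }

        @Override
        public int read() throws IOException {
            return in.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            // Deliberately hand out less than the caller asked for.
            return in.read(b, off, Math.min(len, maxChunk));
        }
    }

    /** Counts how many plain read(buf) calls return data before end of stream. */
    static int countReads(InputStream in, int bufSize) throws IOException {
        byte[] buf = new byte[bufSize];
        int calls = 0;
        while (in.read(buf) != -1) {
            calls++;
        }
        return calls;
    }

    /** Reads until buf is full or the stream ends; returns bytes stored, or -1 if none. */
    static int readFully(InputStream in, byte[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            int n = in.read(buf, total, buf.length - total);
            if (n == -1) {
                break;
            }
            total += n;
        }
        return total == 0 ? -1 : total;
    }

    /** Same loop as countReads, but each iteration fills the buffer via readFully. */
    static int countFullReads(InputStream in, int bufSize) throws IOException {
        byte[] buf = new byte[bufSize];
        int calls = 0;
        while (readFully(in, buf) != -1) {
            calls++;
        }
        return calls;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[10 * 1024 * 1024]; // 10MB of dummy data
        int bufSize = 2 * 1024 * 1024;            // 2MB buffer, as in the question
        int chunk = 7 * 1024;                     // assumed chunk size, illustration only

        System.out.println(countReads(new ByteArrayInputStream(data), bufSize));
        System.out.println(countReads(
                new ChunkedStream(new ByteArrayInputStream(data), chunk), bufSize));
        System.out.println(countFullReads(
                new ChunkedStream(new ByteArrayInputStream(data), chunk), bufSize));
    }
}
```

With these assumed sizes the three lines printed are 5, 1463 and 5: the in-memory stream fills the whole buffer on every call, the chunked stream needs one call per 7KB chunk, and wrapping the call in `readFully` brings the outer loop back to 5 iterations without changing how many bytes are read in total.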