public static void mergeAllFilesJavolution() throws FileNotFoundException, IOException {
    String fileDir = "C:\\TestData\\w12";
    File dirSrc = new File(fileDir);
    File[] list = dirSrc.listFiles();
    long start = System.currentTimeMillis();
    for (int j = 0; j < list.length; j++) {
        int chr;
        String srcFile = list[j].getPath();
        String outFile = fileDir + "\\..\\merged.txt";
        UTF8StreamReader inFile = new UTF8StreamReader().setInput(new FileInputStream(srcFile));
        UTF8StreamWriter outPut = new UTF8StreamWriter().setOutput(new FileOutputStream(outFile, true));
        while ((chr = inFile.read()) != -1) {
The UTF-8 file used as test data is 200 MB, but the size could very well reach 800 MB.
This is the source code of UTF8StreamReader.read().
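For comparison, here is a minimal sketch of the same merge done with the standard library only, not with Javolution. It assumes the inputs are already valid UTF-8 without byte order marks, so that concatenating their bytes gives a valid UTF-8 result. The class name MergeSketch and the method mergeAll are hypothetical names made up for this illustration, and the merged file is assumed to live outside the source directory, as in the code above.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Javolution.
final class MergeSketch {

    // Appends the bytes of every regular file in srcDir to merged, in path order.
    // Assumption: merged is NOT located inside srcDir.
    static void mergeAll(Path srcDir, Path merged) throws IOException {
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> dir = Files.newDirectoryStream(srcDir)) {
            for (Path p : dir) {
                if (Files.isRegularFile(p)) {
                    files.add(p);
                }
            }
        }
        Collections.sort(files); // directory iteration order is unspecified

        // Open the output once, in append mode, and close it when done.
        try (OutputStream out = Files.newOutputStream(merged,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            for (Path p : files) {
                Files.copy(p, out); // streams the file without loading it whole
            }
        }
    }
}
```

Because the copy is done in chunks, memory use does not grow with the file size, whether the input is 200 MB or 800 MB.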
/**
 * Holds the bytes buffer.
 */
private final byte[] _bytes;

/**
 * Creates a UTF-8 reader having a byte buffer of moderate capacity (2048).
 */
public UTF8StreamReader() {
    _bytes = new byte[2048];
}

/**
 * Reads a single character. This method will block until a character is
 * available, an I/O error occurs or the end of the stream is reached.
 *
 * @return the 31-bits Unicode of the character read, or -1 if the end of
 *         the stream has been reached.
 * @throws IOException if an I/O error occurs.
 */
public int read() throws IOException {
    byte b = _bytes[_start];
    return ((b >= 0) && (_start++ < _end)) ? b : read2();
}
The error occurs at _bytes[_start], because _bytes = new byte[2048].
Here is the other UTF8StreamReader constructor:
/**
 * Creates a UTF-8 reader having a byte buffer of specified capacity.
 *
 * @param capacity the capacity of the byte buffer.
 */
public UTF8StreamReader(int capacity) {
    _bytes = new byte[capacity];
}
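Question: how do I specify the correct capacity for _bytes when creating the UTF8StreamReader?
I tried File.length(), but it returns a long (which I think is right, since I expect huge file sizes), whereas the constructor only accepts an int.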
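To make the long versus int mismatch concrete, here is a small sketch of turning a file length into an int capacity by clamping it to an upper bound. The class BufferCapacity, the method capacityFor, and the 64 KiB limit are all hypothetical and chosen only for illustration. Whether a larger buffer actually avoids the error is not something this sketch shows; it only demonstrates a safe conversion.

```java
import java.io.File;

// Hypothetical helper, not part of Javolution.
final class BufferCapacity {

    // Assumed upper bound for the buffer (64 KiB), picked for illustration.
    static final int MAX_CAPACITY = 64 * 1024;

    // Converts a long file length into an int capacity in [1, MAX_CAPACITY].
    static int capacityFor(long fileLength) {
        long bounded = Math.max(1L, Math.min(fileLength, (long) MAX_CAPACITY));
        return (int) bounded; // safe: bounded never exceeds MAX_CAPACITY
    }

    public static void main(String[] args) {
        File f = new File(args.length > 0 ? args[0] : ".");
        int capacity = capacityFor(f.length());
        System.out.println("capacity = " + capacity);
        // Assumed usage with the constructor quoted above:
        // UTF8StreamReader reader = new UTF8StreamReader(capacity);
    }
}
```

Clamping avoids the silent overflow that a plain (int) cast of a very large long would produce, and it keeps the buffer small even for an 800 MB file.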