
Take the following static method:

public static String fileToString(String filename) throws Exception {
    FileInputStream fis = new FileInputStream(filename);
    byte[] buffer = new byte[8192];
    StringBuffer sb = new StringBuffer();
    int bytesRead; // unused? weird compiler messages...
    while((bytesRead = fis.read(buffer)) != -1) { // InputStream.read() returns -1 at EOF
        sb.append(new String(buffer));
    }
    return new String(sb);
}

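As you can see, everything looks fine, and it works well for small text files. However, once you process a large file with thousands of lines, you run into repeated text. My gut feeling was that byte[] buffer was "unclean", so to speak. So I added the following line to the method: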

buffer = new byte[8192];

So now it is:

public static String fileToString(String filename) throws Exception {
    FileInputStream fis = new FileInputStream(filename);
    byte[] buffer = new byte[8192];
    StringBuffer sb = new StringBuffer();   
    int bytesRead; // unused? weird compiler messages...
    while((bytesRead = fis.read(buffer)) != -1) { // InputStream.read() returns -1 at EOF
        sb.append(new String(buffer));
        buffer = new byte[8192]; // added new line here
    }
    return new String(sb);
} 

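It works perfectly, except that at the end of the string returned by the static method I get a lot of null characters (how many depends on the buffer size). What is going on here?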


3 Answers


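Don't reinvent the wheel. Unless this is a school assignment, use an existing library such as Apache Commons IO: http://commons.apache.org/io/apidocs/org/apache/commons/io/IOUtils.html#toString%28java.io.InputStream,%20java.nio.charset.Charset%29

For example, you can read a file into a string in just a few lines: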

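public static String fileToString(String filepath) throws Exception {
    // try-with-resources (Java 7+) closes the stream even if reading fails
    try (FileInputStream in = new FileInputStream(filepath)) {
        return IOUtils.toString(in, "UTF-8");
    }
}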

This saves you from writing a lot of custom code by hand, and it is likely to have fewer bugs.

Answered 2013-02-18T02:24:46.517

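Actually, the message behind this comment:

// unused? weird compiler messages...

is not weird. You assign bytesRead but never read it.

How could sb.append(new String(buffer)); know how many bytes were written to the buffer? It cannot, so it decodes all 8192 bytes on every pass, including stale bytes left over from an earlier read.

This is exactly where bytesRead comes into play.

So you need new String(bytes, offset, length):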

public static String fileToString(String filename) throws Exception {
    FileInputStream fis = new FileInputStream(filename);
    byte[] buffer = new byte[8192];
    StringBuffer sb = new StringBuffer();
    int bytesRead; // number of bytes the last read() placed in buffer
    while((bytesRead = fis.read(buffer)) != -1) { // InputStream.read() returns -1 at EOF
        // Decode only the bytes just read, so the buffer can be reused as-is
        sb.append(new String(buffer, 0, bytesRead));
    }
    fis.close();
    return sb.toString();
}

Something like this might work.
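
To make the effect visible without a large file, here is a small sketch that reads ten ASCII bytes through a 4-byte buffer from an in-memory stream. The class name PartialReadDemo and both helper methods are made up for this illustration; they are not part of the question's code. The first helper decodes the whole buffer on every pass (the original bug), the second decodes only the bytes reported by read().

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class PartialReadDemo {
    // Hypothetical helper reproducing the bug: ignores how many bytes read() delivered.
    static String readWholeBuffer(InputStream in, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        StringBuilder sb = new StringBuilder();
        while (in.read(buffer) != -1) {
            // Decodes all bufferSize bytes, including leftovers from the previous pass
            sb.append(new String(buffer, StandardCharsets.US_ASCII));
        }
        return sb.toString();
    }

    // Hypothetical helper with the fix: decodes only the first bytesRead bytes.
    static String readCountedBytes(InputStream in, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        StringBuilder sb = new StringBuilder();
        int bytesRead;
        while ((bytesRead = in.read(buffer)) != -1) {
            sb.append(new String(buffer, 0, bytesRead, StandardCharsets.US_ASCII));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "abcdefghij".getBytes(StandardCharsets.US_ASCII);
        // The last read() delivers only "ij"; "gh" is still in the buffer from before
        System.out.println(readWholeBuffer(new ByteArrayInputStream(data), 4));  // abcdefghijgh
        System.out.println(readCountedBytes(new ByteArrayInputStream(data), 4)); // abcdefghij
    }
}
```

Allocating a fresh buffer on every pass, as in the question's second attempt, only replaces the stale "gh" with zero bytes, which is where the trailing null characters come from.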

Answered 2013-02-18T02:00:29.977

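You really should not read bytes and build a String from the raw bytes. It is wrong because it completely ignores the encoding of the text. You may be lucky and be reading ASCII, in which case everything will go smoothly. In every other case, it is asking for trouble.

You really should use a BufferedReader wrapping an InputStreamReader, which in turn wraps your source InputStream.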
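
A minimal sketch of that reader chain might look like the following. The class name ReaderExample, the method names, and the explicit Charset parameter are assumptions added for illustration. Because the reader does the decoding, a multi-byte character that straddles two underlying reads is still decoded correctly.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

class ReaderExample {
    // Hypothetical helper: reads every character from the stream using the given charset.
    static String streamToString(InputStream in, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buffer = new char[8192];
        // Closing the BufferedReader also closes the wrapped InputStream
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            int charsRead;
            while ((charsRead = reader.read(buffer)) != -1) { // -1 signals end of stream
                sb.append(buffer, 0, charsRead); // append only the chars actually read
            }
        }
        return sb.toString();
    }

    // Hypothetical file-based wrapper matching the question's method name.
    static String fileToString(String filename, Charset charset) throws IOException {
        return streamToString(new FileInputStream(filename), charset);
    }
}
```

A call would then look like ReaderExample.fileToString("notes.txt", StandardCharsets.UTF_8), where the file name is a placeholder and the charset must match how the file was actually written.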

Answered 2013-02-18T02:04:38.777