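I've noticed that when using ReadLine() on a StreamReader or StringReader, if the file or string ends with a newline, that character sequence is lost entirely. Consider the following example: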

static void Main(string[] args)
{
    string data = "First Line\r\nSecond Line\r\n\r\n\r\n";
    List<string> lineData = new List<string>();
    string[] splitData = data.Split(
        new string[] { "\r\n" }, 
        StringSplitOptions.None);

    using (StringReader sr = new StringReader(data))
    {
        string line;
        while ((line = sr.ReadLine()) != null)
            lineData.Add(line);
    }

    Console.WriteLine("Raw Line Count: " + splitData.Length);
    Console.WriteLine("StringReader Line Count: " + lineData.Count);
    Console.WriteLine("Split Data: ");
    foreach (string s in splitData)
        Console.WriteLine(string.IsNullOrEmpty(s) ? "[blank line]" : s);
    Console.WriteLine("StringReader Data: ");
    foreach (string s in lineData)
        Console.WriteLine(string.IsNullOrEmpty(s) ? "[blank line]" : s);
    Console.ReadKey();
}

The output is as follows:

Raw Line Count: 5
StringReader Line Count: 4
Split Data:
First Line
Second Line
[blank line]
[blank line]
[blank line]
StringReader Data:
First Line
Second Line
[blank line]
[blank line]

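Why do StringReader and StreamReader behave this way? I can think of a few workarounds, but it seems silly to have to rework my code because the readers behave in an unexpected way. Is there some setting in the .NET libraries that affects how streams handle a final newline?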

Edit

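Here is another example: compare the results of running the example first with "First Line\r\nSecond Line" and then with "First Line\r\nSecond Line\r\n". The results are identical (as far as the StringReader part of the example is concerned). Why does StringReader return null in the second case rather than an empty string? I understand that the string returned from ReadLine() does not include the newline, but why is the last line interpreted as null rather than ""?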

3 Answers

The difference in your output is not due to any strange behaviour of StringReader. Your input contains only four lines, and exactly four lines are read (without their terminating newline sequences, as the documentation specifies). It is the Split method that introduces an extra entry: with StringSplitOptions.None, empty entries are kept, so an empty entry is produced after the final separator.

Output of StringReader:

"First Line\r\nSecond Line\r\n\r\n\r\n";
 ^1st          ^2nd           ^3rd^4th   (line)

Output of Split:

"First Line\r\nSecond Line\r\n\r\n\r\n";
 ^1st          ^2nd           ^3rd^4th^5th (token)

Consider this input:

"First Line\r\n"

How many lines is that? One, and that is what StringReader outputs:

Split Data:
First Line
[blank line]
StringReader Data:
First Line

So it seems that it's the Split that is the "problem" (if there is any) here.
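
If you want the Split result to match what ReadLine reports, one option is to drop the empty token that Split emits after a trailing separator. The following is a minimal sketch, assuming the input uses only "\r\n" as its line terminator (as in the question); the class and method names are hypothetical and not part of .NET.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper class for illustration; not part of the question or of .NET.
public static class LineCounting
{
    // Splits on "\r\n" exactly as the question does, then removes the single
    // empty token that Split produces after a trailing separator.
    public static List<string> SplitLikeReadLine(string data)
    {
        var tokens = new List<string>(
            data.Split(new[] { "\r\n" }, StringSplitOptions.None));

        // After a trailing "\r\n" (or for empty input) the last token is "".
        if (tokens.Count > 0 && tokens[tokens.Count - 1].Length == 0)
            tokens.RemoveAt(tokens.Count - 1);

        return tokens;
    }

    // Reads the same data with StringReader.ReadLine, for comparison.
    public static List<string> ReadLines(string data)
    {
        var lines = new List<string>();
        using (var reader = new StringReader(data))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
                lines.Add(line);
        }
        return lines;
    }
}
```

With the question's input, both methods should return the same four entries. Input that mixes "\n" or lone "\r" terminators would still differ, because Split here only recognises "\r\n".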

The real problem, as Douglas described in the comments below, is that inputs like "ABC\r\nXYZ" and "ABC\r\nXYZ\r\n" are indistinguishable to ReadLine. In typical use cases for the ReadLine interface, however, that doesn't matter. If it matters to you, you need to use a slightly lower-level interface (e.g. Read).
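
As a rough illustration of the lower-level approach, the sketch below reads one character at a time with TextReader.Read() and reports whether the input ended with a line terminator. It is only one possible way to do this; the class name, method name, and out parameter are assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper for illustration; not part of .NET.
public static class TerminatorAwareReader
{
    // Returns the same lines ReadLine would, and additionally reports whether
    // the last character sequence of the input was a line terminator.
    public static List<string> ReadLines(TextReader reader, out bool endsWithTerminator)
    {
        var lines = new List<string>();
        var current = new StringBuilder();
        bool previousWasCarriageReturn = false;
        endsWithTerminator = false;

        int c;
        while ((c = reader.Read()) != -1)
        {
            char ch = (char)c;

            if (ch == '\n' && previousWasCarriageReturn)
            {
                // Second half of "\r\n"; the line was already emitted at '\r'.
                previousWasCarriageReturn = false;
                continue;
            }

            previousWasCarriageReturn = ch == '\r';

            if (ch == '\r' || ch == '\n')
            {
                lines.Add(current.ToString());
                current.Clear();
                endsWithTerminator = true;
            }
            else
            {
                current.Append(ch);
                endsWithTerminator = false;
            }
        }

        // Text after the last terminator still counts as a line, as with ReadLine.
        if (current.Length > 0)
            lines.Add(current.ToString());

        return lines;
    }
}
```

For "ABC\r\nXYZ" and "ABC\r\nXYZ\r\n" this should return the same two lines, with endsWithTerminator set to false for the first input and true for the second.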

answered 2013-10-14T17:15:01.237

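This is expected behavior and is documented. From http://msdn.microsoft.com/en-us/library/system.io.stringreader.readline.aspx:

A line is defined as a sequence of characters followed by a line feed ("\n"), a carriage return ("\r"), or a carriage return immediately followed by a line feed ("\r\n"). The string that is returned does not contain the terminating carriage return or line feed. The returned value is null if the end of the string has been reached.

This means the last value returned is null, and the final newline is omitted. If you need it to appear in the data you read, you can re-append it using Environment.NewLine.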
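
A minimal sketch of that idea, assuming it is acceptable for every line (including the last) to end with Environment.NewLine; the class and method names are hypothetical:

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper for illustration; not part of .NET.
public static class NewLineRestorer
{
    // Reads every line with ReadLine and appends Environment.NewLine after each,
    // since ReadLine strips the original terminator.
    public static string ReadAllWithNewLines(TextReader reader)
    {
        var result = new StringBuilder();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            result.Append(line).Append(Environment.NewLine);
        }
        return result.ToString();
    }
}
```

Note that this normalises all terminators to Environment.NewLine and adds one after the final line even if the original input did not end with a newline, so it does not reproduce the input exactly.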

answered 2013-10-14T17:12:01.537

Per docs on ReadLine:

A line is defined as a sequence of characters followed by a line feed ("\n"), a carriage return ("\r"), or a carriage return immediately followed by a line feed ("\r\n"). The string that is returned does not contain the terminating carriage return or line feed. The returned value is null if the end of the input stream is reached.

You're using a method that relies on line terminators ("\n", "\r", or "\r\n", per the definition above) to tokenize the input stream and return the result. Since those tokens are excluded from the result, it stands to reason that the behavior you're seeing is the expected one.

If you need those characters, you're better off reading the file in chunks (using a standard Read with a buffer) and breaking out the content yourself. Alternatively, you could create your own implementation of a Stream that performs the task as you wish.
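
The chunked approach might look something like the sketch below, which uses TextReader.Read(char[], int, int) and keeps each line's terminator attached to it. The class name, method name, and default buffer size are assumptions chosen for the example, not anything prescribed by .NET.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper for illustration; not part of .NET.
public static class ChunkedLineSplitter
{
    // Reads the input in chunks and returns each line together with its
    // terminator ("\n", "\r" or "\r\n"), so no characters are lost.
    public static List<string> ReadLinesKeepingTerminators(TextReader reader, int bufferSize = 4096)
    {
        var lines = new List<string>();
        var current = new StringBuilder();
        var buffer = new char[bufferSize];
        bool pendingCarriageReturn = false; // last char seen was '\r'

        int read;
        while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
        {
            for (int i = 0; i < read; i++)
            {
                char ch = buffer[i];

                if (pendingCarriageReturn)
                {
                    pendingCarriageReturn = false;
                    if (ch == '\n')
                    {
                        // Completes a "\r\n" pair, possibly across two chunks.
                        current.Append(ch);
                        lines.Add(current.ToString());
                        current.Clear();
                        continue;
                    }
                    // A lone '\r' ended the previous line.
                    lines.Add(current.ToString());
                    current.Clear();
                }

                current.Append(ch);
                if (ch == '\n')
                {
                    lines.Add(current.ToString());
                    current.Clear();
                }
                else if (ch == '\r')
                {
                    pendingCarriageReturn = true;
                }
            }
        }

        // Whatever remains is a final line (with or without a trailing '\r').
        if (current.Length > 0)
            lines.Add(current.ToString());

        return lines;
    }
}
```

Because the terminators are preserved, concatenating the returned entries should reproduce the original input, and the last entry shows directly whether the input ended with a newline.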

answered 2013-10-14T17:14:12.877