
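I have to read a log file line by line. It is about 6 MB and 40,000 lines long. After testing my program, I found that the lines in this log file are separated only by LF characters, so I can't use the ReadLine method of the StreamReader class.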

How can I solve this problem?

Edit: I tried using a TextReader, but my program still doesn't work correctly:

using (TextReader sr = new StreamReader(strPath, Encoding.Unicode))
{
    // skip the first three lines of the log file
    sr.ReadLine();
    sr.ReadLine();
    sr.ReadLine();

    int count = 0; // number of lines read
    string strLine;
    while (sr.Peek()!=0)
    {
        strLine = sr.ReadLine();
        if (strLine.Trim() != "")
        {
            InsertData(strLine);
            count++;
        }
    }

    return count;
}
4 Answers

TextReader.ReadLine already handles lines terminated just by \n.

From the docs:

A line is defined as a sequence of characters followed by a carriage return (0x000d), a line feed (0x000a), a carriage return followed by a line feed, Environment.NewLine, or the end of stream marker. The string that is returned does not contain the terminating carriage return and/or line feed. The returned value is a null reference (Nothing in Visual Basic) if the end of the input stream has been reached.

So basically, you should be fine. (I've talked about TextReader rather than StreamReader because that's where the method is declared - obviously it will still work with a StreamReader.)
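As a minimal sketch of the point above, the following reads LF-only input with ReadLine and stops when ReadLine returns null. It also suggests a likely cause of the failure in the question's loop: Peek() returns -1 at the end of the input, not 0, so the test `sr.Peek()!=0` probably never ends the loop, and the next ReadLine() returns null, on which Trim() throws. The class and method names here (LfReadLineDemo, CountDataLines) are made up for illustration, and a StringReader stands in for the real log file.

```csharp
using System;
using System.IO;

public static class LfReadLineDemo
{
    // Hypothetical helper: skips `skip` header lines, then counts non-blank lines.
    public static int CountDataLines(TextReader reader, int skip)
    {
        for (int i = 0; i < skip; i++)
        {
            reader.ReadLine(); // simply returns null if the input is shorter
        }

        int count = 0;
        string line;
        // ReadLine returns null at the end of the input, so Peek() is not needed.
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Trim().Length != 0)
            {
                count++;
            }
        }
        return count;
    }

    public static void Main()
    {
        // LF-only input: three header lines, then two data lines and one blank line.
        string log = "h1\nh2\nh3\nfirst\n\nsecond\n";
        using (TextReader reader = new StringReader(log))
        {
            Console.WriteLine(CountDataLines(reader, 3));
        }
    }
}
```

With a real file, the StringReader would be replaced by a StreamReader opened with the file's actual encoding.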

If you want to iterate through lines easily (and potentially use LINQ against the log file) you may find my LineReader class in MiscUtil useful. It basically wraps calls to ReadLine() in an iterator. So for instance, you can do:

var query = from file in Directory.GetFiles("logs")
            from line in new LineReader(file)
            where !line.StartsWith("DEBUG")
            select line;

foreach (string line in query)
{
    // ...
}

All streaming :)
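For readers who don't want to pull in MiscUtil, here is a rough sketch of the idea of wrapping ReadLine() in an iterator. This is not the actual LineReader implementation; the names LineIteratorSketch and ReadLines are hypothetical, and the input is an in-memory string rather than a directory of log files.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class LineIteratorSketch
{
    // Hypothetical helper (not MiscUtil's LineReader): yields lines lazily,
    // one ReadLine() call per item, so the whole input is never held in memory.
    public static IEnumerable<string> ReadLines(TextReader reader)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            yield return line;
        }
    }

    public static void Main()
    {
        using (TextReader reader = new StringReader("DEBUG a\nINFO b\nDEBUG c\nWARN d\n"))
        {
            // Same filter as the query above, applied to a single reader.
            IEnumerable<string> query = ReadLines(reader).Where(l => !l.StartsWith("DEBUG"));
            foreach (string line in query)
            {
                Console.WriteLine(line);
            }
        }
    }
}
```

The caller owns the reader in this sketch, so the sequence has to be consumed before the reader is disposed.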

Answered 2009-07-17T08:44:30.857
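Does File.ReadAllLines(fileName) fail to load files with LF line endings correctly? If you need the whole file, use it. I saw one site suggesting that it is slower than another approach, but it isn't if you pass it the correct encoding (the default is UTF-8), and it's about as clean as it gets.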

Edit: It does load them correctly. If you need streaming, TextReader.ReadLine() also handles Unix line endings correctly.

Edit again: So does StreamReader. Did you just check the documentation and assume it won't handle LF line ends? I'm looking in Reflector and it sure seems like a proper handling routine.
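A small sketch of the File.ReadAllLines approach, assuming the log really is UTF-16 (Encoding.Unicode, as in the question's code). The class and method names are invented for the example, and a temporary file with LF-only content stands in for the real log.

```csharp
using System;
using System.IO;
using System.Text;

public static class ReadAllLinesDemo
{
    // Hypothetical helper: writes LF-only text to `path`, then loads it back in one call.
    public static string[] WriteAndReadBack(string path)
    {
        File.WriteAllText(path, "one\ntwo\nthree\n", Encoding.Unicode);
        // Pass the encoding explicitly; without it ReadAllLines assumes UTF-8.
        return File.ReadAllLines(path, Encoding.Unicode);
    }

    public static void Main()
    {
        string path = Path.GetTempFileName();
        try
        {
            foreach (string line in WriteAndReadBack(path))
            {
                Console.WriteLine(line);
            }
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

This loads every line into memory at once, which should be acceptable for a file of roughly 6 MB.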

Answered 2009-07-17T08:36:51.010
I'd have guessed LF (\n) would be fine, whereas CR-only (\r) might cause problems.

You could read each line a character at a time and process it when you read the terminator.

If profiling shows that this is too slow, you could switch to application-side buffering with a block read such as Read(char[], int, int). But try simple character-at-a-time first!
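The character-at-a-time idea could look roughly like this. It is only a sketch: the names CharAtATimeSketch and SplitOnLf are hypothetical, only LF is treated as a terminator, and the lines are collected into a list instead of being processed as they arrive.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class CharAtATimeSketch
{
    // Hypothetical helper: reads one character at a time and emits a line
    // whenever the LF terminator is seen.
    public static List<string> SplitOnLf(TextReader reader)
    {
        List<string> lines = new List<string>();
        StringBuilder current = new StringBuilder();
        int c;
        while ((c = reader.Read()) != -1) // Read() returns -1 at the end of the input
        {
            if (c == '\n')
            {
                lines.Add(current.ToString());
                current.Length = 0; // reset the builder for the next line
            }
            else
            {
                current.Append((char)c);
            }
        }
        if (current.Length > 0)
        {
            lines.Add(current.ToString()); // last line without a trailing LF
        }
        return lines;
    }

    public static void Main()
    {
        foreach (string line in SplitOnLf(new StringReader("alpha\nbeta\n\ngamma")))
        {
            Console.WriteLine("[" + line + "]");
        }
    }
}
```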

Answered 2009-07-17T08:49:09.457
Or you can use the ReadBlock method and parse the lines yourself.
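One way this might look, as a sketch under assumptions: the names ReadBlockSketch and SplitOnLf and the buffer size are made up, and only LF is treated as a line terminator. The part that needs care is a line that straddles two buffer fills, which is why the leftover characters are carried over in a StringBuilder.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class ReadBlockSketch
{
    // Hypothetical helper: fills a buffer with ReadBlock and splits it on LF,
    // carrying any partial line over to the next buffer fill.
    public static List<string> SplitOnLf(TextReader reader, int bufferSize)
    {
        List<string> lines = new List<string>();
        StringBuilder pending = new StringBuilder();
        char[] buffer = new char[bufferSize];
        int read;
        // ReadBlock returns the number of characters read, 0 at the end of the input.
        while ((read = reader.ReadBlock(buffer, 0, buffer.Length)) > 0)
        {
            int start = 0;
            for (int i = 0; i < read; i++)
            {
                if (buffer[i] == '\n')
                {
                    pending.Append(buffer, start, i - start);
                    lines.Add(pending.ToString());
                    pending.Length = 0;
                    start = i + 1;
                }
            }
            // Keep the unterminated tail for the next iteration.
            pending.Append(buffer, start, read - start);
        }
        if (pending.Length > 0)
        {
            lines.Add(pending.ToString()); // last line without a trailing LF
        }
        return lines;
    }

    public static void Main()
    {
        // A deliberately tiny buffer so that lines span several fills.
        foreach (string line in SplitOnLf(new StringReader("alpha\nbeta\n\ngamma"), 4))
        {
            Console.WriteLine("[" + line + "]");
        }
    }
}
```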

Answered 2009-07-17T10:57:29.887