I have several log files that I need to parse and combine based on a timestamp. They're of the format:

GaRbAgE fIrSt LiNe
[1124 0905 134242422       ] Logs initialized
[1124 0905 134242568 SYSTEM] Good log entry:
{ Collection:
  ["Attribute"|String]
...
[1124 0905 135212932 SYSTEM] Good log entry:

As you can see, I don't need the first line.
I'm currently using regular expressions to parse each file. One expression detects a "Logs initialized" line, which I don't care about and discard. Another detects a "Good log entry", which I keep and parse. Some good log entries span multiple lines, and I simply accept those continuation lines as part of the entry. However, the code currently also captures the garbage first line, because from a regex viewpoint it is indistinguishable from a continuation line of a multi-line entry. Furthermore, from what I have read, regex may not be the right tool here (see "Parsing a log file with regular expressions").

There are many log files and they can grow rather large. For this reason, I read only 50 lines at a time from each log, buffer them, and then combine them into a separate file. I loop over every file as long as there are non-null files left to read. Below is a code example in which I have replaced some conditions and variables with explanations.

while (there are non-null files left to read)
     {
        foreach (object logFile in logFiles) //logFiles is an array that stores the log names
        {
           int numLinesRead = 0;
           using (StreamReader fileReader = File.OpenText(logFile.ToString()))
           {
              string fileLine;
              // read in a line from the file
              while ((fileLine = fileReader.ReadLine()) != null && numLinesRead < 50)
              {
                 // compare line to regex expressions
                 Match rMatch = rExp.Match(fileLine);
                 if (rMatch.Success)  // found good log entry
                 {
                 ...

How would you skip that garbage first line? Unfortunately, it is not as easy as consuming one line with ReadLine(): the StreamReader is created inside a loop, so I would end up discarding one line at the start of every batch of 50.
I thought of keeping a list or array of the files whose first line I have already skipped (so that I don't skip it more than once), but that is rather ugly. I also thought of getting rid of the using statement and opening the StreamReader before the loop, but I'd prefer not to do that.

EDIT: After posting, I realized that my implementation might not be correct at all. When the StreamReader is closed and disposed, I believe my previous position in the file is lost. In that case, should I still use a StreamReader without the using construct, or is there a different type of file reader I should consider?

2 Answers

You can use something like the following.

Instead of this:

using (StreamReader fileReader = File.OpenText(logFile.ToString()))
{
    string fileLine;
    // read in a line from the file
    while ((fileLine = fileReader.ReadLine()) != null && numLinesRead < 50)
    {

Do this:

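int numLinesRead = 0;

// Skip(1) drops the garbage first line; it needs "using System.Linq;"
foreach (var fileLine in File.ReadLines(logFile.ToString()).Skip(1))
{
    if (++numLinesRead > 50) // stop once 50 lines have been processed
        break;

    // compare fileLine to the regular expressions here
}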
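One caveat with the loop above: every call to File.ReadLines starts again at the top of the file, so on its own it only ever yields the first 50 lines after the header. Below is a minimal sketch of one possible way around that, assuming it is acceptable to keep the files open between rounds: hold one lazy enumerator per file, so the first line is skipped exactly once and each round continues where the previous one stopped. The names BatchReader and ReadBatch and the demo file contents are made up for illustration and are not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper, not part of the original code.
static class BatchReader
{
    // Pulls up to batchSize lines from an enumerator that keeps its place in the file.
    public static List<string> ReadBatch(IEnumerator<string> lines, int batchSize)
    {
        var batch = new List<string>();
        while (batch.Count < batchSize && lines.MoveNext())
        {
            batch.Add(lines.Current);
        }
        return batch;
    }
}

static class Program
{
    static void Main()
    {
        // Demo input (made up): one garbage header followed by 120 fake log lines.
        string path = Path.GetTempFileName();
        File.WriteAllLines(path,
            new[] { "GaRbAgE fIrSt LiNe" }
                .Concat(Enumerable.Range(1, 120).Select(i => "entry " + i)));

        // Skip(1) drops the header once; the enumerator remembers its position afterwards.
        using (IEnumerator<string> lines = File.ReadLines(path).Skip(1).GetEnumerator())
        {
            List<string> batch;
            while ((batch = BatchReader.ReadBatch(lines, 50)).Count > 0)
            {
                Console.WriteLine(batch.Count + " lines, first: " + batch[0]);
            }
        }
        File.Delete(path);
    }
}
```

With the demo file, this prints batches of 50, 50 and 20 lines, starting at "entry 1", "entry 51" and "entry 101". For several log files, the same idea would presumably mean keeping one such enumerator per file in a list and disposing each one when it runs dry.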
answered 2013-09-10T20:56:59.170

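Add another parameter to the method for the position in the file. The first time it is zero, and you can consume the first line before entering the loop. After that, you can use it to position the stream where the previous call stopped.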

For example:

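// Requires: using System.Collections.Generic; using System.IO; using System.Text;
long position = 0;
while (position >= 0)
{
   position = ReadFiftyLines(argLogFile, position); // pass the saved position back in
}

// Reads up to 50 lines starting at argPosition.
// Returns the position to resume from, or -1 once the end of the file is reached.
public long ReadFiftyLines(string argLogFile, long argPosition)
{
   using (FileStream fs = new FileStream(argLogFile, FileMode.Open, FileAccess.Read))
   {
      fs.Seek(argPosition, SeekOrigin.Begin);
      if (argPosition == 0 && ReadLineAt(fs) == null) // first call: consume the garbage line
      {
         return -1; // empty file
      }
      string line = null;
      int count = 0;
      // test the count first so that a 51st line is never read and thrown away
      while (count < 50 && (line = ReadLineAt(fs)) != null)
      {
         count++;
         // do stuff with line
      }
      return line == null ? -1 : fs.Position; // null means end of file
   }
}

// Reads one line straight from the stream so that fs.Position stays exact.
// A StreamReader reads ahead into its own buffer, so fs.Position would overshoot.
private static string ReadLineAt(FileStream fs)
{
   var bytes = new List<byte>();
   int b;
   while ((b = fs.ReadByte()) != -1 && b != '\n')
   {
      bytes.Add((byte)b);
   }
   if (b == -1 && bytes.Count == 0)
   {
      return null; // end of file
   }
   return Encoding.UTF8.GetString(bytes.ToArray()).TrimEnd('\r'); // assumes UTF-8 logs
}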

Or something similar.
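One thing to watch for when saving a stream position while reading through a StreamReader: the reader fills an internal buffer, so FileStream.Position usually sits ahead of the last line that ReadLine() returned. The sketch below is only meant to make that visible; the PositionDemo name and the file contents are made up for illustration.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical demo class, not part of the original code.
static class PositionDemo
{
    // Returns fs.Position right after a StreamReader has returned the first line.
    public static long PositionAfterFirstLine(string path)
    {
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        using (var reader = new StreamReader(fs))
        {
            reader.ReadLine();
            return fs.Position;
        }
    }

    static void Main()
    {
        // Demo input (made up): three short lines, UTF-8 without a byte order mark.
        string path = Path.GetTempFileName();
        File.WriteAllText(path, "first line\nsecond line\nthird line\n", new UTF8Encoding(false));

        long firstLineBytes = Encoding.UTF8.GetByteCount("first line\n");
        long reported = PositionAfterFirstLine(path);

        Console.WriteLine("bytes in first line: " + firstLineBytes);
        Console.WriteLine("fs.Position after one ReadLine(): " + reported);
        File.Delete(path);
    }
}
```

If the second number is larger than the first, seeking back to a saved fs.Position would skip lines. Counting the bytes consumed yourself, or keeping the reader open between batches, are two ways to avoid that.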

answered 2013-09-10T20:57:08.333