I have several log files that I need to parse and combine based on a timestamp. They're of the format:
    GaRbAgE fIrSt LiNe
    [1124 0905 134242422 ] Logs initialized
    [1124 0905 134242568 SYSTEM] Good log entry:
    { Collection:
    ["Attribute"|String]
    ...
    [1124 0905 135212932 SYSTEM] Good log entry:
As you can see, I don't need the first line.
I'm currently using regular expressions to parse each file. One expression detects a "Logs initialized" line, which I don't care about and discard; another detects a "Good log entry" line, which I keep and parse. Some good log entries span multiple lines, so I simply accept any line that matches neither expression as a continuation of a multi-line entry. However, this means the code also captures the garbage first line, because from a regex point of view it is indistinguishable from a line of a multi-line log entry. Furthermore, from what I've read, regex is not the right tool here (Parsing a log file with regular expressions).
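To make the three-way classification described above concrete, here is a minimal sketch. The `LineKind` enum, the `LogLineClassifier` class, and the header pattern are all hypothetical; the pattern is only an assumption inferred from the sample lines (three digit groups, an optional tag such as `SYSTEM`, then `]` and the message). It uses a single header regex and then inspects the message text, rather than one regex per message type.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical categories for a single physical line of a log file.
public enum LineKind { Initialized, GoodEntry, OtherHeader, Continuation }

public static class LogLineClassifier
{
    // ASSUMPTION: header lines look like "[1124 0905 134242568 SYSTEM] message",
    // i.e. three digit groups, an optional word tag, "]" and the message text.
    private static readonly Regex Header =
        new Regex(@"^\[(\d+) (\d+) (\d+) (\w*)\]\s*(.*)$", RegexOptions.Compiled);

    public static LineKind Classify(string line)
    {
        Match m = Header.Match(line);
        if (!m.Success)
            return LineKind.Continuation; // no "[...]" header: body of a multi-line entry (or garbage)

        string message = m.Groups[5].Value;
        if (message.StartsWith("Logs initialized", StringComparison.Ordinal))
            return LineKind.Initialized;
        if (message.StartsWith("Good log entry", StringComparison.Ordinal))
            return LineKind.GoodEntry;
        return LineKind.OtherHeader;
    }
}
```

Note that `GaRbAgE fIrSt LiNe` still comes back as `Continuation`: the classifier alone cannot tell it apart from the body of a multi-line entry, which is exactly the ambiguity described here. One possible workaround is to ignore `Continuation` lines in each file until that file's first header line has been seen.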
There are many log files, and they can grow rather large. For this reason, I read only 50 lines at a time from each log, buffer them, and then combine them into a separate file. I keep looping through the files as long as there are files that still have (non-null) lines left to read. Below is a code example in which I've replaced some conditions and variables with explanations.
    while (there are non-null files left to read)
    {
        foreach (object logFile in logFiles) // logFiles is an array that stores the log file names
        {
            int numLinesRead = 0;
            using (StreamReader fileReader = File.OpenText(logFile.ToString()))
            {
                string fileLine;
                // Read a line from the file. The line count is checked first, so that
                // a line is never consumed by ReadLine() without being processed.
                while (numLinesRead < 50 && (fileLine = fileReader.ReadLine()) != null)
                {
                    // compare line to regex expressions
                    Match rMatch = rExp.Match(fileLine);
                    if (rMatch.Success) // found good log entry
                    {
                        ...
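For the "combine based on a timestamp" step, one alternative to buffering fixed 50-line batches (a swapped-in technique, not what the loop above does) is a streaming k-way merge: hold one parsed entry per file and always emit the one with the smallest timestamp. This sketch assumes each log file is already in timestamp order and that its entries have been parsed into some type with a comparable key; `SortedMerge.Merge` is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;

public static class SortedMerge
{
    // Lazily merges sequences that are EACH ASSUMED to be sorted by `key`,
    // yielding items in overall key order. The merge itself holds only one
    // "head" item per source, so the inputs can be arbitrarily large.
    public static IEnumerable<T> Merge<T, TKey>(
        IEnumerable<IEnumerable<T>> sources, Func<T, TKey> key)
        where TKey : IComparable<TKey>
    {
        var all = new List<IEnumerator<T>>();
        try
        {
            var active = new List<IEnumerator<T>>();
            foreach (IEnumerable<T> source in sources)
            {
                IEnumerator<T> e = source.GetEnumerator();
                all.Add(e);                       // track it so it is always disposed
                if (e.MoveNext()) active.Add(e);  // keep only non-empty sources
            }

            while (active.Count > 0)
            {
                // Linear scan for the smallest head; fine for a handful of files.
                int best = 0;
                for (int i = 1; i < active.Count; i++)
                {
                    if (key(active[i].Current).CompareTo(key(active[best].Current)) < 0)
                        best = i;
                }

                yield return active[best].Current;

                if (!active[best].MoveNext())
                    active.RemoveAt(best); // this source is exhausted
            }
        }
        finally
        {
            // Disposing the enumerators closes any files they hold open.
            foreach (IEnumerator<T> e in all) e.Dispose();
        }
    }
}
```

With one entry sequence per file (for example, a hypothetical `ParseEntries(path)` iterator yielding timestamp/text pairs), the merged result can be written straight to the output file without ever loading a whole log into memory.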
How would you skip that first garbage line? Unfortunately, it is not as simple as consuming a line with ReadLine(), because the StreamReader is within a loop and I would end up discarding one line out of every 50.
I thought of keeping a list or array of the files whose first line I've already skipped (so that I don't skip it more than once), but that is rather ugly. I also thought of getting rid of the using statement and opening the StreamReader before the loop, but I'd prefer not to do that.
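If each reader is instead opened once and kept alive across batches (the second option above), a minimal sketch could look like the following. `LogFileCursor` is a hypothetical name. It skips the first line exactly once, in its constructor, and checks the batch size before calling `ReadLine()`, so no line is consumed and then dropped when a batch fills up. It takes a `TextReader` so it can be exercised with a `StringReader`; `Open` simply wraps `File.OpenText`.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical wrapper: one reader per log file, opened once and kept open
// across batches, so the position in the file is preserved between reads.
public sealed class LogFileCursor : IDisposable
{
    private readonly TextReader reader;

    public bool IsExhausted { get; private set; }

    public LogFileCursor(TextReader reader)
    {
        this.reader = reader;
        // Discard the garbage first line exactly once, when the file is opened.
        if (reader.ReadLine() == null) IsExhausted = true;
    }

    public static LogFileCursor Open(string path) => new LogFileCursor(File.OpenText(path));

    // Reads up to maxLines lines. The count is checked BEFORE ReadLine(),
    // so a full batch never swallows the next line.
    public List<string> ReadBatch(int maxLines)
    {
        var batch = new List<string>(maxLines);
        while (!IsExhausted && batch.Count < maxLines)
        {
            string line = reader.ReadLine();
            if (line == null) IsExhausted = true;
            else batch.Add(line);
        }
        return batch;
    }

    // Closes the underlying file.
    public void Dispose() => reader.Dispose();
}
```

Each cursor would be created once per file before the outer loop and disposed (for example, in a `finally` block) once every file reports `IsExhausted`; the outer "non-null files left" condition then becomes "any cursor is not exhausted".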
EDIT: After posting, I realized that my implementation might not be correct at all. When the StreamReader is closed and disposed, I believe my previous position in the file is lost. In that case, should I still use StreamReader, but without the using construct, or is there a different type of file reader I should consider?
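One alternative to managing `StreamReader` objects directly is `File.ReadLines`, which returns a lazily evaluated `IEnumerable<string>`. The file stays open, and the read position is preserved, for as long as the enumerator is alive, and `Skip(1)` drops the first line once per file. This is a sketch; `LazyLogLines`, `OpenSkippingHeader`, and `TakeBatch` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class LazyLogLines
{
    // File.ReadLines streams the file lazily; the underlying reader stays open
    // (and keeps its position) until the returned enumerator is disposed.
    // Skip(1) drops the garbage first line once per file.
    public static IEnumerator<string> OpenSkippingHeader(string path) =>
        File.ReadLines(path).Skip(1).GetEnumerator();

    // Pulls up to maxLines lines from a live enumerator. The count is checked
    // before MoveNext(), so no line is consumed and then lost.
    public static List<string> TakeBatch(IEnumerator<string> lines, int maxLines)
    {
        var batch = new List<string>();
        while (batch.Count < maxLines && lines.MoveNext())
            batch.Add(lines.Current);
        return batch;
    }
}
```

Keep one enumerator per file for the whole run and dispose each one when it is done; disposing the enumerator closes the underlying file.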