perl - Perl file parser for a dynamic file

Question

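I'm new to Perl and could really use some help building a file parser. The file is structured like this (X is a number that varies between files and gives the number of following lines that contain column headings):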

X,1,0,0,2,0,0,2,0,1,2,0,2,2,0,3,2,0,4,2,1,0,2,2,0,2,3,0,2,4,0,2,4,1,2,4,2,2,4,3,2,5,0,2,5,1,2,5,2,2,5,3,3,1,0,3
# Col_heading1
# Col_heading2
# Col_heading3 //Continues X rows
# Col_headingX 
# 2013 138 22:42:21 - Random text
# 2013 138 22:42:22 : Random text
# 2013 138 22:42:23 : Random text
2013 138 22:42:26, 10, 10, 10, 20, //continues X values
2013 138 22:42:27, 10, 10, 10, 20, 
2013 138 22:42:28, 10, 10, 10, 20, 
# 2013 138 22:42:31 - Random text
# 2013 138 22:42:32 : Random text
# 2013 138 22:42:33 - Event $eventname starting ($eventid) //$eventname and $eventid changes for each file
2013 138 22:42:35, 10, 10, 10, 20, 
2013 138 22:42:36, 10, 10, 10, 20, 
2013 138 22:42:37, 10, 10, 10, 20, 
2013 138 22:42:38, 10, 10, 10, 20, 
2013 138 22:42:39, 10, 10, 10, 20, 
# 2013 138 22:42:40 : Random text
2013 138 22:42:41, 10, 10, 10, 20, 
2013 138 22:42:42, 10, 10, 10, 20, 
# 2013 138 22:42:45 - Event $eventname ended ($eventid) //$eventname and $eventid changes for each file
2013 138 22:42:46, 10, 10, 10, 20, 
2013 138 22:42:47, 10, 10, 10, 20, 
# 2013 138 22:42:48 : Random text

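The parser needs to transpose the Col_headings into a single row of tab-separated values, and list every line that falls between "# 2013 138 22:42:33 - Event $eventname starting ($eventid)" and "# 2013 138 22:42:45 - Event $eventname ended ($eventid)" and does not start with #. The values must also be changed from comma-separated to tab-separated.

The output file should look like this: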

Filename:/home/..../filename    What:$eventname Where:SYSTEM    ID:$eventid
Time                Col_heading1    Col_heading2    Col_heading3    Col_headingX
2013 138 22:42:35   10              10              10              20
2013 138 22:42:36   10              10              10              20
2013 138 22:42:37   10              10              10              20
2013 138 22:42:38   10              10              10              20
2013 138 22:42:39   10              10              10              20 
2013 138 22:42:41   10              10              10              20 
2013 138 22:42:42   10              10              10              20

Any help with this would be greatly appreciated!

score 1 · Accepted Answer

Once the file is open, you can grab the number from the first line:

my ($heading_count) = split /,/, <$fh>;

Then loop to read the headings:

my @headings = qw(Time);
for (1..$heading_count) {
    chomp(my $heading = <$fh>); # Chomp to remove the newline
    # Process it somehow, e.g. remove leading # + whitespace
    $heading =~ s/^#\s+//;
    push @headings, $heading;
}
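The two snippets above assume a well-formed file. As a rough sketch only, the same steps can be written with a few sanity checks. The in-memory sample and the error messages are made up for illustration and are not part of the original answer.

```perl
use strict;
use warnings;

# Hypothetical in-memory sample standing in for the real file.
my $sample = <<'END';
3,1,0,0,2
# Col_heading1
# Col_heading2
# Col_heading3
# 2013 138 22:42:21 - Random text
END

open my $fh, '<', \$sample or die "Cannot open sample: $!";

# The first field of the first line is the number of heading lines that follow.
my $first_line = <$fh>;
die "File is empty\n" unless defined $first_line;
my ($heading_count) = split /,/, $first_line;
die "Heading count is not a number\n" unless $heading_count =~ /^\d+$/;

my @headings = ('Time');
for my $n (1 .. $heading_count) {
    my $heading = <$fh>;
    die "File ended after ", $n - 1, " of $heading_count headings\n"
        unless defined $heading;
    chomp $heading;
    $heading =~ s/^#\s+//;    # drop the leading "# "
    push @headings, $heading;
}

print join("\t", @headings), "\n";
```

Reading from a reference to a string (`\$sample`) keeps the example runnable without a file on disk; with a real file you would pass its path to `open` instead.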

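Once that's done, loop over the rest of the file, parsing and printing any lines between the start/end patterns. Here's a fairly simple example to get you started: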

print join("\t", @headings), "\n"; # print out the headings (parens keep "\n" out of the join)
my $in_event = 0; # State variable to track if we're in an event
while (<$fh>) { # Keep reading from the handle the headings came from
    if (/Event (.*) starting \((.*)\)/) { # Watch for the event starting, event name is now in $1, event id in $2
        $in_event = 1;
        next;
    }
    next unless $in_event; # Skip if not in an event yet
    last if /Event .* ended/; # Stop reading if the event ends
    next if /^#/; # Skip comments

    s/,[ \t]*$//; # Drop the trailing comma so no empty last column is printed
    s/,\s?/\t/g; # Replace commas with tabs
    print; # Print the row
}
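The loop above captures the event name and id but never prints them, while the expected output in the question starts with a Filename/What/Where/ID line. A minimal sketch of one way to emit it follows. It assumes the header can be written at the moment the start line is seen, since that is the first point where both values are known. The file name `sample.txt`, the event name `Test1` and the id `42` are invented for the example, and `Where:SYSTEM` is copied literally from the question's expected output.

```perl
use strict;
use warnings;

# Hypothetical sample; the event name "Test1" and id "42" are made up.
my $sample = <<'END';
2,1,0,0
# Col_heading1
# Col_heading2
# 2013 138 22:42:21 - Random text
2013 138 22:42:26, 10, 20,
# 2013 138 22:42:33 - Event Test1 starting (42)
2013 138 22:42:35, 10, 20,
# 2013 138 22:42:40 : Random text
2013 138 22:42:41, 10, 20,
# 2013 138 22:42:45 - Event Test1 ended (42)
2013 138 22:42:46, 10, 20,
END

my $filename = 'sample.txt';    # placeholder for the real path
open my $fh, '<', \$sample or die "Cannot open $filename: $!";

my ($heading_count) = split /,/, scalar <$fh>;
my @headings = ('Time');
for (1 .. $heading_count) {
    chomp(my $heading = <$fh>);
    $heading =~ s/^#\s+//;
    push @headings, $heading;
}

my $out      = '';
my $in_event = 0;
while (my $line = <$fh>) {
    chomp $line;
    if ($line =~ /Event (.*) starting \((.*)\)/) {
        # Name and id are only known here, so both header lines are emitted now.
        $out .= join("\t", "Filename:$filename", "What:$1", "Where:SYSTEM", "ID:$2") . "\n";
        $out .= join("\t", @headings) . "\n";
        $in_event = 1;
        next;
    }
    next unless $in_event;                 # not inside the event yet
    last if $line =~ /Event .* ended/;     # event finished
    next if $line =~ /^#/;                 # comment inside the event

    $line =~ s/,\s*$//;      # drop the trailing comma, if any
    $line =~ s/,\s*/\t/g;    # remaining commas become tabs
    $out .= "$line\n";
}
print $out;
```

Collecting the text in `$out` before printing is only a convenience for the example; printing each line as it is produced works just as well.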

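You'll find that with this approach the column headings won't line up with the data, because their lengths vary. You'll need to tweak it to get exactly what you want, or look at Text::CSV to parse the lines (or use split) and something like Text::Table to produce a properly aligned table.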
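If installing CPAN modules is not an option, aligned columns can also be approximated with core Perl alone. The sketch below uses plain `split` and `sprintf` rather than Text::CSV or Text::Table, and it assumes the fields never contain embedded commas or quotes. The rows are hypothetical sample data.

```perl
use strict;
use warnings;

# Hypothetical rows: the heading row first, then the data lines split on commas.
my @rows = ( [ 'Time', 'Col_heading1', 'Col_heading2' ] );
for my $line ('2013 138 22:42:35, 10, 20,', '2013 138 22:42:36, 10, 20,') {
    # split discards trailing empty fields, so the final comma adds no column
    push @rows, [ split /\s*,\s*/, $line ];
}

# The widest cell in each column decides that column's width.
my @width;
for my $row (@rows) {
    for my $i (0 .. $#$row) {
        my $len = length $row->[$i];
        $width[$i] = $len if !defined $width[$i] || $len > $width[$i];
    }
}

# Left-justify every cell to its column width, two spaces between columns.
my $format = join('  ', map { sprintf '%%-%ds', $_ } @width) . "\n";
my $table  = join '', map { sprintf $format, @$_ } @rows;
$table =~ s/[ ]+$//mg;    # trim padding after the last column
print $table;
```

This needs every row in memory before anything is printed, because the widths are not known until all cells have been seen. For large files, fixed widths chosen up front avoid that.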
