Well, it depends on what you want to do with the data.
Assuming you have a big while (<>) { ... } loop around this, you can get the simplest parsing by just using split:
my @fields = split;
The next level would be to add a bit of meaning:
my ($date, $time, $id, $host, $from, $to, undef, $dest) = split;
(Note that you can assign to undef if you want to ignore a result.)
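As a minimal sketch of the named-field split, here it is run against a single line. The log line is invented for illustration (the real format is not shown here), and split is given an explicit string rather than relying on $_ so the snippet runs on its own:

```perl
use strict;
use warnings;

# Hypothetical log line, made up to have eight whitespace-separated fields.
my $line = '2008-10-02 15:04:11 1KlWxJ-0003Tq-9z <mail.example.com> '
         . '(alice@example.org) <bob@example.net> => R=[remote_smtp]';

# split ' ' splits on runs of whitespace, the same as a bare split on $_.
# The undef slot throws away the seventh field (the "=>" separator).
my ($date, $time, $id, $host, $from, $to, undef, $dest) = split ' ', $line;

print "date=$date time=$time id=$id\n";
print "host=$host from=$from to=$to dest=$dest\n";
```

The brackets and parentheses are still attached to the host and address fields at this point; split only cuts on whitespace.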
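Finally, you can clean up a lot of the cruft by using a regular expression. You can also combine the split above with smaller regexes to clean up each field individually.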
my ($datetime, $id, $host, $from, $to, $dest) =
    /([\d-]+ \s+ [\d:]+) \s+ # date and time together; a literal space would be ignored under the x flag
     (\S+) \s+               # message id, just a block of non-whitespace
     <(.*?)> \s+             # hostname in angle brackets, .*? is non-greedy slurp
     \((.*?)\) \s+           # from email in parens
     <(.*?)> \s+             # to email in angle brackets
     \S+ \s+                 # separator between to-email and dest
     (\S+)                   # last bit, could be improved to (\w)=\[(.*?)\]
    /x; # /x lets us break all of this up, so it's a bit readable
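To see what the captures look like, here is a rough sketch of the same pattern applied to one line. The sample line is hypothetical, the match is bound to a named variable instead of $_, and the gap between date and time is written as \s+ because whitespace in the pattern itself is ignored when the x flag is on:

```perl
use strict;
use warnings;

# Hypothetical log line, invented for illustration.
my $line = '2008-10-02 15:04:11 1KlWxJ-0003Tq-9z <mail.example.com> '
         . '(alice@example.org) <bob@example.net> => R=[remote_smtp]';

my ($datetime, $id, $host, $from, $to, $dest) = $line =~
    /([\d-]+ \s+ [\d:]+) \s+   # date and time together
     (\S+)               \s+   # message id
     <(.*?)>             \s+   # hostname, angle brackets stripped
     \((.*?)\)           \s+   # from address, parens stripped
     <(.*?)>             \s+   # to address, angle brackets stripped
     \S+                 \s+   # separator token, not captured
     (\S+)                     # destination
    /x
    or die "line did not match: $line\n";

print "$datetime | $id | $host | $from | $to | $dest\n";
```

Unlike the plain split, the captures come back without their surrounding brackets, and a line that does not fit the pattern is reported instead of silently producing undefined fields.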
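Of course, you can keep taking this to all sorts of silly places, but if you are going to start doing more specific parsing of those fields, I would go with the initial split followed by broken-out field parsing. For example: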
my ($date, $time, ...) = split;
my ($year, $month, $day) = split(/-/, $date);
my ($hour, $min, $sec) = split(/:/, $time);
my ($from_user, $from_host) = ( $from =~ /< ([^\@]+) \@ (.*) >/x );
...etc...
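Filled in for one made-up line, the split-then-refine approach might look like the sketch below. The sample line, the $dest_key and $dest_val names, and the choice to accept either parentheses or angle brackets around the from address are assumptions for illustration; adjust the patterns to whatever your log really contains.

```perl
use strict;
use warnings;

# Hypothetical log line, invented for illustration.
my $line = '2008-10-02 15:04:11 1KlWxJ-0003Tq-9z <mail.example.com> '
         . '(alice@example.org) <bob@example.net> => R=[remote_smtp]';

# Stage 1: coarse split on whitespace.
my ($date, $time, $id, $host, $from, $to, undef, $dest) = split ' ', $line;

# Stage 2: refine individual fields with small, focused patterns.
my ($year, $month, $day) = split /-/, $date;
my ($hour, $min, $sec)   = split /:/, $time;

# Accept either (user@host) or <user@host>, since the wrapper is a guess.
my ($from_user, $from_host) = $from =~ /[<(] ([^\@]+) \@ ([^>)]+) [>)]/x;

# Break the last field into key and bracketed value, e.g. R=[remote_smtp].
my ($dest_key, $dest_val) = $dest =~ /(\w+)=\[(.*?)\]/;

print "$year/$month/$day $hour:$min:$sec\n";
print "from user=$from_user host=$from_host\n";
print "dest $dest_key => $dest_val\n";
```

Keeping each pattern small means a format change in one field only breaks one line of the parser rather than one large regex.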