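Since you want to split the values on a colon, use the complement of that character in the regex for everything that comes before the split.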
my $regex
    = qr{
        (                   # v- no worry, this matches the first non-space, non-colon
            [^\s:]
            (?> [^:\n]*     # this matches all non-colon chars on the line
                [^\s:]      # match the last non-space, non-colon, if there
            )?              # but possibly not there
        )                   # end group
        \s*                 # match any number of whitespace
        :                   # match the colon
        \s*                 # followed by any number of whitespace
        (   \S              # Start second capture with any non space
            (?> .*          # anything on the same line
                \S          # ending in a non-space
            )?              # But, possibly not there at all
        |                   # OR
        )                   # nothing - this gives the second capture as an
                            # empty string instead of an undef
    }x;
while ( <$in> ) {
    $hash{ $1 } = $2 if m/$regex/;
}
%hash then looks like this:
{ '* field' => '100'
, '.Cont.asasd' => ''
, 'H.25 miss here' => 'No'
, Othreaol => 'Value, Other value'
, 'Point->IP' => '0.0.0.0 Port 5060'
, 'Z.15 example' => 'No'
, blahbla => '<Set>'
, scree => '<what>'
}
Of course, now that I think about it, if you can count on a /\s+:\s+/ pattern, or at least a /\s{2,}:\s{2,}/ pattern, it might be simpler to just split the line like this:
while ( <$in> ) {
    if ( my ( $k, @v )
            = grep {; length } split /\A\s+|\s+\z|(\s+:\s+)/
    ) {
        shift @v; # the first one will be the separator
        $hash{ $k } = join( '', @v );
    }
}
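To see why the grep and the shift are needed, here is a small sketch of what split hands back for a single line. The sample line is made up for illustration; the pattern is the one used in the loop.

```perl
use strict;
use warnings;

# Hypothetical sample line, with leading whitespace and a trailing newline.
my $line = "  Point->IP   :   0.0.0.0 Port 5060\n";

# Because the pattern contains a capture group, split also returns what the
# group matched at every separator (undef when the group did not take part,
# i.e. for the leading and trailing whitespace alternatives).
my @raw = split /\A\s+|\s+\z|(\s+:\s+)/, $line;

# Drop the empty and undefined pieces, as the loop does.
my @parts = grep { defined($_) && length($_) } @raw;

# The first piece is the key, the second is the captured separator,
# and whatever is left makes up the value.
my ( $key, $sep, @value ) = @parts;

printf "key=<%s> value=<%s>\n", $key, join( '', @value );
```

If the value itself contains a space-delimited colon, the extra pieces and their captured separators all land in @value, so joining them with an empty string puts the value back together unchanged.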
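This does the same thing and doesn't need nearly as much backtracking to trim the results. It also ignores escaped colons without any extra syntax, because the separator has to be a bare colon surrounded by whitespace. To turn the escaped colons in the key back into plain ones, you would just add the following to the if block: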
# capture the whole run of backslash pairs, so $1 is never undef or truncated
$k =~ s/(?<!\\)((?:\\\\)*)\\:/$1:/g;
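As a rough, self-contained sketch of how the substitution fits into the split loop: the helper name unescape_colons and the sample lines are hypothetical, made up only for this example. The backslash pairs are captured as ((?:\\\\)*) so that $1 always holds the whole run, even when it is empty.

```perl
use strict;
use warnings;

# Hypothetical helper: turn an escaped colon (\:) into a bare colon while
# leaving any pairs of backslashes in front of it untouched.
sub unescape_colons {
    my ($text) = @_;
    # (?<!\\)      not preceded by another backslash
    # ((?:\\\\)*)  any number of backslash pairs, captured whole as $1
    # \\:          the escaping backslash and the colon itself
    $text =~ s/(?<!\\)((?:\\\\)*)\\:/$1:/g;
    return $text;
}

my %hash;

# Made-up input lines; the first key contains an escaped colon.
for my $line ( "Ratio a\\:b : 3\n", "plain : value\n" ) {
    if ( my ( $k, @v ) = grep { defined($_) && length($_) }
            split /\A\s+|\s+\z|(\s+:\s+)/, $line )
    {
        shift @v;    # drop the captured separator
        $hash{ unescape_colons($k) } = join( '', @v );
    }
}

printf "%s => %s\n", $_, $hash{$_} for sort keys %hash;
```

Only the key is unescaped here, as in the one-liner; the same call could be applied to the joined value if escaped colons can appear there too.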