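I'm having trouble parsing a file for XML tags: the file may contain many XML tags, or just one. I've tried doing this both with a regular expression and with LibXML. The problem with the regex is that when two closing tags appear on the same line, my expression prints everything from the start of the first opening tag to the end of the second closing tag.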
XML file:

```
She outsprinted Becky Smith and Joan Hare to the line, with Becky and Joan
finishing in a time of <time>1:02:41</time> and <time> 1:02:45</time>
respectively.
```
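The regex I'm using (I want to extract the time values):

```perl
# Applied to each input line in $_ (the surrounding read loop is not shown)
if (/<time>(.*)<\/time>/) {
    ($hh, $mm, $ss) = split(':', $1);
    say "Time Entered - ", $hh, ":", $mm, ":", $ss, " ";
    print "***$1***\n";
}
```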
Output:

```
Time Entered - 1:02:41</time> and <time> 1
```

Expected:

```
1:02:41
1:02:45
```
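A minimal sketch of one common fix, assuming the file is processed line by line as in the snippet above. `.*` is greedy, so it runs on to the *last* `</time>` on the line; the non-greedy `.*?` stops at the first one. Using `/g` in a `while` loop then visits every match on the line instead of only the first, and the `\s*` on either side of the capture trims padding such as `<time> 1:02:45</time>`. The sample text is embedded as an in-memory file only to keep the example self-contained.

```perl
use strict;
use warnings;

# Sample input copied from the question; a real script would open the
# file named on the command line instead.
my $text = <<'END';
She outsprinted Becky Smith and Joan Hare to the line, with Becky and Joan
finishing in a time of <time>1:02:41</time> and <time> 1:02:45</time>
respectively.
END

open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my @times;    # every time value found, in order of appearance

while (my $line = <$fh>) {
    # .*? is non-greedy, so each match stops at the FIRST </time> after
    # a <time>; \s* on both sides trims padding such as "<time> 1:02:45".
    # With /g in scalar context, the loop visits every match on the line.
    while ($line =~ /<time>\s*(.*?)\s*<\/time>/g) {
        my $time = $1;    # copy before any other regex operation
        my ($hh, $mm, $ss) = split /:/, $time;
        push @times, $time;
        print "Time Entered - $hh:$mm:$ss\n";
    }
}
close $fh;
```

On the sample above this prints one `Time Entered - ...` line for `1:02:41` and one for `1:02:45`. A regex is still fragile for real XML (attributes, tags split across lines, CDATA), which is why a parser is usually preferred.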
**Second approach: using LibXML**

I tried the code below, but it gives me this error:

```
KnoxHalfResults:1: parser error : Start tag expected, '<' not found
Jim Colatis won Tuesday's Knoxville half marathon in a blistering pace
```
The input file contains this data:

```
Jim Colatis won Tuesday's Knoxville half marathon in a blistering pace
of <time> 0:56:45 </time>. He was followed to the line by long time nemesis
Mickey Mouse in a time of <time>0:58:49</time>.
```
My LibXML code:

```perl
use strict;
use warnings;
#use XML::Twig;
use XML::LibXML;

# Input and output file names from the command line
my ($filein, $fileout) = @ARGV;

my $parser = XML::LibXML->new();
my $xmldoc = $parser->parse_file($filein);

# Print the name and text of each <time> element
for my $sample ($xmldoc->findnodes('/time')) {
    print $sample->nodeName(), ": ", $sample->textContent(), "\n";
}
```
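A minimal sketch of one way to make the LibXML route work, under stated assumptions. The parser error occurs because the file is not a well-formed XML document: it begins with plain text rather than an element and has no single root element. Wrapping the whole text in an element (the name `root` here is hypothetical; any name works) gives it one root, assuming the text contains no bare `&` or `<` characters, which would still be parse errors. The XPath also needs `//time` (a `<time>` element at any depth); `/time` only matches a document whose root element is itself `<time>`.

```perl
use strict;
use warnings;
use XML::LibXML;

# Sample input copied from the question; a real script would read the
# file named in @ARGV into this string.
my $text = <<'END';
Jim Colatis won Tuesday's Knoxville half marathon in a blistering pace
of <time> 0:56:45 </time>. He was followed to the line by long time nemesis
Mickey Mouse in a time of <time>0:58:49</time>.
END

# The raw text has no single root element, so wrap it in a hypothetical
# <root> element to turn it into a well-formed document.
my $parser = XML::LibXML->new();
my $doc    = $parser->parse_string("<root>$text</root>");

my @times;
# '//time' matches <time> elements anywhere below the document root.
for my $node ($doc->findnodes('//time')) {
    my $value = $node->textContent();
    $value =~ s/^\s+|\s+$//g;    # trim padding like "<time> 0:56:45 </time>"
    push @times, $value;
    print $node->nodeName(), ": $value\n";
}
```

To read the real file instead of the embedded sample, one option is to slurp it into `$text` with `local $/;` before wrapping it, rather than calling `parse_file`, since `parse_file` would hit the same "Start tag expected" error on the unwrapped file.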