xml - How should I parse large XML files in Perl?

Question

Does reading XML data as in the following code create a DOM tree in memory?

use XML::Simple;

my $xml = XML::Simple->new;
my $data = $xml->XMLin($blast_output, ForceArray => 1);

For large XML files, should I use a SAX parser, handlers, and so on?

score 14 · Accepted Answer

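For large XML files, you can use XML::LibXML, in DOM mode if the document fits in memory or in pull mode (see XML::LibXML::Reader) if it does not. You can also use XML::Twig (which I wrote, so I'm biased, but it generally works well for files that are too big to fit in memory).

I'm not a fan of SAX: it's hard to use and quite slow.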
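As a rough sketch of the XML::Twig approach for files that don't fit in memory: register a handler for the element you care about and purge the twig after each one, so only a small part of the tree exists at any time. The element names (Hit, Hit_id) and the sample data are assumptions made up for illustration; they are not taken from the question.

```perl
use strict;
use warnings;
use XML::Twig;

# Element names (Hit, Hit_id) are assumptions for illustration only.
my $sample = '<hits><Hit><Hit_id>a1</Hit_id></Hit><Hit><Hit_id>b2</Hit_id></Hit></hits>';

my @ids;
my $twig = XML::Twig->new(
    twig_handlers => {
        # Called each time a complete <Hit> element has been parsed.
        Hit => sub {
            my ($t, $hit) = @_;
            push @ids, $hit->first_child_text('Hit_id');
            $t->purge;    # free the parts of the tree built so far
        },
    },
);
$twig->parse($sample);    # for a file on disk: $twig->parsefile($path)

print "$_\n" for @ids;
```

In a real script you would probably process each element inside the handler rather than collecting everything into an array, since the point is to keep memory use flat.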

score 4 · Accepted Answer

I would say yes to both. XML::Simple builds the entire tree in memory, and that tree takes a large multiple of the file's size. For many applications, once your XML is over 100MB or so, it becomes practically impossible to load it entirely into memory in Perl. A SAX parser instead delivers "events", or notifications, as the file is read and tags are opened or closed.

Depending on your usage patterns, either a SAX or a DOM-based parser could be faster. If you are handling just a few nodes, or visiting every node once, in a large file, SAX mode is probably best: for example, reading a large RSS feed and parsing every item in it.
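To make the event-driven style concrete, here is a minimal sketch of a SAX handler for the RSS case, built on XML::SAX::Base and XML::SAX::ParserFactory. The package name ItemTitles and the tiny feed are hypothetical; the handler just records the title of each item as the events go by.

```perl
use strict;
use warnings;

package ItemTitles;    # hypothetical handler name
use base 'XML::SAX::Base';

sub start_document {
    my ($self) = @_;
    $self->{titles}  = [];
    $self->{in_item} = 0;
    $self->{text}    = undef;    # defined only while inside an item's <title>
}

sub start_element {
    my ($self, $el) = @_;
    if ($el->{Name} eq 'item') {
        $self->{in_item} = 1;
    }
    elsif ($el->{Name} eq 'title' && $self->{in_item}) {
        $self->{text} = '';
    }
}

sub characters {
    my ($self, $chars) = @_;
    # Text may arrive in several chunks, so append rather than assign.
    $self->{text} .= $chars->{Data} if defined $self->{text};
}

sub end_element {
    my ($self, $el) = @_;
    if ($el->{Name} eq 'title' && defined $self->{text}) {
        push @{ $self->{titles} }, $self->{text};
        $self->{text} = undef;
    }
    elsif ($el->{Name} eq 'item') {
        $self->{in_item} = 0;
    }
}

package main;
use XML::SAX::ParserFactory;

# Made-up feed for illustration.
my $rss = '<rss><channel><title>Feed</title>'
        . '<item><title>First</title></item>'
        . '<item><title>Second</title></item>'
        . '</channel></rss>';

my $handler = ItemTitles->new;
my $parser  = XML::SAX::ParserFactory->parser(Handler => $handler);
$parser->parse_string($rss);    # for a file on disk: $parser->parse_uri($path)

print "$_\n" for @{ $handler->{titles} };
```

Note how the state (are we inside an item, are we inside a title) has to be tracked by hand; that bookkeeping is what the "inside-out" style amounts to in practice.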

On the other hand, if you need to cross-reference one part of the file with another part, a DOM parser or accessing via XPath will make more sense - writing it in the "inside-out" manner that a SAX parser requires will be clumsy and tricky.
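For comparison, a small sketch of the cross-referencing case with XML::LibXML in DOM mode, where XPath lets one part of the document look up another. The document layout (library, authors, books) is invented for this example, and the approach assumes the whole file fits in memory.

```perl
use strict;
use warnings;
use XML::LibXML;

# Made-up document: books refer to authors by id.
my $xml = <<'XML';
<library>
  <authors>
    <author id="a1">Ada</author>
    <author id="a2">Brian</author>
  </authors>
  <books>
    <book author="a2">Perl Notes</book>
    <book author="a1">XML Basics</book>
  </books>
</library>
XML

# For a file on disk: XML::LibXML->load_xml(location => $path)
my $doc = XML::LibXML->load_xml(string => $xml);

my %author_of;
for my $book ($doc->findnodes('/library/books/book')) {
    my $id   = $book->getAttribute('author');
    # Look up the matching author elsewhere in the same tree.
    my $name = $doc->findvalue(qq{/library/authors/author[\@id="$id"]});
    $author_of{ $book->textContent } = $name;
}

print "$_: $author_of{$_}\n" for sort keys %author_of;
```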

I recommend trying a SAX parser at least once, because the event-driven thinking required to do so is good exercise.

I've had good success with XML::SAX::Machines for setting up SAX parsing in Perl: if you want multiple filters and pipelines, it's easy to set up. For simpler setups (i.e. 99% of the time) you just need a single SAX filter (look at XML::Filter::Base) and tell XML::SAX::Machines to parse the file (or read from a filehandle) using your filter. Here's a thorough article.

score 4 · Accepted Answer

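I haven't used the XML::Simple module before, but judging from the documentation, it seems to create a simple hash in memory. That isn't a full DOM tree, but it may be sufficient for your needs.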

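For large XML files, a SAX parser will be faster and use less memory, but again this depends on your needs. If you only need to process the data serially, XML::SAX will probably do what you need. If you need to manipulate the whole tree, something like XML::LibXML may suit you better.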

It's horses for courses, I'm afraid.
