4

Does reading XML data as in the code below create a DOM tree in memory?

my $xml = XML::Simple->new;

my $data = $xml->XMLin($blast_output, ForceArray => 1);

For large XML files, should I be using a SAX parser, handlers, and so on?

3 Answers

14

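For large XML files you can use XML::LibXML, either in DOM mode if the document fits in memory or in pull mode (see XML::LibXML::Reader), or XML::Twig (which I wrote, so I'm biased, but it generally works well for files that are too big to fit in memory).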
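As a rough sketch of the XML::Twig approach: a handler fires once per completed element, and the parsed part of the tree is then discarded so memory use stays flat. The element names (`hits`, `hit`, `score`) and the score threshold are made up for illustration and are not taken from the question's BLAST output.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample data standing in for a large file.
my $xml = <<'XML';
<hits>
  <hit id="a"><score>10</score></hit>
  <hit id="b"><score>25</score></hit>
  <hit id="c"><score>7</score></hit>
</hits>
XML

my @ids;
my $twig = XML::Twig->new(
    twig_handlers => {
        # Called each time a complete <hit> element has been parsed.
        hit => sub {
            my ($t, $hit) = @_;
            push @ids, $hit->att('id')
                if $hit->first_child_text('score') > 8;
            $t->purge;    # free everything that has been fully parsed so far
        },
    },
);

$twig->parse($xml);    # for a file on disk, $twig->parsefile($path) instead
print join(',', @ids), "\n";
```

Only one `<hit>` subtree is held at a time, which is what makes this workable for files that do not fit in memory.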

I'm not a fan of SAX: it's hard to use, and it's slow.

answered 2009-12-03T10:58:11.410
4

I would say yes to both. The XML::Simple library will create the entire tree in memory, and its size is a large multiple of the size of the file. For many applications, if your XML is over 100MB or so, it'll be practically impossible to load entirely into memory in Perl. A SAX parser is a way of getting "events" or notifications as the file is read and tags are opened or closed.
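To make the "events" idea concrete, here is a minimal sketch of a SAX handler, assuming XML::SAX is installed. The `ItemCounter` class and the tiny RSS-like input are hypothetical; the handler just counts `<item>` start tags as the parser reports them, and no tree is ever built.

```perl
use strict;
use warnings;

# Hypothetical handler class: counts <item> elements as they stream past.
package ItemCounter;
use base 'XML::SAX::Base';

sub start_document {
    my ($self) = @_;
    $self->{count} = 0;
}

# Called by the parser for every opening tag.
sub start_element {
    my ($self, $el) = @_;
    $self->{count}++ if $el->{LocalName} eq 'item';
}

package main;
use XML::SAX::ParserFactory;

my $handler = ItemCounter->new;
my $parser  = XML::SAX::ParserFactory->parser(Handler => $handler);

# parse_string is used here for a self-contained demo; parse_uri or
# parse_file would be the usual choice for a large document on disk.
$parser->parse_string('<rss><channel><item/><item/><item/></channel></rss>');
print "items seen: $handler->{count}\n";
```

Any state you need (here, a single counter) has to be kept by the handler itself, which is why this style takes some getting used to.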

Depending on your usage patterns, either a SAX or a DOM-based parser could be faster. If you are handling just a few nodes, or every node in turn, in a large file, SAX mode is probably best: for example, reading a large RSS feed and parsing every item in it.

On the other hand, if you need to cross-reference one part of the file with another part, a DOM parser or accessing via XPath will make more sense - writing it in the "inside-out" manner that a SAX parser requires will be clumsy and tricky.
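For the cross-referencing case, a small sketch with XML::LibXML in DOM mode shows why having the whole tree helps: each book looks up its author elsewhere in the document with an XPath query. The document layout and the names in it are invented for illustration.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical document in which one section refers to another by id.
my $doc = XML::LibXML->load_xml(string => <<'XML');
<library>
  <authors>
    <author id="a1">First Author</author>
    <author id="a2">Second Author</author>
  </authors>
  <books>
    <book author="a2">Book One</book>
    <book author="a1">Book Two</book>
  </books>
</library>
XML

my @lines;
for my $book ($doc->findnodes('/library/books/book')) {
    my $ref = $book->getAttribute('author');

    # Jump to a different part of the tree; trivial with a DOM,
    # awkward with a streaming parser.
    my $name = $doc->findvalue(qq{/library/authors/author[\@id="$ref"]});
    push @lines, $book->textContent . " by $name";
}
print "$_\n" for @lines;
```

With SAX you would have to buffer the authors yourself before you could resolve the references.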

I recommend trying a SAX parser at least once, because the event-driven thinking required to do so is good exercise.

I've had good success with XML::SAX::Machines to set up SAX parsing in Perl: if you want multiple filters and pipelines, it's easy to set up. For simpler setups (i.e. 99% of the time) you just need a single SAX filter (look at XML::Filter::Base) and tell XML::SAX::Machines to just parse the file (or read from a filehandle) using your filter. Here's a thorough article.

answered 2010-01-17T02:26:45.150
4

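I haven't used the XML::Simple module before, but from the documentation it appears to create a simple hash in memory. That isn't a full DOM tree, but it may be sufficient for your requirements.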
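A quick way to check this for yourself is to look at what `XMLin` returns. The sketch below feeds it a tiny made-up document; `KeyAttr => []` is an extra option that is not in the question's code, added here only so the `id` attribute is not used to fold the array into a hash.

```perl
use strict;
use warnings;
use XML::Simple;

# Hypothetical miniature input; XMLin accepts a string of XML as well as a file name.
my $data = XML::Simple->new->XMLin(
    '<report><hit id="a"><score>10</score></hit></report>',
    ForceArray => 1,     # as in the question: every element becomes an array
    KeyAttr    => [],    # assumption: disable folding on name/key/id
);

print ref($data), "\n";                    # plain Perl reference type, not a DOM class
print $data->{hit}[0]{id}, "\n";           # attributes become hash keys
print $data->{hit}[0]{score}[0], "\n";     # child elements become nested arrays
```

The whole structure is still built in memory in one go, so the size concern for large files applies just as it would for a DOM.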

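For large XML files, using a SAX parser will be faster and will use less memory, but again this depends on your needs. If you only need to process the data serially, then XML::SAX will probably do the job. If you need to manipulate the whole tree, then something like XML::LibXML may suit you better.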

It's horses for courses, I'm afraid.

answered 2009-12-03T09:36:47.520