4

I have a problem...

I need to store a daily barrage of about 3,000 mid-sized XML documents (100 to 200 data elements).

The data is somewhat unstable: the schema changes from time to time, the changes are not announced with enough advance notice, and they have to be dealt with retroactively on an emergency "hotfix" basis.

The consumption pattern for the data involves both a website and some simple analytics (some averages and pie charts).

MongoDB seems like a great solution except for one problem: it requires converting between XML and JSON. I would prefer to store the XML documents as they arrive, untouched, and shift any intelligent processing to the consumer of the data. That way, bugs in the data-loading code cannot cause permanent damage. Bugs in the consumer(s) are always harmless, since you can fix them and re-run without permanent data loss.
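
To make the "store untouched, interpret on read" idea concrete, here is a minimal sketch. It assumes SQLite as a stand-in store (not one of the databases discussed here); the table name `raw_docs` and the helper names are hypothetical, chosen only for illustration. The loader never parses the XML, so a schema change can only break the reader, which can be fixed and re-run.

```python
import sqlite3
import xml.etree.ElementTree as ET


def open_store(path=":memory:"):
    """Open (or create) a store that keeps each document byte for byte."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_docs ("
        " id INTEGER PRIMARY KEY,"
        " received_at TEXT NOT NULL,"
        " body BLOB NOT NULL)"
    )
    return conn


def store_raw(conn, received_at, xml_bytes):
    """Loader side: no parsing, no conversion, just persist the bytes."""
    cur = conn.execute(
        "INSERT INTO raw_docs (received_at, body) VALUES (?, ?)",
        (received_at, xml_bytes),
    )
    conn.commit()
    return cur.lastrowid


def read_field(conn, doc_id, path):
    """Consumer side: parse at read time and look up one element by path."""
    row = conn.execute(
        "SELECT body FROM raw_docs WHERE id = ?", (doc_id,)
    ).fetchone()
    root = ET.fromstring(row[0])
    node = root.find(path)
    return None if node is None else node.text
```

If the schema changes, only `read_field` (or whatever replaces it) needs the hotfix; the stored bytes are never rewritten.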

I don't really need "massively parallel" processing capabilities. It's about 4GB of data which fits comfortably in a 64-bit server.

I have eliminated Cassandra from consideration (due to its complex setup), and CouchDB as well (due to the lack of familiar features such as indexing, which I will need initially because of my RDBMS ways of thinking).

So finally here's my actual question...

Is it worthwhile to look for a native XML database, even though those are not as mature as MongoDB, or should I bite the bullet, convert all the XML to JSON as it arrives, and just use MongoDB?

2 Answers

4

You could take a look at BaseX (Basex.org), which comes with a built-in XQuery processor and Lucene text indexing.

answered 2013-10-22T12:49:02.070
2

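Small amounts of data

If you don't need parallel data processing, you don't need MongoDB. Especially with an amount of data as small as 4 GB, the overhead of distributing the work can easily exceed the actual evaluation effort.

4 GB / 60k nodes is not a large volume for an XML database either. Once you have spent some time learning it, you will find that XQuery is an excellent tool for analyzing XML documents.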

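Is it really?

Or do you receive 4 GB every day and have to evaluate it together with all the data you have already stored? Then you will reach a volume that you can no longer store and process on a single machine, and distributing the work will become necessary. Not within days or weeks, but a single year would already bring you 1 TB.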
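
As a rough check of that growth estimate, the arithmetic can be written out. This sketch assumes a constant 4 GB per day and decimal units (1 TB = 1000 GB); neither is confirmed by the question, which may well mean 4 GB in total.

```python
DAILY_GB = 4          # assumed daily intake, taken from the "4 GB every day" scenario
DAYS_PER_YEAR = 365


def accumulated_gb(days, daily_gb=DAILY_GB):
    """Total stored volume after `days` days if nothing is ever deleted."""
    return days * daily_gb


yearly_gb = accumulated_gb(DAYS_PER_YEAR)
print(f"{yearly_gb} GB after one year, about {yearly_gb / 1000:.2f} TB")
```

Under those assumptions a year comes to 1460 GB, so the 1 TB figure is a round lower bound.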

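Converting to JSON

What does your input look like? Does it follow any schema, or does it even resemble tabular data? MongoDB's capabilities for analyzing semi-structured data are much weaker than what XML databases offer. On the other hand, if you only want to extract a few fields at well-defined paths and can analyze one input file after another, MongoDB probably won't hurt much.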
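
The "few fields at well-defined paths" case could look roughly like the sketch below, which flattens one XML document into a JSON-ready dictionary using only the Python standard library. The element names and the `FIELD_PATHS` mapping are hypothetical; real paths depend on the actual feed, and every schema change means updating this mapping.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical mapping: output field name -> path below the document root.
FIELD_PATHS = {
    "customer": "customer/name",
    "total": "summary/total",
}


def extract_fields(xml_text, field_paths=FIELD_PATHS):
    """Pull a fixed set of fields out of one document; missing paths give None."""
    root = ET.fromstring(xml_text)
    return {name: root.findtext(path) for name, path in field_paths.items()}


sample = (
    "<order>"
    "<customer><name>Ada</name></customer>"
    "<summary><total>12.50</total></summary>"
    "</order>"
)
# The resulting dictionary is what would be handed to a JSON document store.
print(json.dumps(extract_fields(sample)))
```

Anything not listed in the mapping is dropped during conversion, which is exactly the information loss the question is worried about.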

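Bringing XML to the cloud

If you want both the data-analysis capabilities of an XML database and some of the ability of NoSQL systems to distribute work, you could run the database from within such a system.

BaseX is heading into the cloud with exactly the features you need, but it will probably still take some time until that feature is production-ready.

answered 2013-09-13T07:56:23.207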