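The Apriori algorithm usually takes its input as a binary matrix, with one row per transaction and one column per item, like this: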

TID A B C D E
T1 1 1 1 0 0
T2 1 1 1 1 1
T3 1 0 1 1 0
T4 1 0 1 1 1
T5 1 1 1 1 0 

My input, however, is ordinary XML data of this form:

 <article key="tr/gte/TR-0263-08-94-165">
<author>Frank Manola</author>
<title>An Evaluation of Object-Oriented DBMS Developments: 1994 Edition.</title>
<journal>GTE Laboratories Incorporated</journal>
<volume>TR-0263-08-94-165</volume>
<month>August</month>
<year>1994</year>
</article>

How can I convert data like this into a form the algorithm accepts? Any suggestions are welcome.

Thanks

1 Answer

Assuming you're using Python, the ElementTree XML parser from the standard library is a good fit (documentation linked below). It parses the XML into a tree of Element objects, which you can walk to collect each record's fields into whatever structure you need, such as a dictionary or a set of items per transaction. If your XML files are very large, use iterparse to process records incrementally rather than loading the whole tree into memory.

https://docs.python.org/2/library/xml.etree.elementtree.html
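As a rough sketch of that approach, the snippet below streams records with `iterparse` and builds a 0/1 matrix like the one in the question. Several things here are assumptions rather than anything stated in the question: each `<article>` is treated as one transaction, each `tag=value` pair of the chosen fields is treated as one item, and the `<dblp>` wrapper, the second record, and the helper names `read_transactions` and `to_matrix` are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample: the <dblp> root and the second record are invented.
SAMPLE = """<dblp>
<article key="tr/gte/TR-0263-08-94-165">
<author>Frank Manola</author>
<journal>GTE Laboratories Incorporated</journal>
<year>1994</year>
</article>
<article key="example/2">
<author>Frank Manola</author>
<author>Jane Doe</author>
<year>1995</year>
</article>
</dblp>"""


def read_transactions(source, record_tag="article",
                      fields=("author", "journal", "year")):
    """Yield (key, set of 'tag=value' items) for each record in the XML."""
    # "end" events fire once an element and all of its children are parsed.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != record_tag:
            continue
        items = set()
        for child in elem:
            if child.tag in fields and child.text:
                items.add("{}={}".format(child.tag, child.text.strip()))
        yield elem.get("key"), items
        elem.clear()  # drop the record's children to keep memory use low


def to_matrix(transactions):
    """Turn (key, items) pairs into (row ids, column names, 0/1 rows)."""
    transactions = list(transactions)
    columns = sorted(set().union(*(items for _, items in transactions)))
    tids = [tid for tid, _ in transactions]
    rows = [[1 if col in items else 0 for col in columns]
            for _, items in transactions]
    return tids, columns, rows


# iterparse accepts a filename or a binary file object.
tids, columns, rows = to_matrix(
    read_transactions(io.BytesIO(SAMPLE.encode("utf-8"))))

print("TID", columns)
for tid, row in zip(tids, rows):
    print(tid, row)
```

Which fields count as items is a modelling choice: titles are nearly unique per record, so they are left out here, while authors, journals, and years can recur across transactions. For a real file, pass its path to `read_transactions` in place of the `BytesIO` object.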

Answered 2015-01-18T06:41:48.907