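I know this is another very newbie question, but I have been stumbling around the internet for a few days now and cannot solve my problem. I have downloaded the data dump from Discogs, which is an XML file of roughly 35 GB. As far as I can tell, I will have to use a SAX parser, because I obviously can't load this file into RAM, and Ox is reported to have the best runtime in Ruby. But I simply don't understand how to use this parser: even with small IO objects or inputs meant only for testing, it remains a magical thing that throws things at me that I don't understand. This is what the XML looks like: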

<releases>
<release id="1" status="Accepted"><images><image height="600" type="primary" uri="" uri150="" width="600"/><image height="600" type="secondary" uri="" uri150="" width="600"/><image height="600" type="secondary" uri="" uri150="" width="600"/><image height="600" type="secondary" uri="" uri150="" width="600"/></images><artists><artist><id>1</id><name>The Persuader</name><anv></anv><join></join><role></role><tracks></tracks></artist></artists><title>Stockholm</title><labels><label catno="SK032" id="5" name="Svek"/></labels><extraartists><artist><id>239</id><name>Jesper Dahlbäck</name><anv></anv><join></join><role>Music By [All Tracks By]</role><tracks></tracks></artist></extraartists><formats><format name="Vinyl" qty="2" text=""><descriptions><description>12"</description><description>33 ⅓ RPM</description></descriptions></format></formats><genres><genre>Electronic</genre></genres><styles><style>Deep House</style></styles><country>Sweden</country><released>1999-03-00</released><notes>The song titles are the names of six of Stockholm's 82 districts.

Title on label: - Stockholm -

Recorded at the Globe Studio, Stockholm

FAX: +46 8 679 64 53

</notes><data_quality>Needs Vote</data_quality><tracklist><track><position>A</position><title>Östermalm</title><duration>4:45</duration></track><track><position>B1</position><title>Vasastaden</title><duration>6:11</duration></track><track><position>B2</position><title>Kungsholmen</title><duration>2:49</duration></track><track><position>C1</position><title>Södermalm</title><duration>5:38</duration></track><track><position>C2</position><title>Norrmalm</title><duration>4:52</duration></track><track><position>D</position><title>Gamla Stan</title><duration>5:16</duration></track></tracklist><identifiers><identifier description="A-Side Runout" type="Matrix / Runout" value="MPO SK 032 A1"/><identifier description="B-Side Runout" type="Matrix / Runout" value="MPO SK 032 B1"/><identifier description="C-Side Runout" type="Matrix / Runout" value="MPO SK 032 C1"/><identifier description="D-Side Runout" type="Matrix / Runout" value="MPO SK 032 D1"/><identifier description="Only On A-Side Runout" type="Matrix / Runout" value="G PHRUPMASTERGENERAL T27 LONDON"/></identifiers><videos><video duration="326" embed="true" src="https://www.youtube.com/watch?v=afMHNll9EVM"><title>The Persuader - Gamla Stan</title><description>The Persuader - Gamla Stan</description></video><video duration="301" embed="true" src="https://www.youtube.com/watch?v=EBBHR3EMN50"><title>The Persuader - Norrmalm</title><description>The Persuader - Norrmalm</description></video><video duration="341" embed="true" src="https://www.youtube.com/watch?v=WDZqiENap_U"><title>The Persuader - Södermalm</title><description>The Persuader - Södermalm</description></video><video duration="176" embed="true" src="https://www.youtube.com/watch?v=XExCZfMCXdo"><title>The Persuader - Kungsholmen</title><description>The Persuader - Kungsholmen</description></video><video duration="376" embed="true" src="https://www.youtube.com/watch?v=Cawyll0pOI4"><title>The Persuader - Vasastaden</title><description>The Persuader - 
Vasastaden</description></video><video duration="296" embed="true" src="https://www.youtube.com/watch?v=MpmbntGDyNE"><title>The Persuader - Östermalm</title><description>The Persuader - Östermalm</description></video></videos><companies><company><id>271046</id><name>The Globe Studios</name><catno></catno><entity_type>23</entity_type><entity_type_name>Recorded At</entity_type_name><resource_url>https://api.discogs.com/labels/271046</resource_url></company><company><id>56025</id><name>MPO</name><catno></catno><entity_type>17</entity_type><entity_type_name>Pressed By</entity_type_name><resource_url>https://api.discogs.com/labels/56025</resource_url></company></companies></release>
<release id="2" status="Accepted"><images><image height="394" type="primary" uri="" uri150="" width="400"/><image height="600" type="secondary" uri="" uri150="" width="600"/><image height="600" type="secondary" uri="" uri150="" width="600"/></images><artists><artist><id>2</id><name>Mr. James Barth &amp; A.D.</name><anv></anv><join></join><role></role><tracks></tracks></artist></artists><title>Knockin' Boots Vol 2 Of 2</title><labels><label catno="SK 026" id="5" name="Svek"/><label catno="SK026" id="5" name="Svek"/></labels><extraartists><artist><id>26</id><name>Alexi Delano</name><anv></anv><join></join><role>Producer, Recorded By</role><tracks></tracks></artist><artist><id>27</id><name>Cari Lekebusch</name><anv></anv><join></join><role>Producer, Recorded By</role><tracks></tracks></artist><artist><id>26</id><name>Alexi Delano</name><anv>A. Delano</anv><join></join><role>Written-By</role><tracks></tracks></artist><artist><id>27</id><name>Cari Lekebusch</name><anv>C. Lekebusch</anv><join></join><role>Written-By</role><tracks></tracks></artist></extraartists><formats><format name="Vinyl" qty="1" text=""><descriptions><description>12"</description><description>33 ⅓ RPM</description></descriptions></format></formats><genres><genre>Electronic</genre></genres><styles><style>Broken Beat</style><style>Techno</style><style>Tech House</style></styles><country>Sweden</country><released>1998-06-00</released><notes>All joints recorded in NYC (Dec.97).</notes><data_quality>Correct</data_quality><master_id is_main_release="true">713738</master_id><tracklist><track><position>A1</position><title>A Sea Apart</title><duration>5:08</duration></track><track><position>A2</position><title>Dutchmaster</title><duration>4:21</duration></track><track><position>B1</position><title>Inner City Lullaby</title><duration>4:22</duration></track><track><position>B2</position><title>Yeah Kid!</title><duration>4:46</duration></track></tracklist><identifiers><identifier description="Side A Runout 
Etching" type="Matrix / Runout" value="MPO SK026-A -J.T.S.-"/><identifier description="Side B Runout Etching" type="Matrix / Runout" value="MPO SK026-B -J.T.S.-"/></identifiers><videos><video duration="268" embed="true" src="https://www.youtube.com/watch?v=LgLchSRehhc"><title>Mr. James Barth &amp; A.D. - Dutchmaster</title><description>Mr. James Barth &amp; A.D. - Dutchmaster</description></video><video duration="297" embed="true" src="https://www.youtube.com/watch?v=x_Os7b-iWKs"><title>Mr. James Barth &amp; A.D. - Yeah Kid!</title><description>Mr. James Barth &amp; A.D. - Yeah Kid!</description></video><video duration="314" embed="true" src="https://www.youtube.com/watch?v=MIgQNVhYILA"><title>Mr. James Barth &amp; A.D. - A Sea Apart</title><description>Mr. James Barth &amp; A.D. - A Sea Apart</description></video><video duration="267" embed="true" src="https://www.youtube.com/watch?v=iaqHaULlqqg"><title>Mr. James Barth &amp; A.D. - Inner City Lullaby</title><description>Mr. James Barth &amp; A.D. - Inner City Lullaby</description></video></videos><companies><company><id>266169</id><name>JTS Studios</name><catno></catno><entity_type>29</entity_type><entity_type_name>Mastered At</entity_type_name><resource_url>https://api.discogs.com/labels/266169</resource_url></company><company><id>56025</id><name>MPO</name><catno></catno><entity_type>17</entity_type><entity_type_name>Pressed By</entity_type_name><resource_url>https://api.discogs.com/labels/56025</resource_url></company></companies></release>
<release id="3" status="Accepted"><images><image height="595" type="primary" uri="" uri150="" width="600"/><image height="472" type="secondary" uri="" uri150="" width="600"/><image height="600" type="secondary" uri="" uri150="" width="599"/><image height="470" type="secondary" uri="" uri150="" width="600"/></images><artists><artist><id>3</id><name>Josh Wink</name><anv></anv><join></join><role></role><tracks></tracks></artist></artists><title>Profound Sounds Vol. 1</title><labels><label catno="CK 63628" id="6" name="Ruffhouse Records"/></labels><extraartists><artist><id>3</id><name>Josh Wink</name><anv></anv><join></join><role>DJ Mix</role><tracks></tracks></artist></extraartists><formats><format name="CD" qty="1" text=""><descriptions><description>Compilation</description><description>Mixed</description></descriptions></format></formats><genres><genre>Electronic</genre></genres><styles><style>Techno</style><style>Tech House</style></styles><country>US</country><released>1999-07-13</released><notes>1: Track title is given as "D2" (which is the side of record on the vinyl version of i220-010 release). This was also released on CD where this track is listed on 8th position. On both version no titles are given (only writing/producing credits). Both versions of i220-010 can be seen on the master release page [m27265]. Additionally this track contains female vocals that aren't present on original i220-010 release. &#13;
4: Credited as J. Dahlbäck. &#13;
5: Track title wrongly given as "Vol. 1". &#13;
6: Credited as Gez Varley presents Tony Montana. &#13;
12: Track exclusive to Profound Sounds Vol. 1.</notes><data_quality>Correct</data_quality><master_id is_main_release="false">66526</master_id><tracklist><track><position>1</position><title>Untitled 8</title><duration>7:00</duration><artists><artist><id>5</id><name>Heiko Laux</name><anv></anv><join>&amp;</join><role></role><tracks></tracks></artist><artist><id>4</id><name>Johannes Heil</name><anv></anv><join></join><role></role><tracks></tracks></artist></artists></track><track><position>2</position><title>Anjua (Sneaky 3)</title><duration>5:28</duration><artists><artist><id>15525</id><name>Karl Axel Bissler</name><anv>K.A.B.</anv><join></join><role></role><tracks></tracks></artist></artists></track><track><position>3</position><title>When The Funk Hits The Fan (Mood II Swing When The Dub Hits The Fan)</title><duration>5:25</duration><artists><artist><id>7</id><name>Sylk 130</name><anv></anv><join></join><role></role><tracks></tracks></artist></artists><extraartists><artist><id>8</id><name>Mood II Swing</name><anv></anv><join></join><role>Remix</role><tracks></tracks></artist></extraartists></track><track><position>4</position><title>What's The Time, Mr. Templar</title><duration>4:27</duration><artists><artist><id>1</id><name>The Persuader</name><anv></anv><join></join><role></role><tracks></tracks></artist></artists></track><track><position>5</position><title>Vol. 
2</title><duration>5:36</duration><artists><artist><id>267132</id><name>Care Company (2)</name><anv></anv><join></join><role></role><tracks></tracks></artist></artists></track><track><position>6</position><title>Political Prisoner</title><duration>3:37</duration><artists><artist><id>6981</id><name>Gez Varley</name><anv></anv><join></join><role></role><tracks></tracks></artist></artists></track><track><position>7</position><title>Pop Kulture</title><duration>5:03</duration><artists><artist><id>11</id><name>DJ Dozia</name><anv></anv><join></join><role></role><tracks></tracks></artist></artists></track><track><position>8</position><title>K-Mart Shopping (Hi-Fi Mix)</title><duration>5:42</duration><artists><artist><id>10702</id><name>Nerio's Dubwork</name><anv></anv><join>Meets</join><role></role><tracks></tracks></artist><artist><id>233190</id><name>Kathy Lee</name><anv></anv><join></join><role></role><tracks></tracks></artist></artists><extraartists><artist><id>23</id><name>Alex Hi-Fi</name><anv></anv><join></join><role>Remix</role><tracks></tracks></artist></extraartists></track><track><position>9</position><title>Lovelee Dae (Eight Miles High Mix)</title><duration>5:47</duration><artists><artist><id>13</id><name>Blaze</name><anv></anv><join></join><role></role><tracks></tracks></artist></artists><extraartists><artist><id>14</id><name>Eight Miles High</name><anv></anv><join></join><role>Remix</role><tracks></tracks></artist></extraartists></track><track><position>10</position><title>Sweat</title><duration>6:06</duration><artists><artist><id>67226</id><name>Stacey Pullen</name><anv></anv><join>Presents</join><role></role><tracks></tracks></artist><artist><id>7554</id><name>Black Odyssey</name><anv></anv><join></join><role></role><tracks></tracks></artist></artists><extraartists><artist><id>67226</id><name>Stacey 
Pullen</name><anv></anv><join></join><role>Presenter</role><tracks></tracks></artist></extraartists></track><track><position>11</position><title>Silver</title><duration>3:16</duration><artists><artist><id>3906</id><name>Christian Smith &amp; John Selway</name><anv></anv><join></join><role></role><tracks></tracks></artist></artists></track><track><position>12</position><title>Untitled</title><duration>2:46</duration><artists><artist><id>3</id><name>Josh Wink</name><anv></anv><join></join><role></role><tracks></tracks></artist></artists></track><track><position>13</position><title>Boom Box</title><duration>3:41</duration><artists><artist><id>19</id><name>Sound Associates</name><anv></anv><join></join><role></role><tracks></tracks></artist></artists></track><track><position>14</position><title>Track 2</title><duration>3:39</duration><artists><artist><id>20</id><name>Percy X</name><anv></anv><join></join><role></role><tracks></tracks></artist></artists></track></tracklist><identifiers><identifier type="Barcode" value="074646362822"/></identifiers>

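Sorry for just pasting it in as a snippet; that was the simplest way. What I want to do now is look up specific release IDs, check whether they have a barcode, and if so, get that barcode back. Can anyone point me in the right direction? Regards and thanks in advance, rtuz2th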

1 Answer

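SAX is "event-based" XML parsing. A handler provides methods that the parser calls when it:

  • enters an element (an opening tag occurs, e.g. <child>)
  • exits an element (a closing tag occurs, e.g. </child>)
  • finds an attribute
  • finds element text/body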

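The handler has to keep track of where it currently is in the XML and of the values it is interested in, so that it can decide what to do when it encounters an element of interest.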

Your sample XML is a bit large, so I made a small example of my own:

xml = <<EOS
<root>
  <child id="1">
    <barcode value="1111"/>
  </child>
  <child id="2">
  </child>
  <child id="1">
    <barcode value="2222"/>
  </child>
  <child id="4">
    <barcode value="3333"/>
  </child>
</root>
EOS

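I am looking for child elements that have an odd id and an even barcode value. For this simple example I keep every open tag and its attributes on a stack and discard that state when leaving the element (@stack.pop). Depending on the depth of your XML document and the number of tags/attributes, this can be "expensive".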

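require "ox"
require "stringio"

class Handler < ::Ox::Sax
  def initialize
    # One [element_name, attributes] pair per currently open element.
    @stack = []
  end

  # Called for every opening tag; its attributes arrive afterwards via #attr.
  def start_element(element_name)
    @stack << [element_name, {}]
  end

  # Called when an element is closed.
  def end_element(element_name)
    # @stack[-2] is nil for the root element, which leaves both variables nil.
    parent_name, parent_attributes = @stack[-2]
    if parent_name == :child && parent_attributes[:id].to_i.odd?
      name, attributes = @stack[-1]
      if name == :barcode && attributes[:value].to_i.even?
        puts "Here is one record that seems interesting: Child: #{parent_attributes[:id]}, Barcode: #{attributes[:value]}"
      end
    end
    # We are leaving the element, so its state is no longer needed.
    @stack.pop
  end

  # Called once per attribute of the most recently opened element.
  def attr(attribute_name, attribute_value)
    _name, attributes = @stack.last
    # Skip attributes reported while no element is open (the stack is empty then).
    attributes[attribute_name] = attribute_value if attributes
  end
end

handler = Handler.new
Ox.sax_parse(handler, StringIO.new(xml))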

This will print:

Here is one record that seems interesting: Child: 1, Barcode: 2222

answered 2018-03-18T23:04:08.507