
I'm having trouble getting set up properly with Nokogiri and its documentation; getting started has been a bit rough.

I'm trying to parse this XML file: http://www.kongregate.com/games_for_your_site.xml

It returns multiple games inside a gameset, and each game has a title, a description, and so on:

<gameset>
  <game>
    <id>160342</id>
    <title>Tricky Rick</title>
    <thumbnail>
      http://cdn3.kongregate.com/game_icons/0042/7180/KONG_icon250x200_site.png?21656-op
    </thumbnail>
    <launch_date>2012-12-12</launch_date>
    <category>Puzzle</category>
    <flash_file>
      http://external.kongregate-games.com/gamez/0016/0342/live/embeddable_160342.swf
    </flash_file>
    <width>640</width>
    <height>480</height>
    <url>
      http://www.kongregate.com/games/tAMAS_Games/tricky-rick
    </url>
    <description>
      Help Rick to collect all the stolen fuel to refuel his spaceship and fly away from the planet. Use hammer, bombs, jetpack and other useful stuff to solve puzzles!
    </description>
    <instructions>
      WASD \ Arrow Keys &#8211; move; S \ Down Arrow &#8211; take\release an object; CNTRL &#8211; interaction with objects: throw, hammer strike, invisibility mode; SPACE &#8211; interaction with elevators and fuel stations; Esc \ P &#8211; pause;
    </instructions>
    <developer_name>tAMAS_Games</developer_name>
    <gameplays>24999</gameplays>
    <rating>3.43</rating>
  </game>
  <game>
    <id>160758</id>
    <title>Flying Cookie Quest</title>
    <thumbnail>
      http://cdn2.kongregate.com/game_icons/0042/8428/icon_cookiequest_kong_250x200_site.png?16578-op
    </thumbnail>
    <launch_date>2012-12-07</launch_date>
    <category>Action</category>
    <flash_file>
      http://external.kongregate-games.com/gamez/0016/0758/live/embeddable_160758.swf
    </flash_file>
    <width>640</width>
    <height>480</height>
    <url>
      http://www.kongregate.com/games/LongAnimals/flying-cookie-quest
    </url>
    <description>
      Launch Rocket Panda into the land of Cookies. With the help of low-flying sharks, hang-gliding sheep and Rocket Badger, can you defeat the all powerful Biscuit Head? Defeat All enemies of cookies in this launcher game.
    </description>
    <instructions>Use the mouse button!</instructions>
    <developer_name>LongAnimals</developer_name>
    <gameplays>168672</gameplays>
    <rating>3.67</rating>
  </game>

Based on the documentation, I'm using something like this:

require 'nokogiri'
require 'open-uri'

url = "http://www.kongregate.com/games_for_your_site.xml"
xml = Nokogiri::XML(open(url))
xml.xpath("//game").each do |node|
    puts node.xpath("//id")
    puts node.xpath("//title")
    puts node.xpath("//thumbnail")
    puts node.xpath("//category")
    puts node.xpath("//flash_file")
    puts node.xpath("//width")
    puts node.xpath("//height")
    puts node.xpath("//description")
    puts node.xpath("//instructions")
end

However, it just prints an endless stream of data instead of one set of values per game. Any help would be appreciated.


2 Answers


Here's how I'd rewrite your code:

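require 'nokogiri'
require 'open-uri'

# On Ruby 3.0+, use URI.open; Kernel#open no longer fetches URLs.
xml = Nokogiri::XML(URI.open("http://www.kongregate.com/games_for_your_site.xml"))
xml.xpath("//game").each do |game|
  # Relative lookups stay inside the current <game> node.
  %w[id title thumbnail category flash_file width height description instructions].each do |n|
    puts game.at(n)
  end
end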

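The problem in your code is that all of your sub-tags are prefixed with //, which in XPath means "start at the root of the document and search downward for every tag with this name". So instead of searching only inside the current //game node, each lookup searches the entire document, and it does so again for every //game node. That is why you get every id, every title, and so on, repeated once per game.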

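I recommend using CSS accessors over XPath because they're (usually) simpler and therefore easier to read. So, instead of xpath('//game') I use search('game'). (search accepts either a CSS or an XPath accessor, and so does at.)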

If you want the text contained in the tag, change puts game.at(n) to:

puts game.at(n).text

To make the output more useful, I'd do this:

require 'nokogiri'
require 'open-uri'

# On Ruby 3.0+, use URI.open; Kernel#open no longer fetches URLs.
xml = Nokogiri::XML(URI.open('http://www.kongregate.com/games_for_your_site.xml'))
games = xml.search('game').map do |game|
  %w[
    id title thumbnail category flash_file width height description instructions
  ].each_with_object({}) do |n, o|
    o[n] = game.at(n).text
  end
end

require 'awesome_print'
puts games.size
ap games.first
ap games.last

Which outputs:

395
{
              "id" => "160342",
           "title" => "Tricky Rick",
       "thumbnail" => "http://cdn3.kongregate.com/game_icons/0042/7180/KONG_icon250x200_site.png?21656-op",
        "category" => "Puzzle",
      "flash_file" => "http://external.kongregate-games.com/gamez/0016/0342/live/embeddable_160342.swf",
           "width" => "640",
          "height" => "480",
     "description" => "Help Rick to collect all the stolen fuel to refuel his spaceship and fly away from the planet. Use hammer, bombs, jetpack and other useful stuff to solve puzzles!\n",
    "instructions" => "WASD \\ Arrow Keys &#8211; move;\nS \\ Down Arrow &#8211; take\\release an object;\nCNTRL &#8211; interaction with objects: throw, hammer strike, invisibility mode;\nSPACE &#8211; interaction with elevators and fuel stations;\nEsc \\ P &#8211; pause;\n"
}
{
              "id" => "78",
           "title" => "rotaZion",
       "thumbnail" => "http://cdn2.kongregate.com/game_icons/0000/0115/pixtiz.rotazion_icon.jpg?8217-op",
        "category" => "Action",
      "flash_file" => "http://external.kongregate-games.com/gamez/0000/0078/live/embeddable_78.swf",
           "width" => "350",
          "height" => "350",
     "description" => "In rotaZion, you play with a bubble bar that you can&#8217;t stop rotating !\nCollect the bubbles and try to avoid the mines !\nCollect the different bonus to protect your bubble bar, makes the mines go slower or destroy all the mines !\nTry to beat 100.000 points ;)\n",
    "instructions" => "Move the bubble bar with the arrow keys !\nBubble = 500 Points !\nPixtiz sign = 5000 Points !\n"
}
Answered 2013-01-01T07:29:05.750

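You could try something like this. I'd suggest making an array of the elements inside each game that you want, and then iterating over it. I'm sure there's a way in Nokogiri to get all of the elements within a specified element, but this works: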

# result is assumed to hold the XML document as a string
xml = Nokogiri::XML(result)
xml.css("game").each do |inv|
  inv.css("title").each do |f|  # title or whatever else you want
    puts f.inner_html
  end
end
Answered 2013-01-01T02:50:34.587