ruby - Get an XML element by attribute with XPath

Question

http://www.mdr.de/export/sandmann/folgen/sandmann612-mediaRss_doca-1_zc-1a3071ad.xml returns, among other things, the following lines:

(...)
<media:content url="http://x4100mp4dynonlc22033.f.o.l.lb.core-cdn.net/22033mdr/ondemand/4100mp4dynonl/FCMS-066eb3e7-81b2-4dae-898d-4963137eb4b6-e9ebd6e42ce1.mp4" type="video/mpeg" expression="full" width="512" height="288" bitrate="512" duration="398" />
<media:content url="http://x4100mp4dynonlc22033.f.o.l.lb.core-cdn.net/22033mdr/ondemand/4100mp4dynonl/FCMS-066eb3e7-81b2-4dae-898d-4963137eb4b6-c7cca1d51b4b.mp4" type="video/mpeg" expression="full" width="960" height="544" bitrate="1536" duration="398" />
(...)

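How can I tell Nokogiri to extract only the line where bitrate="1536"?

I actually only need the URL from that XPath, so I expect (I find it rather rude to write "expect" here, but I was told to do so ;) the following string to be returned: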

http://x4100mp4dynonlc22033.f.o.l.lb.core-cdn.net/22033mdr/ondemand/4100mp4dynonl/FCMS-066eb3e7-81b2-4dae-898d-4963137eb4b6-c7cca1d51b4b.mp4

In case anyone is interested, this will let me download the daily episode of Sandmännchen, a German TV mini-series for small children. :)

So far I have tried this with SimpleRSS:

(...)
rss.entries.each do |entry|
    pp entry
end

But this only returns the first item of the media:group "collection" of links:

{:title=>"Sandmann vom 14. Oktober 2012",
 :link=>"http://www.mdr.de/export/sandmann/folgen/video78338.html",
 :description=>
  "Die j\xC3\xBCngste Geschichte vom Sandmann gibt es f\xC3\xBCr 24 Stunden hier auf Abruf. Heute: Molly mag keine Schuhe. Das finden die anderen Monster merkw\xC3\xBCrdig, weil Monster Schuhe lieben.",
 :pubDate=>2012-09-19 14:54:43 +0200,
 :guid=>
  "mp4:4100mp4dynonl/FCMS-066eb3e7-81b2-4dae-898d-4963137eb4b6-8442e17c3177",
 :media_content_url=>
  "rtmp://x4100mp4dynonlc22033.f.o.f.lb.core-cdn.net/22033mdr/ondemand",
 :media_content_type=>"fms/h264",
 :media_content_height=>"272",
 :media_content_width=>"480",
 :media_title=>"Sandmann vom 14. Oktober 2012",
 :media_thumbnail_url=>
  "http://www.mdr.de/export/sandmann/folgen/sandmann864_v-standard43_zc-698fff06.jpg",
 :media_thumbnail_height=>"135",
 :media_thumbnail_width=>"180"}

score 1 · Accepted Answer

How about this:

doc.at_xpath('//media:content[@bitrate="1536"]/@url').text
#=> "http://x4100mp4dynonlc22033.f.o.l.lb.core-cdn.net/22033mdr/ondemand/4100mp4dynonl/FCMS-066eb3e7-81b2-4dae-898d-4963137eb4b6-c7cca1d51b4b.mp4"

By the way, the link doesn't work, so I can't actually test this against the full document.

Update:

Using the information from the answer below, in Nokogiri:

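require 'nokogiri'
require 'open-uri'

# URI.open is needed on Ruby 3.0+, where Kernel#open no longer fetches URLs
filme = Nokogiri::XML(URI.open('http://www.sandmann.de/static/san/app/filme.xml'))
folge = Nokogiri::XML(URI.open(filme.xpath('//filme/folge').text))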

folge.at_xpath('//media:content[@bitrate="1536"]/@url').text
#=> "http://x4100mp4dynonlc22033.f.o.l.lb.core-cdn.net/22033mdr/ondemand/4100mp4dynonl/FCMS-066eb3e7-81b2-4dae-898d-4963137eb4b6-c7cca1d51b4b.mp4"

score 0 · Accepted Answer

For convenience, simply:

doc.at('content[@bitrate="1536"]')[:url]

answered 2012-10-15T14:09:24.090

score 0 · Accepted Answer

require 'nokogiri'
require 'open-uri'

url = 'http://www.mdr.de/export/sandmann/folgen/sandmann612-mediaRss_doca-1_zc-1a3071ad.xml'
doc = Nokogiri.XML(URI.open(url)) # URI.open: Kernel#open no longer fetches URLs on Ruby 3.0+
doc.remove_namespaces! # Just to make our life simpler
content = doc.at_css('content[bitrate="1536"]')
puts content['url']
#=> http://x4100mp4dynonlc22033.f.o.l.lb.core-cdn.net/22033mdr/ondemand/4100mp4dynonl/FCMS-fd2af820-ec90-4f34-a58e-db1b9fdcc25a-c7cca1d51b4b.mp4

score 0 · Accepted Answer

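This is what I finally came up with: not nokogiri (which I think is very powerful, but has a rather steep learning curve; besides, I don't understand ... at all) but crack instead. It seems more Ruby-like and works well with the MRSS feed I get: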

require 'rubygems'
require 'pp'
require 'crack'
require 'asciify'
require 'open-uri'

fileurl = ""
# URI.open is needed on Ruby 3.0+, where Kernel#open no longer fetches URLs
filme = Crack::XML.parse(URI.open('http://www.sandmann.de/static/san/app/filme.xml'))
folge = Crack::XML.parse(URI.open(filme['filme']['folge']))
titel = folge['rss']['channel']['item']['description'].to_s.sub(/.*Die jüngste Geschichte vom Sandmann gibt es für 24 Stunden hier auf Abruf. Heute: /, '')
folge['rss']['channel']['item']['media:group']['media:content'].each do |x|
    fileurl << x['url'] if x['bitrate'] == "1536"
end
filename = titel.split(".").first.asciify + ".m4v"
filename.gsub!(" ","_")

# Passing arguments separately bypasses the shell, so odd characters in the title cannot break the command
system('curl', '-o', filename, fileurl)

Just in case your kids want to watch too ;)
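The lookup loop in the script above can also be written with find. The following is a minimal sketch in plain Ruby; the nested hash is a hypothetical stand-in for what Crack::XML.parse returns for the feed, and the guard for a lone Hash is an assumption about how hash-based XML parsers typically represent a single child element.

```ruby
# Hypothetical stand-in for the parsed feed; the real hash has more keys.
folge = {
  'rss' => { 'channel' => { 'item' => { 'media:group' => { 'media:content' => [
    { 'url' => 'http://example.invalid/low.mp4',  'bitrate' => '512' },
    { 'url' => 'http://example.invalid/high.mp4', 'bitrate' => '1536' }
  ] } } } }
}

contents = folge.dig('rss', 'channel', 'item', 'media:group', 'media:content')

# Assumption: a feed with a single media:content may yield a Hash
# instead of an Array, so wrap it to keep the lookup uniform.
contents = [contents] if contents.is_a?(Hash)

# find returns the first matching entry, or nil if there is none.
match   = contents.find { |c| c['bitrate'] == '1536' }
fileurl = match ? match['url'] : nil
puts fileurl
```

Compared with appending to an empty string inside each, this leaves fileurl as nil when no entry matches, which is easier to check before invoking curl.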
