erlang - Extracting a URL from XML with XPath

Question

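I am trying to extract the second link under the description tag. I have written the following code, but it looks really messy with the freads and substrings (I only did it that way to get it working). Is there a cleaner way to do this?

XML extract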

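magic(Url) ->
    Tag = ".xml",
    inets:start(),
    %% Fetch the feed and parse the response body into an xmerl document.
    {ok, {_Status, _Headers, Body}} = httpc:request(Url ++ Tag),
    {Xml, _Rest} = xmerl_scan:string(Body),
    %% Cut the description text down to what follows the second href=.
    {xmlObj, string, A} = xmerl_xpath:string("substring-after(substring-after(substring-before(//channel/item/description[1], '\">[link]'), 'br'), 'href=')", Xml),
    %% Skip the first six characters, then drop the trailing character.
    {ok, _, B} = io_lib:fread("~6s", A),
    string:substr(B, 1, string:len(B) - 1).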

score 2 · Accepted Answer

Not a perfect solution, but you can use XPath expressions like //channel/item/description[1]/text()[16] and //channel/item/description[1]/text()[24].

Each extracted string contains the URL preceded by a quotation mark, so you can use list pattern-matching syntax to cut off the quote: [_|Url] = ...
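As a small illustration of that list match, here is a sketch of a helper that drops a leading double quote from a string. The module name quote_strip and the function strip_quote/1 are hypothetical names chosen for this example; they are not part of xmerl.

```erlang
-module(quote_strip).
-export([strip_quote/1]).

%% Erlang strings are lists of characters, so a list pattern can peel
%% off the first character. Here the head must be a double quote.
strip_quote([$\" | Url]) -> Url;
%% Assumption for this sketch: strings without a leading quote are
%% returned unchanged rather than treated as an error.
strip_quote(Url) -> Url.
```

Matching on the quote character explicitly, rather than on `_`, avoids silently discarding the first character of a string that does not start with a quote.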

So use this: [{_,_,_,_,[_|U1],_}] = xmerl_xpath:string("//channel/item/description[1]/text()[16]", Xml). It binds U1 to the first URL.

Tested in the shell:

11> [{_,_,_,_,[_|U1],_}] = xmerl_xpath:string("//channel/item/description[1]/text()[16]", Xml). 
[{xmlText,[{description,5},{item,5},{channel,1},{rss,1}],
          16,[],"\"http://www.reddit.com/user/escaped_reddit",text}]
12> 
12> U1.
"http://www.reddit.com/user/escaped_reddit"
13> 
13> 
13> [{_,_,_,_,[_|U2],_}] = xmerl_xpath:string("//channel/item/description[1]/text()[24]", Xml). 
[{xmlText,[{description,5},{item,5},{channel,1},{rss,1}],
          24,[],
          "\"http://www.reddit.com/r/erlang/comments/y62wf/how_to_use_ranch/",
          text}]
14> 
14> U2.
"http://www.reddit.com/r/erlang/comments/y62wf/how_to_use_ranch/"

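The six-element tuple pattern used in the shell session depends on the field order of xmerl's xmlText record. A minimal sketch of the same lookup that matches on the record's value field instead, assuming the xmerl header file is available at compile time. The module name desc_links and the function nth_text/2 are hypothetical, and the indices 16 and 24 are specific to the feed shown here; another feed will likely need different positions.

```erlang
-module(desc_links).
-export([nth_text/2]).

%% Brings in the #xmlText{} record definition used in the match below.
-include_lib("xmerl/include/xmerl.hrl").

%% Return {ok, Url} for the N-th text node of the first description of
%% an item, with its leading double quote removed. Returns not_found if
%% there is no such node or it does not start with a quote.
nth_text(Xml, N) when is_integer(N), N > 0 ->
    Path = "//channel/item/description[1]/text()["
           ++ integer_to_list(N) ++ "]",
    case xmerl_xpath:string(Path, Xml) of
        [#xmlText{value = [$\" | Url]}] -> {ok, Url};
        _ -> not_found
    end.
```

With the Xml term from the session above, the call would look like {ok, U1} = desc_links:nth_text(Xml, 16). Returning not_found instead of crashing on a failed match is a design choice of this sketch, not something the answer prescribes.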