ruby - Searching for HREF values in a web page using Ruby

Question

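I am working on a third-party application where I can only view the web page's source. From that source I need to collect only the href values that match a pattern like /aems/file/filegetrevision.do?fileEntityId. Is that possible?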

HTML *(part of the HTML)*

<td width="50%">
<a href="/aems/file/filegetrevision.do?fileEntityId=10597525&cs=9b7sjueBiWLBEMj2ZU4I6fyQoPv-g0NLY9ETqP0gWk4.xyz">
screenshot.doc
</a>
</td>

score 2 · Accepted Answer

Easily:

require 'nokogiri'

html = '
<td width="50%">
<a href="/aems/file/filegetrevision.do?fileEntityId=10597525&cs=9b7sjueBiWLBEMj2ZU4I6fyQoPv-g0NLY9ETqP0gWk4.xyz">
screenshot.doc
</a>
</td>
'

doc = Nokogiri::HTML(html)
doc.search('a[href]').map{ |a| a['href'] }

Which returns:

[
    [0] "/aems/file/filegetrevision.do?fileEntityId=10597525&cs=9b7sjueBiWLBEMj2ZU4I6fyQoPv-g0NLY9ETqP0gWk4.xyz"
]

If you want to keep only the hrefs whose path matches, use something like:

pattern = Regexp.escape('/aems/file/filegetrevision.do?fileEntityId')
doc.search('a[href]').map{ |a| a['href'] }.select{ |href| href[ %r[^#{ pattern }] ] }

Which again returns:

[
  [0] "/aems/file/filegetrevision.do?fileEntityId=10597525&cs=9b7sjueBiWLBEMj2ZU4I6fyQoPv-g0NLY9ETqP0gWk4.xyz"
]

The first snippet returns the href attribute of every <a> tag in the document. The second snippet filters those values by path.
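
The filter above passes the path through Regexp.escape before interpolating it into the pattern. A minimal sketch of why that step matters, using only core Ruby: the path contains `.` and `?`, which are regex metacharacters, so an unescaped pattern does not match the literal path at all. The sample hrefs are hypothetical (the `cs` value is shortened, and the second and third entries are made up for illustration). `String#start_with?` is included as a regex-free way to do the same prefix check.

```ruby
path = '/aems/file/filegetrevision.do?fileEntityId'

# Hypothetical sample data; only the first entry resembles the href in the question.
hrefs = [
  '/aems/file/filegetrevision.do?fileEntityId=10597525&cs=abc.xyz',
  '/aems/file/filegetrevisionXdfileEntityId=1',
  '/other/page.do?id=2'
]

# Unescaped: "." matches any character and "o?" makes the "o" optional.
unescaped = Regexp.new("^#{path}")
# Escaped: every character of the path is matched literally.
escaped = Regexp.new("^#{Regexp.escape(path)}")

wrong  = hrefs.select { |href| href =~ unescaped }
right  = hrefs.select { |href| href =~ escaped }
prefix = hrefs.select { |href| href.start_with?(path) }

puts "unescaped: #{wrong.inspect}"
puts "escaped:   #{right.inspect}"
puts "prefix:    #{prefix.inspect}"
```

In this sketch the unescaped pattern selects only the made-up second entry and misses the real one, while the escaped pattern and `start_with?` both select only the first entry.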

score 1 · Accepted Answer

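require 'open-uri'

source = 'http://www.example.com'
# URI.open is needed on Ruby 3.0+, where Kernel#open no longer opens URLs
page = URI.open(source).read
# URI.extract only finds absolute URIs, so scan the raw source for the relative path instead
page.scan(%r{/aems/file/filegetrevision\.do\?fileEntityId=[^"'\s<>]*})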
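
The hrefs collected this way are relative paths. If they need to be fetched or inspected afterwards, one option is to resolve them against the page URL with the standard library's URI module. This is only a sketch: it assumes the page was loaded from http://www.example.com (the placeholder used above), and the variable names are illustrative.

```ruby
require 'uri'

base = 'http://www.example.com' # assumed page URL, taken from the placeholder above
href = '/aems/file/filegetrevision.do?fileEntityId=10597525&cs=9b7sjueBiWLBEMj2ZU4I6fyQoPv-g0NLY9ETqP0gWk4.xyz'

# Resolve the relative href against the page it was found on.
absolute = URI.join(base, href)

# Split the query string into a hash of parameter names and values.
params = URI.decode_www_form(absolute.query).to_h
file_entity_id = params['fileEntityId']

puts absolute
puts file_entity_id
```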
