ruby - Slow link extraction with Watir Webdriver

Question

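I'm running Watir Webdriver headlessly with Firefox on Linux, and extracting links from web pages is slow. The problem seems to occur on pages with multiple frames. For example, returning all the links on www.cnet.com takes 10 minutes.

Why does it take so long, and is there anything I can do to speed it up?

For example, here are some typical timings I recorded, in seconds. Getting all links from the "default frame" (the top-level page) takes about 8 seconds, but getting them from each frame takes 20 seconds: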

No Frame: 8.304341236
Frame: 20.050233141
Frame: 20.070569295
....

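In fact, in this case none of the frames actually contains any links. (See my other question about skipping certain frames, "Watir-Webdriver Frame Attributes Not Congurent with Other Sources".)

The code that extracts links from the page is as follows: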

b.links.each do |uri|
  # Check the HREF doesn't meet any of the following conditions. We don't want these so we ignore them.
  if uri.href != nil and uri.href != "" and uri.href[0,7].downcase != "mailto:" and uri.href[0,11].downcase != "javascript:"
    if debug
      puts " [x] [" + Process.pid.to_s + "] Discovered (noframe) URL: " + uri.href
    end
    # Add the discovered HREF to the array
    href.push(uri.href)
  end
end

The code that extracts links from the frames is as follows:

b.frames.each do |frame|
  frame.links.each do |uri|
    if uri.href != nil and uri.href != "" and uri.href[0,7].downcase != "mailto:" and uri.href[0,11].downcase != "javascript:"
      if debug
        puts " [x] [" + Process.pid.to_s + "] Discovered Frame URL: " + uri.href
      end
      # Add the discovered HREF to the array
      href.push(uri.href)
    end
  end
end
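Both loops apply the same filter and read uri.href several times per link. With a WebDriver-backed element, each attribute read is likely a separate round trip to the browser, so reading it once may help a little. A minimal sketch of a shared filter, assuming hypothetical helper names (wanted_href? and collect_hrefs are not part of Watir) and leaving out the debug output:

```ruby
# Hypothetical helpers, not part of Watir.
SKIPPED_SCHEMES = %w[mailto: javascript:].freeze

# True when the href is present and is not a mailto: or javascript: link.
def wanted_href?(href)
  return false if href.nil? || href.empty?

  lowered = href.downcase
  SKIPPED_SCHEMES.none? { |scheme| lowered.start_with?(scheme) }
end

# Works on anything that responds to #links (a browser or a frame).
# Each link's href is read exactly once, then filtered in plain Ruby.
def collect_hrefs(container)
  container.links.map(&:href).select { |h| wanted_href?(h) }
end
```

With this in place, the two loops could reduce to href.concat(collect_hrefs(b)) and b.frames.each { |frame| href.concat(collect_hrefs(frame)) }. This tidies the filter but does not address the 20-second delay per frame.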

Any help would be appreciated.

score 0 · Accepted Answer

I think I've found where the problem comes from, but not its actual root cause.

Earlier in my code, I set the following timeout value:

b.driver.manage.timeouts.implicit_wait = 20

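If I set it to 3 seconds instead, my code runs noticeably faster.

That said, why is it waiting for the timeout value at all?

Test results from another site (times in seconds):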
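If the longer implicit wait is still needed elsewhere in the script, one option is to lower it only while scanning frames and restore it afterwards. A minimal sketch, assuming a hypothetical helper named with_implicit_wait; the only driver call it relies on is the implicit_wait= setter already used above:

```ruby
# Hypothetical helper: temporarily change the implicit wait, then restore it.
# `timeouts` is expected to be b.driver.manage.timeouts (anything that
# responds to #implicit_wait= works).
def with_implicit_wait(timeouts, seconds, restore_to)
  timeouts.implicit_wait = seconds
  yield
ensure
  # Runs even if the block raises, so the original value is always restored.
  timeouts.implicit_wait = restore_to
end

# Usage sketch (assumes `b` is the Watir browser from the question):
#
#   with_implicit_wait(b.driver.manage.timeouts, 0, 20) do
#     b.frames.each { |frame| frame.links.each { |uri| puts uri.href } }
#   end
```

The restore value is passed explicitly rather than read back from the driver, so the sketch does not depend on a getter being available.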

Timeout = 3
No Frame: 8.492559438
Frame: 3.037607356
Frame: 0.21291884
Frame: 0.187332136
Total: 27.3930574

Timeout = 20
No Frame: 8.698615854
Frame: 20.039797232
Frame: 0.202382168
Frame: 0.192850861
Total: 44.221886117

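I wonder whether this is a bug. If it cannot find the element you are looking for, it seems to take the entire timeout before returning.

Note: I know the totals don't add up, because I'm only measuring the time between certain lines of code. Total is the time for the whole run from start to finish, while the other timings are taken around the loops.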
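This matches how a WebDriver implicit wait generally behaves: an element lookup that matches nothing is retried until the implicit wait expires, while a lookup that matches something returns right away. A frame with no links would therefore cost the full timeout. The following is a simplified pure-Ruby model of that polling, not Selenium's actual implementation; the method name and poll interval are assumptions made for illustration:

```ruby
# Simplified model of an implicit wait around a "find all" lookup.
# The block stands in for the real element search and must return an array.
def find_all_with_implicit_wait(implicit_wait, poll_interval = 0.05)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + implicit_wait
  loop do
    found = yield
    # Something matched: return immediately, no waiting.
    return found unless found.empty?
    # Nothing matched and the wait has run out: give up with an empty result.
    return found if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline

    sleep poll_interval
  end
end

# A "frame" with no links burns the whole wait before returning [].
started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
find_all_with_implicit_wait(0.3) { [] }
puts format("empty lookup took %.2fs", Process.clock_gettime(Process::CLOCK_MONOTONIC) - started)
```

Under this model, the 20-second and 3-second frame timings above are what one would expect for a frame whose link lookup comes back empty.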
