3

The following works but is always very slow, apparently stalling my scraper and its Firefox or Chrome browser for as long as a whole minute per page:

pp recArray = $browser.table(:id,"recordTable").to_a

Getting the text or the HTML source of the table is fast:

htmlcode = $browser.table(:id,"recordTable").html  # .text shows only plaintext portion like lynx

How could I create the same recArray (one element for each <TR>) using, for example, a Nokogiri object that contains only that table's HTML?

recArray = Nokogiri::HTML(htmlcode). ??

4

3 Answers

4

I wrote a blog post about this a few days ago: http://zeljkofilipin.com/watir-nokogiri/

If you have any further questions, just ask.

answered 2012-05-17T10:45:36.273
2

Do you want every tr in the table?

Nokogiri::HTML($browser.html).css('table#recordTable tr')  # descendant selector, so rows wrapped in a <tbody> are matched too

This gives you a NodeSet, which is more useful than an Array. Of course, there is always to_a.

answered 2012-05-17T10:58:36.183
1

I thought it would be useful to sum up all the steps in one place.

The question was how to produce the same array of strings from the page's text content that a Watir::Webdriver Table#to_a produces, but much faster:

 recArray = Nokogiri::HTML(htmlcode). ??

So instead of this as I was doing before:

  recArray=$browser.table(:class, 'detail-table w-Positions').to_a

I send the whole page's html as a string to Nokogiri to let it do the parsing:

  recArray=Nokogiri::HTML($browser.html).css('table[@class="detail-table w-Positions"] tr').to_a 

That found the rows of the table I wanted and put them into an array.

I was not done yet, since the elements of that array were still Nokogiri element types (table rows), which barfed when I attempted things like .join(",") (useful for writing to a .CSV file or a database, for instance).

So the following iterates through each row element, turning each into an array of pure Ruby String types, containing only the text content of each table cell stripped of html tags:

 recArray = recArray.map { |row| row.css("td").map { |c| c.text } }  # Could of course be merged with the line above into an even longer, nastier one-liner

Each cell had previously been a Nokogiri Element too; the .text mapping turns it into a String.

Significant speedup achieved.

Next, I wonder what it would take to simply override the #to_a method of every Watir::Webdriver Table object globally in my Ruby code files...

(I realize that may not be 100% compatible, but it would spare me a lot of code rewriting. I am willing to try it in my personal.lib.rb include file.)

answered 2012-05-17T12:55:10.273