0

So, I have a table with multiple rows and columns.

<table>
  <tr>
    <th>Employee Name</th>
    <th>Reg Hours</th>
    <th>OT Hours</th>
  </tr>
  <tr>
    <td>Employee 1</td>
    <td>10</td>
    <td>20</td>
  </tr>
  <tr>
    <td>Employee 2</td>
    <td>5</td>
    <td>10</td>
  </tr>
</table>

There is also another table:

<table>
  <tr>
    <th>Employee Name</th>
    <th>Revenue</th>
  </tr>
    <td>Employee 2</td>
    <td>$10</td>
  </tr>
  <tr>
    <td>Employee 1</td>
    <td>$50</td>
  </tr>
</table>

Notice that the employee order may be random between the tables.

How can I use nokogiri to create a json file that has each employee as an object, with their total hours and revenue?

Currently, I'm able to just get the individual table cells with some xpath. For example:

puts page.xpath(".//*[@id='UC255_tblSummary']/tbody/tr[2]/td[1]/text()").inner_text

Edit:

Using the page-object gem and the link from @Dave_McNulla, I tried this piece of code just to see what I get:

class MyPage
  include PageObject

  table(:report, :id => 'UC255_tblSummary')

  def get_some_information
    report_element[1][2].text
  end
end

puts get_some_information

Nothing's being returned, however.

Data: https://gist.github.com/anonymous/d8cc0524160d7d03d37b

There's a duplicate of the hours table. The first one is fine. The other table needed is the accessory revenue table. (I'll also need the activations table, but I'll try to merge that from the code that merges the hours and accessory revenue tables.

4

1 回答 1

5

我认为一般的做法是:

  1. 为键为员工的每个表创建一个哈希
  2. 将两个表的结果合并在一起
  3. 转换为 JSON

为键为员工的每个表创建一个哈希

这部分你可以在 Watir 或 Nokogiri 中完成。仅当 Watir 由于大表而导致性能不佳时,才有意义使用 Nokogiri。

瓦提尔:

#I assume you would have a better way to identify the tables than by index
hours_table = browser.table(:index, 0)
wage_table = browser.table(:index, 1)

#Turn the tables into a hash
employee_hours = {}
hours_table.trs.drop(1).each do |tr| 
    tds = tr.tds
    employee_hours[ tds[0].text ] = {"Reg Hours" => tds[1].text, "OT Hours" => tds[2].text}     
end
#=> {"Employee 1"=>{"Reg Hours"=>"10", "OT Hours"=>"20"}, "Employee 2"=>{"Reg Hours"=>"5", "OT Hours"=>"10"}}

employee_wage = {}
wage_table.trs.drop(1).each do |tr| 
    tds = tr.tds
    employee_wage[ tds[0].text ] = {"Revenue" => tds[1].text}   
end
#=> {"Employee 2"=>{"Revenue"=>"$10"}, "Employee 1"=>{"Revenue"=>"$50"}}

野小切:

page = Nokogiri::HTML.parse(browser.html)

hours_table = page.search('table')[0]
wage_table = page.search('table')[1]

employee_hours = {}
hours_table.search('tr').drop(1).each do |tr| 
    tds = tr.search('td')
    employee_hours[ tds[0].text ] = {"Reg Hours" => tds[1].text, "OT Hours" => tds[2].text}     
end
#=> {"Employee 1"=>{"Reg Hours"=>"10", "OT Hours"=>"20"}, "Employee 2"=>{"Reg Hours"=>"5", "OT Hours"=>"10"}}

employee_wage = {}
wage_table.search('tr').drop(1).each do |tr| 
    tds = tr.search('td')
    employee_wage[ tds[0].text ] = {"Revenue" => tds[1].text}   
end
#=> {"Employee 2"=>{"Revenue"=>"$10"}, "Employee 1"=>{"Revenue"=>"$50"}}

将两个表的结果合并在一起

您希望将两个哈希合并在一起,以便对于特定员工,哈希将包括他们的工作时间以及他们的收入。

employee = employee_hours.merge(employee_wage){ |key, old, new| new.merge(old) }
#=> {"Employee 1"=>{"Revenue"=>"$50", "Reg Hours"=>"10", "OT Hours"=>"20"}, "Employee 2"=>{"Revenue"=>"$10", "Reg Hours"=>"5", "OT Hours"=>"10"}}

转换为 JSON

基于this previous question,您可以将哈希转换为json。

require 'json'
employee.to_json
于 2013-03-20T16:49:57.577 回答