ruby - CSS selector for table rows with X cells

Question

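I'm trying to scrape some content from a website, but I can't select the right elements.

I'm using Nokogiri and, since CSS is what I know best, I'm trying to use it to select the data I want.

There is one large table containing rows I don't want, and their positions can change; for example, they aren't always rows 4, 5, 6, 10, and 14.

The only way I can tell whether a row is one I want is by whether it contains TD tags. What is the correct CSS selector for this?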

# Search for nodes by CSS; each match is a table row (tr)
doc.css('#mainContent p table tr').each do |tr|
  puts tr.to_html  # print the row for inspection
end

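Edit:

I'm trying to scrape boxrec.com/schedule.php. I want one row per match, but it is a very large table containing many rows that are not matches. The first few rows of each date section are not needed, including every row with "bout subject to change...", and neither are the spacer rows between days.

Solution: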

require 'time'  # Time.parse is provided by the 'time' standard library

# Assumes doc (the parsed Nokogiri document) and myscrape (an Array) are defined earlier.
match_date = nil  # declared outside the block so the last date seen carries over to later rows

doc.xpath("//table[@align='center'][not(@id) and not(@class)]/tr").each do |trow|
  # Try to get the date: date header rows hold it in a single bold element
  if trow.css('.show_left b').length == 1
    match_date = trow.css('.show_left b').first.content
  end

  # Bout rows have exactly two links (one per boxer) and more than 10 cells
  if trow.css('td a').length == 2 && trow.css('* > td').length > 10
    first_boxer_td = trow.css('td:nth-child(5)').first
    # Column 9, not column 5 again, following the BoxerB note that was commented out here
    second_boxer_td = trow.css('td:nth-child(9)').first

    match = {
      :round => trow.css('td:nth-child(3)').first.content.to_i,
      :weight => trow.css('td:nth-child(4)').first.content.to_s,
      :first_boxer_name => first_boxer_td.css('a').first.content.to_s,
      :first_boxer_link => first_boxer_td.css('a').first.attribute('href').to_s,
      :second_boxer_name => second_boxer_td.css('a').first.content.to_s,
      :second_boxer_link => second_boxer_td.css('a').first.attribute('href').to_s,
      :date => Time.parse(match_date)
    }

    myscrape.push(match)
  end
end

score 1 · Accepted Answer

You won't be able to tell how many td elements a tr contains, but you can tell whether it is empty:

doc.css('#mainContent p table tr:not(:empty)').each do |tr|
  puts tr.to_html  # print the row for inspection
end

score 0 · Accepted Answer

You can do it like this:

Select tr rows that have a 4th td:

doc.xpath('//tr/td[4]/..')

Another way, using CSS:

doc.css('tr').select{|tr| tr.css('td').length >= 4}
