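OK, what I need is a little odd. Here it is:

The clues array is created by hand and the data array is built dynamically. The XPath function takes each clue as input and maps the result into data to create the dynamic array.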

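# Clues are entered by hand; each one is the label of a table cell.
clues = []
clues << 'Power supply type'
clues << 'Slots'
clues << 'Software included'

# XPath template; %s is replaced with a clue.
selector = "//td[text()='%s']/following-sibling::td"

# doc is assumed to be an already parsed HTML document (e.g. from Nokogiri).
data = clues.map do |clue|
  xpath = selector % clue
  [clue, doc.at(xpath).text.strip]
end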

The code that builds the data array uses two inputs, clues and selector. Each item clues[index] is substituted into the selector at %s, giving:

//td[text()='%s']/following-sibling::td  
//td[text()='Power supply type']/following-sibling::td
//td[text()='Slots']/following-sibling::td
//td[text()='Software included']/following-sibling::td
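
The substitution described above can be reproduced without a web page, since `String#%` is plain format substitution. A minimal sketch follows; the helper name `xpaths_for` is made up for illustration and is not part of the original code:

```ruby
# Template from the question; %s is the placeholder for a clue.
SELECTOR = "//td[text()='%s']/following-sibling::td"

# Hypothetical helper: expands the template once per clue.
def xpaths_for(clues, selector = SELECTOR)
  clues.map { |clue| selector % clue }
end

clues = ['Power supply type', 'Slots', 'Software included']
puts xpaths_for(clues)
```

Note that a clue containing an apostrophe would break the quoted XPath literal, so this approach assumes the labels contain none.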

XPath then uses these stored expressions to pull the information from the web page, and all of it is stored as elements of the data array, e.g. data[0]...data[3].

data[2] looks like this, one big block of information:

Symantec Norton Internet Security (60 days live update); Recovery partition
(including possibility to recover system; applications and drivers separately);
Optional re-allocation of recovery partition;

I want to take each piece of software listed here and store it separately, for example:

data[2] Symantec Norton Internet Security (60 days live update);
data[3] Recovery partition (including possibility to recover system;
data[4] Optional re-allocation of recovery partition;

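So I assume I need to split data[2] somehow and add the pieces back into the data array?

I am trying to isolate this particular index because I need it on multiple lines for the final output to a spreadsheet.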

Final desired output: (image not available)

3 Answers

Just to clarify, you have an array like this:

data << 'Power supply type'
data << 'Slots'
data << 'Symantec Norton Internet Security (60 days live update); Recovery partition (including possibility to recover system; applications and drivers separately); Optional re-allocation of recovery partition;'
data << 'Something else'

And you want it to become this?

data << 'Power supply type'
data << 'Slots'
data << 'Symantec Norton Internet Security (60 days live update);'
data << 'Recovery partition (including possibility to recover system;'
data << 'applications and drivers separately);'
data << 'Optional re-allocation of recovery partition;'
data << 'Something else'

You can do this as follows:

temp = []
data[2].split(/(;)/).each_slice(2){ |s| temp << s.join.strip }
data[2] = temp
data.flatten!

Or if you want to iterate over all items in the data array:

data.each_with_index do |x, i|
  temp = []
  data[i].split(/(;)/).each_slice(2){ |s| temp << s.join.strip }
  data[i] = temp
end
data.flatten!

Basically, this splits the string on ';' while keeping each ';' (the capture group in the regex makes split return the separators too), joins each piece back together with its ';', replaces the original element of the data array with the array of pieces, and then flattens the data array back into a single flat array.
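
To see each step of that description on sample data, here is a small self-contained sketch. The method name `split_entries` and the shortened sample strings are assumptions made for the illustration, not part of the original code:

```ruby
# Hypothetical helper: splits one string into its ';'-terminated pieces.
def split_entries(text)
  # The capture group in /(;)/ makes split keep the separators:
  # "a; b;" becomes ["a", ";", " b", ";"].
  text.split(/(;)/)
      .each_slice(2)                  # pair each piece with its ';'
      .map { |pair| pair.join.strip } # glue the pair back together and trim
end

data = ['Power supply type', 'Slots', 'Norton (60 days); Recovery partition;', 'Something else']
data[2] = split_entries(data[2]) # data[2] is now a nested array
data.flatten!                    # lift the pieces back into data
p data
```

A string without any ';' comes back as a one-element array, so the helper is also safe to apply to every element.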

answered 2012-08-04T12:53:07.503

data = data[0..1] + data[2].scan(/.*?;/) + data[3..-1]
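
This one-liner hard-codes index 2, and `scan` leaves the leading space on every piece after the first. It also cuts at the semicolon inside the parentheses. As a hedged sketch of one way around those points, the following uses a hypothetical helper `expand_at` and a regular expression (an assumption here, not from the answer) that treats a parenthesised group as a single unit. It does not handle nested or unbalanced parentheses:

```ruby
# Matches a run of text ending in ';', skipping over any ';' that sits
# inside one level of parentheses.
ENTRY = /(?:\([^)]*\)|[^;(])+;/

# Hypothetical helper: replaces data[index] with its separate entries.
def expand_at(data, index)
  pieces = data[index].scan(ENTRY).map(&:strip)
  pieces = [data[index]] if pieces.empty? # no ';' found, keep the text as is
  data[0...index] + pieces + data[(index + 1)..-1]
end

data = ['400w', '2', 'Norton (60 days live update); Recovery (system; drivers); Optional;']
p expand_at(data, 2)
```

Whether the parenthesised semicolon should split an entry depends on the desired spreadsheet layout; swap `ENTRY` for `/.*?;/` to get the original behaviour.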
answered 2012-08-05T01:21:34.953

data = Array.new
clues.each do |clue|
  xpath = selector % clue
  text = doc.at(xpath).text.strip
  if clue == 'Software included'
    values = text.scan(/.+?;/)
    values << text if values.empty? # text did not contain a semicolon
    data << [clue, values.shift.strip]
    values.each do |value|
      data << ['', value.strip]
    end
  else
    data << [clue, text]
  end
end

Output (indented for readability):

[
  ["Power supply type", "400w"],
  ["Slots", "2"],
  ["Software included", "Symantec Norton Internet Security (60 days live update);"],
  ["", "Recovery partition (including possibility to recover system;"],
  ["", "applications and drivers separately);"],
  ["", "Optional re-allocation of recovery partition;"]
]
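
Since the goal in the question is a spreadsheet, rows of this shape can be written out as CSV, which spreadsheet programs open directly. A minimal sketch using Ruby's `csv` library follows; the choice of CSV, the helper name `rows_to_csv`, and the sample rows are assumptions for illustration, as the question does not say which spreadsheet format is needed:

```ruby
require 'csv'

# Hypothetical helper: turns [label, value] pairs into CSV text.
def rows_to_csv(rows)
  CSV.generate do |csv|
    rows.each { |row| csv << row }
  end
end

rows = [
  ['Power supply type', '400w'],
  ['Software included', 'Symantec Norton Internet Security (60 days live update);'],
  ['', 'Optional re-allocation of recovery partition;']
]
puts rows_to_csv(rows)
```

The empty first column on continuation rows keeps each extra software entry on its own line under the same label.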
answered 2012-08-04T15:44:44.670