I have an array called people made up of these objects:

Nokogiri::XML::Text:0x3fe41985e69c "CEO, Company_1"
Nokogiri::XML::Text:0x3fe4194dab74 "COO, Company_2 "
Nokogiri::XML::Text:0x3fe4195eb414 "CFO, Company_3"

I want to split each object at the ",", so I tried something like this:

companies = people.each do | company | 
  company.inner_text.match("/, (.*)/")
end

and:

occupations = people.each do | occupation | 
  occupation.inner_text.match("/(.*),/") 
end

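match doesn't seem to pull the values I want out of the objects. I checked on rubular.com and the regex should work, but I get back the same string I put in ("CEO, Company_1"), when it should be split so that occupations = [CEO, COO, CFO] and companies = [Company_1, Company_2, Company_3].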

How can I split these objects?
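A minimal sketch of why the two attempts above return the input unchanged, and what a working regex version might look like. It assumes the node text has already been extracted (the texts array is a hypothetical stand-in for people.map(&:text)). Two things go wrong in the original: each returns its receiver rather than the block's results, and "/, (.*)/" is a String, which match turns into a pattern that looks for literal slash characters, so it never matches.

```ruby
# Hypothetical stand-in for people.map(&:text); note the trailing space
# on the second entry, copied from the question.
texts = ["CEO, Company_1", "COO, Company_2 ", "CFO, Company_3"]

# Array#each returns the array itself, so assigning its result gives back
# the original strings no matter what the block computes.
same = texts.each { |t| t.match(/,\s*(.*)/) }

# map collects each block's result. Use a Regexp literal (no quotes) and
# read the capture group out of the MatchData with [1].
occupations = texts.map { |t| t.match(/\A([^,]*),/)[1] }
companies   = texts.map { |t| t.match(/,\s*(.*)/)[1].strip }

p same.equal?(texts) # => true
p occupations        # => ["CEO", "COO", "CFO"]
p companies          # => ["Company_1", "Company_2", "Company_3"]
```

This sketch calls match(...)[1] directly, so it would raise NoMethodError on a string with no comma; a real scraper might want to guard against nil.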

1 Answer

Why not split the text?

require 'nokogiri'

xml = '<x>
<people>CEO, Company_1</people>
<people>COO, Company_2</people>
<people>CFO, Company_3</people>
</x>
'

doc = Nokogiri::XML(xml)
people = doc.search('people')     # NodeSet of the <people> elements
companies = people.map do |company|
  company.text.split(',')         # "CEO, Company_1" => ["CEO", " Company_1"]
end

pp companies

=> [["CEO", " Company_1"], ["COO", " Company_2"], ["CFO", " Company_3"]]
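If a company name could itself contain a comma, splitting on every comma would break it into extra pieces. A sketch of one way to guard against that is to pass split a limit of 2, so only the first comma separates occupation from company. The input string here is a hypothetical example, not data from the question.

```ruby
# Hypothetical entry whose company name contains a comma.
text = "CEO, Acme, Inc."

# Without a limit, every comma splits the string.
unlimited = text.split(',')

# With a limit of 2, split stops after the first comma, returning at most two fields.
limited = text.split(',', 2)

p unlimited # => ["CEO", " Acme", " Inc."]
p limited   # => ["CEO", " Acme, Inc."]
```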

If you want to get rid of the leading space before the company name, use:

companies = people.map do |company| 
  company.text.split(/,\s*/)
end
=> [["CEO", "Company_1"], ["COO", "Company_2"], ["CFO", "Company_3"]]
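The question asked for two separate arrays (occupations and companies) rather than a list of pairs. One way to get there, sketched on plain strings that stand in for each node's text (an assumption; with Nokogiri you would call text on each node first), is to split each string and then transpose the pairs:

```ruby
# Hypothetical stand-in for people.map(&:text).
texts = ["CEO, Company_1", "COO, Company_2", "CFO, Company_3"]

# Split each entry into an [occupation, company] pair...
pairs = texts.map { |t| t.split(/,\s*/) }

# ...then transpose turns the list of pairs into two parallel arrays.
occupations, companies = pairs.transpose

p occupations # => ["CEO", "COO", "CFO"]
p companies   # => ["Company_1", "Company_2", "Company_3"]
```

Note that transpose raises an IndexError if the rows have different lengths, which would happen if some entry had no comma (or more than one, without a split limit).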

Or:

companies = people.map do |company| 
  company.text.split(',').map(&:lstrip)
end
=> [["CEO", "Company_1"], ["COO", "Company_2"], ["CFO", "Company_3"]]

Or use map{ |s| s.sub(/^\s+/, '') } instead of lstrip.
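Both lstrip and sub(/^\s+/, '') only remove leading whitespace. The second entry in the question ("COO, Company_2 ") also has a trailing space, so a sketch comparing them with strip, which trims both ends, may be useful:

```ruby
# Second entry from the question, which carries a trailing space.
text = "COO, Company_2 "
company = text.split(',').last # " Company_2 "

# Both of these remove leading whitespace only. (In Ruby, ^ matches at the
# start of any line, while \A anchors to the start of the whole string;
# for single-line text like this they behave the same.)
left_only = company.lstrip
via_sub   = company.sub(/^\s+/, '')

# strip removes whitespace from both ends.
both = company.strip

p left_only # => "Company_2 "
p via_sub   # => "Company_2 "
p both      # => "Company_2"
```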

See also "How to avoid joining all text from Nodes when scraping".

answered 2013-02-08T14:53:03.013