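I'm using Nokogiri to scrape data from a web page, and I was under the impression that the following would grab the data and return it as an array. Instead I get one big string, which is causing some problems.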

 home_team = doc.css(".team-home.teams")

If I were to use

home_team = doc.css(".team-home.teams").text

I could understand the data being returned as a string. Am I reading this wrong?

I have even tried

home_team = doc.css(".team-home.teams").map(&:text) 

but this also seems to return a string. If an array were returned, would it show up in array format in the console?

Could someone try this in their console:

require 'open-uri'
require 'nokogiri'


FIXTURE_URL = "http://www.bbc.co.uk/sport/football/premier-league/fixtures"

# URI.open is required on Ruby 3.0+, where Kernel#open no longer accepts URLs
doc = Nokogiri::HTML(URI.open(FIXTURE_URL))
home_team = doc.css(".team-home.teams").map(&:text)
#home_team = doc.css(".team-home.teams")
puts home_team

I just want to confirm whether the output is a string in both cases, and what the difference between the two is. I'm rather lost.

Thanks

2 Answers

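You are getting an array. It's just that puts prints each element of an array on its own line (calling to_s on each one), so the output looks like plain text. Take a look at this: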

require 'open-uri'
require 'nokogiri'

FIXTURE_URL = "http://www.bbc.co.uk/sport/football/premier-league/fixtures"

# URI.open is required on Ruby 3.0+, where Kernel#open no longer accepts URLs
doc = Nokogiri::HTML(URI.open(FIXTURE_URL))
home_team = doc.css(".team-home.teams").map(&:text)
# home_team = doc.css(".team-home.teams")
puts home_team.class
puts home_team.map(&:strip).inspect

#=> Array
#=> ["Everton", "Aston Villa", "Southampton", "Stoke", "Swansea", "Man Utd", "Sunderland", "Tottenham", "Chelsea", "Wigan", "Sunderland", "Arsenal", "Man City", "Swansea", "West Ham", "Wigan", "Everton", "Aston Villa", "Southampton", "Fulham", "Reading", "Chelsea", "Newcastle", "Norwich", "Stoke", "West Brom", "Liverpool", "Tottenham", "QPR", "Man Utd", "Newcastle", "Arsenal", "Aston Villa", "Everton", "Reading", "Southampton", "Stoke", "Chelsea", "Arsenal", "Fulham", "Norwich", "QPR", "Sunderland", "Swansea", "West Brom", "West Ham", "Tottenham", "Liverpool", "Man Utd", "Man City", "Aston Villa", "Chelsea", "Everton", "Southampton", "Stoke", "Wigan", "Newcastle", "Reading", "Arsenal", "Fulham", "Liverpool", "Man Utd", "Norwich", "QPR", "Sunderland", "Swansea", "Tottenham", "West Brom", "West Ham", "Arsenal", "Aston Villa", "Everton", "Fulham", "Man Utd", "Norwich", "QPR", "Reading", "Stoke", "Sunderland", "Chelsea", "Liverpool", "Man City", "Newcastle", "Southampton", "Swansea", "Tottenham", "West Brom", "West Ham", "Wigan"]
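The same behaviour can be reproduced without any network access. The following is only a minimal sketch: sample_home_teams is a hypothetical stand-in for the result of doc.css(".team-home.teams").map(&:text), and its three team names are copied from the output above for illustration.

```ruby
# Hypothetical stand-in for doc.css(".team-home.teams").map(&:text)
def sample_home_teams
  ["Everton", "Aston Villa", "Southampton"]
end

teams = sample_home_teams

puts teams.class    # prints: Array
puts teams          # prints one element per line, with no brackets or quotes
p teams             # prints: ["Everton", "Aston Villa", "Southampton"]
puts teams.inspect  # same output as p teams
```

So when checking what a method returned, p (or puts with .inspect) is usually more telling than a bare puts.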
answered 2013-03-12T13:58:46.350

There is a lot of whitespace in the data. When I do this, I get an array:

home_team = doc.css(".team-home.teams").map {|team| team.text.strip}
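To see what the stripping does, here is a small sketch on made-up data. The names raw_team_names and clean_team_names are hypothetical helpers, the padded strings are only an assumption about what the scraped text might look like, and dropping empty entries is an optional extra step that is not part of the one-liner above.

```ruby
# Made-up raw values, padded the way scraped text often is (an assumption)
def raw_team_names
  ["\n      Everton\n    ", "\n      Aston Villa\n    ", "   "]
end

# Strip surrounding whitespace, then drop entries that end up empty
def clean_team_names(names)
  names.map(&:strip).reject(&:empty?)
end

p raw_team_names                    # shows the padding and newlines as escapes
p clean_team_names(raw_team_names)  # prints: ["Everton", "Aston Villa"]
```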
answered 2013-03-12T13:54:26.360