python - Unable to get the XPath for screen scraping

Question

I'm trying to scrape data from this website: http://www.soccerstats.com/latest.asp?league=england. I'm using Scrapy in Python to get the details from this table:

<div id="league-table-data" style="text-align:center;clear:both;">
        </div>

I have tried many XPath expressions. To start with, I just want to get the team names from that table:

hxs.select('//div[contains(@id, "league-table")]/div[descendant::td[contains(@align, "left")]]/a/text()').extract()

However, it returns an empty list. Any ideas on how to make this work? Thank you.

score 1 · Accepted Answer

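Your expression looks for a div directly inside league-table-data, but the team links are nested as table/tr/td/a, so it never matches. It looks like you just need: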

>>> hxs.select('//*[@id="league-table-data"]/table/tr/td/a/text()').extract() 
[u'Manchester Utd', u'Manchester City', u'Chelsea', u'Arsenal', u'Tottenham', u'Everton', u'Liverpool', u'West Bromwich', u'Swansea City', u'West Ham Utd', u'Norwich City', u'Fulham', u'Stoke City', u'Southampton', u'Aston Villa', u'Newcastle Utd', u'Sunderland', u'Wigan Athletic', u'Reading', u'QP Rangers']
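To see why the question's expression comes back empty while the answer's path works, here is a stdlib-only sketch. It runs against a hypothetical, heavily simplified fragment whose nesting is inferred from the answer's XPath (the real page has more columns and markup). It uses xml.etree.ElementTree rather than Scrapy; ElementTree supports only a limited XPath subset with no contains() or text(), so an exact @id match and the .text attribute stand in for them.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified fragment. The real page has more columns and
# markup; only the nesting implied by the answer's XPath is kept here.
FRAGMENT = """
<html><body>
  <div id="league-table-data">
    <table>
      <tr><td>1</td><td align="left"><a href="#">Manchester Utd</a></td></tr>
      <tr><td>2</td><td align="left"><a href="#">Manchester City</a></td></tr>
    </table>
  </div>
</body></html>
""".strip()

root = ET.fromstring(FRAGMENT)

# Roughly the first step of the question's XPath: a <div> that is a direct
# child of the league-table <div>. There is none, so nothing matches.
# (ElementTree has no contains(), so an exact @id match stands in for it.)
nested_divs = root.findall(".//div[@id='league-table-data']/div")
print(nested_divs)  # prints: []

# The answer's path follows the actual nesting: table/tr/td/a.
links = root.findall(".//div[@id='league-table-data']/table/tr/td/a")
print([a.text for a in links])  # prints: ['Manchester Utd', 'Manchester City']
```

In current Scrapy versions, hxs.select(...).extract() from the old HtmlXPathSelector API is usually written as response.xpath('//*[@id="league-table-data"]/table/tr/td/a/text()').getall().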

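Quick tip: you can get an element's XPath in Google Chrome's developer tools (right-click the node in the Elements panel, then Copy > Copy XPath).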
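A path copied from the browser points at one specific element, usually with positional predicates such as tr[1] and often with a tbody step. The sketch below widens such a path so it matches every row. The helper name generalize_xpath and the sample copied path are hypothetical (not taken from the page), and dropping /tbody assumes the raw HTML Scrapy downloads has no tbody element, which browsers commonly insert into the DOM on their own.

```python
import re


def generalize_xpath(copied: str) -> str:
    """Hypothetical helper: widen a browser-copied XPath to match all rows.

    - Removes /tbody steps, since browsers add <tbody> to the DOM even when
      the HTML that Scrapy downloads does not contain one.
    - Removes positional predicates such as [1] or [2], so the path is no
      longer pinned to a single row or cell.
    """
    without_tbody = re.sub(r"/tbody(?=[/\[]|$)", "", copied)
    return re.sub(r"\[\d+\]", "", without_tbody)


# Hypothetical example of a path that "Copy XPath" could produce for one link.
copied = '//*[@id="league-table-data"]/table/tbody/tr[1]/td[2]/a'
print(generalize_xpath(copied) + "/text()")
# prints: //*[@id="league-table-data"]/table/tr/td/a/text()
```

Removing every positional predicate is deliberately blunt: if other columns also contained links, keeping td[2] would be the safer choice.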
