I am trying to scrape the following web page in R using the XML, RCurl, or httr libraries: http://accuscore.com/fantasy-sports/nfl-fantasy-sports/Rest-of-Season-RB
The page opens correctly in my browser. Here is my attempt to scrape it:
library("XML")
#this works fine (QB projections)
qb <- readHTMLTable("http://accuscore.com/fantasy-sports/nfl-fantasy-sports/", header=1)$fantasy_table
#this does not (RB projections)
rb <- readHTMLTable("http://accuscore.com/fantasy-sports/nfl-fantasy-sports/Rest-of-Season-RB", header=1)$fantasy_table
library("RCurl")
htmlParse("http://accuscore.com/fantasy-sports/nfl-fantasy-sports/Rest-of-Season-RB")
library("httr")
GET("http://accuscore.com/fantasy-sports/nfl-fantasy-sports/Rest-of-Season-RB")
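With readHTMLTable and htmlParse, I get the following error: "Error: failed to load HTTP resource". With GET, I get status code 404, which indicates that the resource was not found and suggests there may be something wrong with how I am sending the request. Since I can open the page in my browser, I am not sure where the problem lies. Perhaps the page is a different kind of file than these functions expect? Any ideas?
Ideally, the scrape would cover all 146 entries, not just the first 25.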
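One diagnostic that might narrow this down is to compare what the server returns for the same URL with the default httr User-Agent and with a browser-like one. The following is only a sketch under assumptions: the helper name `check_url` and the User-Agent string are illustrative choices of mine, and I have not confirmed that a header difference is the actual cause of the 404.

```r
library(httr)

# Hypothetical helper: request a URL, optionally with a custom User-Agent,
# and return the HTTP status code plus the final URL after any redirects.
check_url <- function(url, ua = NULL) {
  resp <- if (is.null(ua)) GET(url) else GET(url, user_agent(ua))
  list(status = status_code(resp), final_url = resp$url)
}

# Assumed browser-like User-Agent string, for illustration only.
browser_ua <- "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"

# Usage (needs network access, so it is left commented out):
# rb_url <- "http://accuscore.com/fantasy-sports/nfl-fantasy-sports/Rest-of-Season-RB"
# check_url(rb_url)              # default httr User-Agent
# check_url(rb_url, browser_ua)  # browser-like User-Agent
```

If the two calls return different status codes, the request headers are a likely factor; if both return 404, the difference from the browser is probably somewhere else (for example cookies or a redirect).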