I'm trying to scrape some tables from ESPN.com using the xml2 package. For example, I want to scrape the Week 7 fantasy quarterback rankings into R. The URL is:

http://www.espn.com/fantasy/football/story/_/page/16ranksWeek7QB/fantasy-football-week-7-quarterback-rankings

I'm trying to do this with the read_html() function, since that's the one I'm most familiar with. Here is my syntax and the error it produces:

> wk.7.qb.rk = read_html("www.espn.com/fantasy/football/story/_/page/16ranksWeek7QB/fantasy-football-week-7-rankings-quarterbacks", which = 1)
Error: 'www.espn.com/fantasy/football/story/_/page/16ranksWeek7QB/fantasy-football-week-7-rankings-quarterbacks' does not exist in current working directory ('C:/Users/Brandon/Documents/Fantasy/Football/Daily').

I also tried read_xml(), only to get the same error:

> wk.7.qb.rk = read_xml("www.espn.com/fantasy/football/story/_/page/16ranksWeek7QB/fantasy-football-week-7-rankings-quarterbacks", which = 1)
Error: 'www.espn.com/fantasy/football/story/_/page/16ranksWeek7QB/fantasy-football-week-7-rankings-quarterbacks' does not exist in current working directory ('C:/Users/Brandon/Documents/Fantasy/Football/Daily').

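Why is R looking for this URL in the working directory? I've tried this function with other URLs and had some success. What is it about this particular URL that makes R look for it in a different place than the others? And how do I change that?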

1 Answer

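I got this error while running read_html in a loop to go through 20 pages. After the 20th page the loop kept running with no URL left, so for the remaining iterations it was calling read_html with NA. Hope this helps!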

answered 2017-04-18T13:37:01.157
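To illustrate the loop problem described in this answer, here is a minimal sketch of a loop that cannot run past the end of its URL vector and skips missing entries instead of handing NA to read_html(). The function name scrape_pages, its fetch argument, and the example.com URLs are hypothetical, made up for illustration; only xml2::read_html() is a real function. The fetch argument defaults to read_html() so a stub can be swapped in without touching the network.

```r
# Hypothetical helper: fetch every page in `urls`, never indexing past the end.
# `fetch` defaults to xml2::read_html; the default is only evaluated if used.
scrape_pages <- function(urls, fetch = xml2::read_html) {
  pages <- vector("list", length(urls))
  # seq_along() stops at the last element, so urls[i] is never out of range.
  for (i in seq_along(urls)) {
    u <- urls[i]
    # Skip missing or empty entries rather than passing NA to read_html().
    if (is.na(u) || !nzchar(u)) next
    pages[[i]] <- fetch(u)
  }
  pages
}

# Hypothetical 20-page URL vector (example.com is a placeholder).
urls <- sprintf("http://www.example.com/rankings?page=%d", 1:20)

# Indexing one past the end yields NA, which is what a loop that
# keeps going after page 20 would end up passing to read_html().
print(urls[21])
```

Calling `pages <- scrape_pages(urls)` would then request only the 20 real pages. Looping with `seq_along(urls)` (or `for (u in urls)`) ties the iteration count to the data itself, instead of to a hard-coded upper bound that can drift out of sync with it.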