r - Scrape coordinate data and additional info for shot locations

Question

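I am learning R by looking at basketball statistics, and I would like to extract the information shown in a shot chart.

I am looking at the following shot chart for D'Angelo Russell: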

https://www.basketball-reference.com/players/r/russeda01/shooting/2019

I am using tools from the rvest package (loaded with library(rvest)) to scrape the data in the following way:

> dlo_html <- read_html("https://www.basketball-reference.com/players/r/russeda01/shooting/2019")
> dlo_nodes1 <- html_nodes(dlo_html, "table")
> dlo_makes <- html_table(dlo_nodes1)

...so now when I run head(dlo_makes), I get a data.frame of sorts with 74 rows and 11 columns, taken from the table on the left side of the web page. That is a good start.

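However, what I really want is the information contained in the shot chart graphic on the right side of the page. I can see it in the HTML source. If you search the source for shot-area, there are about 1500 lines of data right below it, like this: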

<div style="top:57px;left:237px;" tip="Oct 17, 2018, BRK at DET<br>1st Qtr, 10:38 remaining<br>Missed 2-pointer from 2 ft<br>BRK leads 2-0" class="tooltip miss">&#215;</div>
<div style="top:154px;left:341px;" tip="Oct 17, 2018, BRK at DET<br>1st Qtr, 10:30 remaining<br>Made 2-pointer from 14 ft<br>BRK now leads 4-0" class="tooltip make">&#9679;</div>
etc.

Am I passing the wrong information to html_nodes()? Should I be using a different function than html_table() to look at the nodes? Or am I missing something else here?

score 1 · Accepted Answer

The data you want is written into the page source as an HTML comment; it is not loaded dynamically.

I used view source to find the div that contains the data. Its id is all_shot-chart.

So here is the code to get what you want:

library(rvest)

dlo_html <- read_html("https://www.basketball-reference.com/players/r/russeda01/shooting/2019")

# Pull the comment out of the all_shot-chart div and re-parse its text as HTML
Commented_Section <- dlo_html %>%
  html_nodes("[id = 'all_shot-chart']") %>%
  html_nodes(xpath = 'comment()') %>%
  html_text() %>% read_html() %>% html_node('table')

Missed_Plays <- Commented_Section %>% html_nodes("[class='tooltip miss']")
Maked_Plays <- Commented_Section %>% html_nodes("[class='tooltip make']")
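
Once the tooltip nodes are selected, the coordinates and the description sit in their style and tip attributes. The following is only a rough sketch of one way to turn them into a data.frame. It runs on the two sample nodes quoted in the question rather than on the live page, and the names sample_html, shots, and shot_df, as well as the column names, are made up for illustration.

```r
library(rvest)

# Two sample nodes copied from the page source quoted in the question,
# wrapped in a shot-area div so the snippet parses on its own.
sample_html <- paste0(
  '<div class="shot-area">',
  '<div style="top:57px;left:237px;" tip="Oct 17, 2018, BRK at DET<br>1st Qtr, 10:38 remaining<br>Missed 2-pointer from 2 ft<br>BRK leads 2-0" class="tooltip miss">&#215;</div>',
  '<div style="top:154px;left:341px;" tip="Oct 17, 2018, BRK at DET<br>1st Qtr, 10:30 remaining<br>Made 2-pointer from 14 ft<br>BRK now leads 4-0" class="tooltip make">&#9679;</div>',
  '</div>'
)

# Select every tooltip div, regardless of make or miss.
shots <- read_html(sample_html) %>% html_nodes("div.tooltip")

style <- html_attr(shots, "style")  # e.g. "top:57px;left:237px;"
tip   <- html_attr(shots, "tip")    # description, parts separated by <br>

# Assumed column names; the pixel offsets are relative to the chart image.
shot_df <- data.frame(
  top_px      = as.numeric(sub(".*top:([0-9]+)px.*", "\\1", style)),
  left_px     = as.numeric(sub(".*left:([0-9]+)px.*", "\\1", style)),
  made        = grepl("make", html_attr(shots, "class"), fixed = TRUE),
  distance_ft = as.numeric(sub(".*from ([0-9]+) ft.*", "\\1", tip)),
  tip         = gsub("<br>", " | ", tip, fixed = TRUE),
  stringsAsFactors = FALSE
)

print(shot_df)
```

The same attribute extraction should apply to Missed_Plays and Maked_Plays from the code above, assuming the live page still uses the markup shown in the question.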

I found out how to get at the commented section in this question:

How to read a commented out HTML table using readHTMLTable in R
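
To see why html_nodes(dlo_html, "table") never reaches the shot data, the comment behaviour can be reproduced offline. This is a minimal sketch on a hypothetical, hand-written page (the markup below is invented and much simpler than the real site): a normal selector skips anything inside a comment, while the comment() XPath step recovers it.

```r
library(rvest)

# Hypothetical page: the shot nodes are wrapped in an HTML comment.
page <- read_html(paste0(
  '<html><body><div id="all_shot-chart">',
  '<!-- <div class="shot-area">',
  '<div style="top:57px;left:237px;" class="tooltip miss">x</div>',
  '<div style="top:154px;left:341px;" class="tooltip make">o</div>',
  '</div> -->',
  '</div></body></html>'
))

# A plain CSS selector only sees real elements, so the commented divs are missed.
visible <- page %>% html_nodes("div.tooltip")

# Take the comment node, read its text, and parse that text as a new document.
hidden <- page %>%
  html_nodes(xpath = "//div[@id='all_shot-chart']/comment()") %>%
  html_text() %>%
  read_html() %>%
  html_nodes("div.tooltip")

print(length(visible))
print(length(hidden))
```

On this toy page the first selector should find no nodes and the second should find both tooltip divs.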
