r - Unable to scrape simple text with rvest - is it because of nested HTML or JavaScript?

Asked 2021-08-05T07:36:10.757

Viewed 33 times

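I have been trying to scrape the InChI-Key text from the following web page (the URL is in the code below), without success:

"BBJPZPLAZVZTGR-UHFFFAOYSA-N" is the text I want to get.

Here are the lines I have tried: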

library(rvest)  # provides read_html(), html_elements(), html_text() and %>%
url = "https://pubchem.ncbi.nlm.nih.gov/compound/32921#section=InChI-Key&fullscreen=true"
p = read_html(url)

Version #1:

p %>% html_nodes('.section-content-item') %>% html_text()
p %>% html_elements('.section-content-item') %>% html_text()

Both lines give me the same result:

character(0)

For some reason, it does not seem to detect the element, and the node set comes back empty:

{xml_nodeset (0)}

Version #2:

inchikey <- p %>% 
  rvest::html_nodes("body") %>%
  xml2::xml_find_all("//div[contains(@class, 'section-content-item')]") %>%
  rvest::html_text()

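Again, this code does not find the text I am looking for.

I have been stuck on this for several days, and I would greatly appreciate any help or suggestions!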

0 answers