html - html_attr not returning the "href" attribute

Question

First of all, I'm a real beginner at web scraping.

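I'm working on this website, trying to get the links to the next pages, the ones containing the discussions about the episode. Using SelectorGadget, I managed to select only the part of the HTML that holds the topic frame: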

html.s1e01 <- html("http://asoiaf.westeros.org/index.php/forum/41-e01-winter-is-coming/")

html.s1e01.page <- html_nodes(html.s1e01, ".ipsBox")

Now I want to get the links to all of the topics, so I tried:

html_attr(html.s1e01.page, "href")

But I get NA. I've seen similar examples online, and this should work. Any suggestions as to why it doesn't?

score 2 · Accepted Answer

The href attribute belongs to the individual topic links, not to the .ipsBox container that html_nodes() selected, so html_attr() finds no href there and returns NA. Narrow the selector to the topic-title links inside the box first:

html.s1e01.page <- html_nodes(html.s1e01, ".ipsBox .topic_title")
html.s1e01.topics <- html.s1e01.page %>%  html_attr("href")
html.s1e01.topics

##  [1] "http://asoiaf.westeros.org/index.php/topic/49408-poll-how-would-you-rate-episode-101/"                        
##  [2] "http://asoiaf.westeros.org/index.php/topic/109202-death-of-john-aryn-season-4-episode-5-spoilers/"            
##  [3] "http://asoiaf.westeros.org/index.php/topic/49310-book-spoilers-episode-101-take-3/"                           
##  [4] "http://asoiaf.westeros.org/index.php/topic/90902-sir-john-standingjonarryn/"                                  
##  [5] "http://asoiaf.westeros.org/index.php/topic/106105-did-anyone-notice-the-color-of-the-feather-in-lyannas-tomb/"
##  [6] "http://asoiaf.westeros.org/index.php/topic/49116-book-tv-spoilers-what-was-left-out-and-what-was-left-in/"    
##  [7] "http://asoiaf.westeros.org/index.php/topic/49070-no-spoilers-ep101-discussion/"                               
##  [8] "http://asoiaf.westeros.org/index.php/topic/49159-book-spoilers-the-book-was-better/"                          
##  [9] "http://asoiaf.westeros.org/index.php/topic/57614-runes-in-agot-spoilers-i-suppose/"                           
## [10] "http://asoiaf.westeros.org/index.php/topic/49151-book-spoilers-ep101-discussion-mark-ii/"                     
## [11] "http://asoiaf.westeros.org/index.php/topic/49161-booktv-spoilers-dany-drogo/"                                 
## [12] "http://asoiaf.westeros.org/index.php/topic/49071-book-spoilers-ep101-discussion/"                             
## [13] "http://asoiaf.westeros.org/index.php/topic/49100-no-spoilers-pre-airing-discussion/"
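
To see the difference without depending on the live forum, here is a minimal offline sketch. It assumes the rvest package is installed. The HTML string is a hypothetical stand-in for the forum page, reusing only the class names from the answer (ipsBox, topic_title), and the example.com URLs are placeholders. It shows html_attr() returning NA on the container and the actual href values on the links.

```r
# Minimal offline sketch: the HTML below is a hypothetical stand-in for the forum page
library(rvest)

page <- read_html('
<div class="ipsBox">
  <table>
    <tr><td><a class="topic_title" href="http://example.com/topic/1">Topic 1</a></td></tr>
    <tr><td><a class="topic_title" href="http://example.com/topic/2">Topic 2</a></td></tr>
  </table>
</div>')

# The container <div> has no href attribute, so html_attr() falls back to NA
box <- html_nodes(page, ".ipsBox")
box_href <- html_attr(box, "href")
print(box_href)   # prints [1] NA

# Selecting the <a> elements inside the box returns their href values
topic_links <- html_nodes(page, ".ipsBox .topic_title")
topic_href <- html_attr(topic_links, "href")
print(topic_href)
```

In rvest 1.0 and later, html_elements() is the preferred name for html_nodes(), and read_html() replaces the older html() call used in the question.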
