xml - Failed to load HTTP source in R

Question

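I am trying to parse a web page with the following code, but the last line fails with "failed to load HTTP resource". Can anyone tell me how to handle it? Thanks! The code is: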

library(XML);library(RCurl)
page=getForm("http://jobsearch.monster.com/search",query="data science")
doc = htmlParse(page, asText = TRUE)
joblinks = getNodeSet(doc, "//div[@class = 'jobTitleContainer']//a/@href")
htmlParse(joblinks[[1]])

score -1 · Accepted Answer

Two things. First, ?htmlParse will point you to the isURL flag, which is FALSE by default. You want to set this to TRUE.

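Second, the URL in joblinks[[1]] does not seem to work. This does not appear to be a problem with your R code, just with the information you are extracting: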

# works
htmlParse("http://stackoverflow.com/questions/13852853/failed-to-load-http-source-in-r", isURL=TRUE)

# doesn't work
htmlParse("http://jobview.monster.com/Cleaning-Supervisor-Job-1513-Rebel-Southwest-OH-117109119.aspx", isURL=TRUE)
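If some of the extracted links fail to load, one possible workaround is to download the page with RCurl first and then hand the text to htmlParse, catching any error so that one bad link does not stop the script. The following is only a minimal sketch: fetch_and_parse is a hypothetical helper name, and it assumes that the failure comes from how the page is fetched (for example a redirect that the parser's built-in downloader does not follow), which may not be the case for the Monster URL above.

```r
library(XML)
library(RCurl)

# Hypothetical helper (not part of XML or RCurl): download a page with RCurl,
# then parse the downloaded text. Returns NULL instead of stopping with an
# error when the download or the parse fails.
fetch_and_parse <- function(url) {
  tryCatch({
    # followlocation = TRUE asks libcurl to follow HTTP redirects
    html <- getURL(url, followlocation = TRUE)
    htmlParse(html, asText = TRUE)
  }, error = function(e) {
    message("Could not load ", url, ": ", conditionMessage(e))
    NULL
  })
}

# Usage: a NULL result means the page could not be loaded or parsed
doc <- fetch_and_parse("http://stackoverflow.com/questions/13852853/failed-to-load-http-source-in-r")
if (is.null(doc)) message("page not available")
```

With the code from the question, the same helper could be applied to each extracted link, for example fetch_and_parse(as.character(joblinks[[1]])), and links that return NULL can simply be skipped.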
