r - Correctly extracting the inner text of anchor elements with R

Question

I'm using R to scrape the link titles from www.jamesaltucher.com/sitemap.xml.

Here is my code:

library(XML)
library(RCurl)
url.link <- 'http://www.jamesaltucher.com/sitemap.xml'
blog <- getURL(url.link)
blog <- htmlParse(blog, encoding = "UTF-8")
titles <- xpathSApply(blog, "//a", xmlValue)  ## titles

All I get in titles is an empty list.


Am I using XPath incorrectly?

score 1 · Accepted Answer

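Yes. A sitemap stores its URLs in loc elements, not a elements, so select //loc instead of //a:

titles <- xpathSApply(blog, "//loc", xmlValue)  ## URLs from the <loc> elements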
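To check this without fetching the live site, here is a minimal sketch that runs the same htmlParse/xpathSApply pipeline on a small inline sitemap. The sample text and the example.com URLs are made up for illustration; the real sitemap will have many more entries, but the sitemap format keeps each URL in a <loc> element, and it contains no <a> elements at all.

```r
library(XML)

# Hypothetical two-entry sitemap, standing in for the text returned by getURL()
sitemap_txt <- '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/post-one</loc></url>
  <url><loc>http://example.com/post-two</loc></url>
</urlset>'

# Same parser as in the question; the HTML parser treats xmlns as a plain attribute
blog <- htmlParse(sitemap_txt, asText = TRUE, encoding = "UTF-8")

# A sitemap has no <a> elements, so this reproduces the empty list from the question
anchors <- xpathSApply(blog, "//a", xmlValue)

# The URLs live in <loc> elements
locs <- xpathSApply(blog, "//loc", xmlValue)
print(locs)
```

Note that the values are URLs; if you want human-readable post titles you would have to fetch each page, since the sitemap itself does not carry them.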

score 0 · Accepted Answer

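library(XML)

# Read the page line by line (no stray spaces inside the URL)
web_page <- readLines("http://vueloeyewear.com/shop/retro/black-cia/")

# Keep only the lines that contain a <strong> tag
author_lines <- web_page[grep("strong", web_page)]

author_lines <- author_lines[7:15]

# Collapse the lines into one string and drop the ", " separators toString() adds
test <- gsub(", ", "", toString(author_lines))

test <- gsub("\n", "\n\n", test)

# Parse the assembled fragment as HTML
author_lines <- htmlParse(test, asText = TRUE)

xpathSApply(author_lines, "//p", xmlValue)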

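Note that the name in //loc is the actual tag you want to match. XPath is case-sensitive, and the HTML parser lowercases tag names, so write it as //loc, not //Loc.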
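A related pitfall appears if you parse the sitemap with the strict XML parser (xmlParse) instead of htmlParse. Sitemaps normally declare the default namespace http://www.sitemaps.org/schemas/sitemap/0.9, and in XPath 1.0 an unprefixed name such as loc only matches elements in no namespace. The sketch below assumes such a sitemap (the inline text and example.com URLs are made up) and shows two ways around it: binding a prefix to the namespace URI (the prefix name s is an arbitrary choice) or matching on local-name().

```r
library(XML)

# Hypothetical sitemap text using the standard sitemap default namespace
sitemap_txt <- '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/post-one</loc></url>
  <url><loc>http://example.com/post-two</loc></url>
</urlset>'

doc <- xmlParse(sitemap_txt, asText = TRUE)

# Unprefixed name test: matches only no-namespace elements, so nothing here
bare <- xpathSApply(doc, "//loc", xmlValue)

# Bind an arbitrary prefix ("s") to the sitemap namespace URI and use it in the path
ns <- c(s = "http://www.sitemaps.org/schemas/sitemap/0.9")
locs <- xpathSApply(doc, "//s:loc", xmlValue, namespaces = ns)

# Alternative: ignore namespaces by comparing the element's local name
locs2 <- xpathSApply(doc, "//*[local-name() = 'loc']", xmlValue)

print(locs)
```

The prefixed form is stricter (it only matches loc elements in the sitemap namespace); the local-name() form is shorter but would also match a loc element from any other namespace.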
