r - R: Unable to generate an XPath when web scraping in R

Question

I am trying to scrape the following website:

   http://www.crowdrise.com/waterforpeople-SE

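If you look at the site, on the right-hand side, just above the black button labeled "Fundraise for this campaign", there is a statement that reads "52% Raised of $20,000 Goal". That statement is what I am trying to extract.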

For the XPath expression, I tried:

  .//*[@id="thebody"]/div[6]/div/div/div[2]/div[2]/div[2]/div/p/span

But it did not work.

What is the correct XPath expression?

Thank you,

score 1 · Accepted Answer

Try this:

> library(XML)
> doc <- htmlTreeParse('http://www.crowdrise.com/waterforpeople-SE', useInternalNodes = TRUE)
> xpathApply(doc, '//div[@class="grid1-4"]//p[@class="progressText"]')
[[1]]
<p class="progressText">
  <span>52% Raised of $20,000 Goal</span>
</p> 

attr(,"class")
[1] "XMLNodeSet"

Or, to get the text value directly:

> xpathApply(doc, '//div[@class="grid1-4"]//p[@class="progressText"]', xmlValue)
[[1]]
[1] "52% Raised of $20,000 Goal"
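
For anyone who wants to try the class-based XPath without hitting the live site, here is a minimal offline sketch. The inline HTML is a hypothetical, trimmed-down stand-in for the page: only the `grid1-4` and `progressText` class names and the span text come from the output above, and the surrounding structure is assumed. The variable names `html`, `progress`, `pct`, and `goal` are illustrative. The last two lines show one possible way to turn the text into numbers with base R regular expressions.

```r
library(XML)

# Hypothetical stand-in for the real page; only the class names and the
# span text are taken from the answer's output, the rest is assumed.
html <- '
<html><body id="thebody">
  <div class="grid1-4">
    <div><p class="progressText"><span>52% Raised of $20,000 Goal</span></p></div>
  </div>
</body></html>'

# Parse the string as HTML (asText = TRUE means "this is markup, not a file or URL").
doc <- htmlParse(html, asText = TRUE)

# Select by class attributes instead of positional steps like div[6]/div/div.
progress <- xpathSApply(
  doc,
  '//div[@class="grid1-4"]//p[@class="progressText"]/span',
  xmlValue
)

# Extract the percentage and the goal amount from the text.
pct  <- as.numeric(sub("^([0-9]+)%.*$", "\\1", progress))
goal <- as.numeric(gsub(",", "", sub("^.*\\$([0-9,]+).*$", "\\1", progress)))
```

Matching on attributes tends to be more robust than a long positional path, since a path copied from a browser's inspector may not line up with the tree that the R parser builds from the raw HTML.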
