r - How can I download search results from Google Scholar using R?

Question

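I'd like to use R to extract the first 100 results (for example) of a Google Scholar search. Does anyone know how to do that?

To be precise, I only need each paper's title, authors, and citation count.

PS: Is this legal?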

score 5 · Accepted Answer

Consider the updated biobucket post:

http://thebiobucket.blogspot.com/2011/11/r-function-google-scholar-webscraper.html

score 4 · Accepted Answer

There are some Python and Perl scrapers you could use, linked at http://bmb-common.blogspot.com/2011/02/does-google-scholar-suck-or-am-i-just.html

score 3 · Accepted Answer

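You can definitely use RCurl to retrieve the HTML content of the pages and parse it with the XML package, as Btibert3 suggested. The only problem you may face is that Google doesn't allow you to query it in a "robot" way. After about 200 queries in a short time, Google stops returning results. Maybe Google Scholar is different, but I doubt it...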
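As a rough sketch of the fetching step, something like the following could work. The URL pattern, the `start` paging parameter, the 10-results-per-page figure, and the 5-second pause are all assumptions based on how Scholar URLs look in a browser, not a documented API, and `scholar_url` and `fetch_pages` are hypothetical helper names. The pause is only a guess at being polite; it is no guarantee against being blocked.

```r
# Build the URL for one page of results.
# ASSUMPTION: `q` and `start` mirror what the browser address bar shows;
# Google does not document them as a stable interface.
scholar_url <- function(query, start = 0) {
  paste0("https://scholar.google.com/scholar?q=",
         URLencode(query, reserved = TRUE),
         "&start=", start)
}

# Fetch enough pages to cover `n_results`, pausing between requests.
# `fetch` is injectable so the loop can be tried without touching the network;
# by default it calls RCurl::getURL (RCurl must be installed for that).
fetch_pages <- function(query, n_results = 100, per_page = 10, delay = 5,
                        fetch = function(u) RCurl::getURL(u, followlocation = TRUE)) {
  starts <- seq(0, n_results - 1, by = per_page)
  pages <- vector("list", length(starts))
  for (i in seq_along(starts)) {
    pages[[i]] <- fetch(scholar_url(query, starts[i]))
    # Arbitrary pause between requests; no official limit is documented.
    if (i < length(starts)) Sys.sleep(delay)
  }
  pages
}
```

Calling `fetch_pages("your search terms")` would then return a list of raw HTML strings, one per page, ready to be handed to a parser.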

score 3 · Accepted Answer

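I can't speak to the legality of your task, but there are a few ways to approach it. I'm not good at XPath, but it is probably the best approach. I believe you can use the XML package to retrieve the page content and XPath to extract the data from the elements you need.

For example, I use Chrome as my browser, and when I inspect the page with the developer tools, it does appear to have a structure: the data is "hidden" in various tags, which you should be able to target fairly easily with XPath.

See this link for an example of using XPath.

HTH and good luck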
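To make the XPath idea concrete, here is a minimal sketch using the XML package on a made-up HTML fragment. The class names (`gs_r`, `gs_rt`, `gs_a`, `gs_fl`) and the "Cited by N" link text are assumptions about what the developer tools show; check them against the live page, since the markup can change and real entries may carry extra classes. `first_text` is a hypothetical helper.

```r
library(XML)

# Hypothetical fragment imitating one result entry; real pages will differ.
html <- '
<html><body>
<div class="gs_r">
  <h3 class="gs_rt"><a href="#">An example paper title</a></h3>
  <div class="gs_a">A Author, B Author - Some Journal, 2010</div>
  <div class="gs_fl"><a href="#">Cited by 42</a> <a href="#">Related articles</a></div>
</div>
</body></html>'

doc <- htmlParse(html, asText = TRUE)

# One node per search result (exact class match, which is an assumption).
entries <- getNodeSet(doc, "//div[@class='gs_r']")

# Text of the first node matching `path` beneath `node`, or NA if none.
first_text <- function(node, path) {
  hits <- xpathSApply(node, path, xmlValue)
  if (length(hits) == 0) NA_character_ else hits[[1]]
}

# Collect title, authors, and citation count into a data frame.
results <- do.call(rbind, lapply(entries, function(e) {
  cited <- first_text(
    e, ".//div[@class='gs_fl']/a[starts-with(normalize-space(.), 'Cited by')]")
  data.frame(
    title    = first_text(e, ".//h3[@class='gs_rt']"),
    authors  = first_text(e, ".//div[@class='gs_a']"),
    cited_by = as.integer(sub("Cited by ", "", cited, fixed = TRUE)),
    stringsAsFactors = FALSE
  )
}))
```

The relative paths (starting with `.//`) keep each lookup inside its own result entry, so a paper without a "Cited by" link gets `NA` rather than picking up the count from a neighbouring result.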

score 1 · Accepted Answer

A solution was recently posted here:

http://thebiobucket.blogspot.com/2011/11/visually-examine-google-scholar-search.html
