regex - Using regex instead of ImportXML to pull Google search results into Google Sheets

Question

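I am tracking keywords in Google search results in Google Sheets.

When using IMPORTXML, the number of XML imports I can make seems to be limited, because after a certain amount of use I get #N/A in the cells.

I found this custom code via @joshbradley. It is a custom script that uses regular expressions instead of XPath, which is meant to get around any limits. Credit to Josh.

Basically, this goes in the script editor: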

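    function importRegex(url, regexInput) {
      var output = '';
      var fetchedUrl = UrlFetchApp.fetch(url, {muteHttpExceptions: true});
      if (fetchedUrl) {
        var html = fetchedUrl.getContentText();
        if (html.length && regexInput.length) {
          // match() returns null when nothing matches, so guard before indexing
          var match = html.match(new RegExp(regexInput, 'i'));
          if (match && match.length > 1) {
            output = match[1];
          }
        }
      }
      // Grace period to avoid call limit
      Utilities.sleep(1000);
      // unescapeHTML is not defined in this snippet; it has to be defined elsewhere in the script
      return unescapeHTML(output);
    }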

Then you call the script like this:

=importRegex("https://example.com", "<title>(.*)<\/title>")
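The core of `importRegex` is the capture-group step. As a minimal sketch in plain JavaScript, assuming a hypothetical helper name `extractFirstGroup` and a made-up HTML string in place of the fetched page, it behaves roughly like this:

```javascript
// Hypothetical helper (not part of the original script): returns the first
// capture group of a case-insensitive match, or '' when nothing matches.
function extractFirstGroup(html, regexInput) {
  if (!html || !regexInput) {
    return '';
  }
  var match = html.match(new RegExp(regexInput, 'i'));
  return match && match.length > 1 && match[1] !== undefined ? match[1] : '';
}

// Made-up sample input standing in for fetchedUrl.getContentText().
var sampleHtml = '<html><head><title>Example Domain</title></head></html>';
var title = extractFirstGroup(sampleHtml, '<title>(.*)<\\/title>');
```

Because the pattern is built without the global flag, only the first match is considered, so a single call yields a single value.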

From here, I am trying to adapt the following code, taken from GDS (credit to Tara), which pulls in Google search results, so that it uses the custom importRegex method above instead of IMPORTXML.

=ARRAYFORMULA(REGEXEXTRACT(IMPORTXML("https://www.google.co.uk/search?q="& SUBSTITUTE(B$1, " ", "+") &"&pws=0&gl=UK&num=50", "//h3[@class='r']/a/@href[contains(.,'url')]"), "\/url\?q=(.+)&sa\b"))
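To make the two string-handling pieces of that formula concrete, here is a rough sketch in plain JavaScript. The function names `buildSearchUrl` and `extractTargetUrl` and the sample href are assumptions for illustration; the query parameters and the pattern are copied from the formula above.

```javascript
// Hypothetical helper: mirrors "https://www.google.co.uk/search?q=" &
// SUBSTITUTE(B$1, " ", "+") & "&pws=0&gl=UK&num=50" from the formula.
function buildSearchUrl(keyword) {
  return 'https://www.google.co.uk/search?q=' +
    keyword.split(' ').join('+') +
    '&pws=0&gl=UK&num=50';
}

// Hypothetical helper: mirrors REGEXEXTRACT(href, "\/url\?q=(.+)&sa\b"),
// except that it returns '' instead of an error when nothing matches.
function extractTargetUrl(href) {
  var match = href.match(/\/url\?q=(.+)&sa\b/);
  return match ? match[1] : '';
}

// Made-up sample values, not real search output.
var searchUrl = buildSearchUrl('google sheets regex');
var target = extractTargetUrl('/url?q=https://example.com/page&sa=U&ved=0');
```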

Update

Here are two approaches I have tried (the second one uses ARRAYFORMULA), but neither worked.

=importRegex("https://www.google.co.uk/search?q="& SUBSTITUTE(B$1, " ", "+") &"&pws=0&gl=UK&num=50", "//h3[@class='r']/a/@href[contains(.,'url')]"), "\/url\?q=(.+)&sa\b"))

=ARRAYFORMULA(REGEXEXTRACT(importRegex("https://www.google.co.uk/search?q="& SUBSTITUTE(B$1, " ", "+") &"&pws=0&gl=UK&num=50", "//h3[@class='r']/a/@href[contains(.,'url')]"), "\/url\?q=(.+)&sa\b"))
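One thing worth noting about both attempts: `importRegex` passes its second argument to `new RegExp`, so the XPath expression is interpreted as a regular expression rather than as XPath, and the function returns only the first capture group rather than a list. A possible direction, sketched below under the assumption that the fetched HTML contains hrefs of the form `/url?q=...&sa=` (Google's actual markup may differ and can change), is to collect every match with a global regex. The name `extractAllResultUrls` and the sample HTML are hypothetical.

```javascript
// Hypothetical helper: collects every "/url?q=<target>&sa" target in an HTML
// string. Assumes the "&" may appear either raw or encoded as "&amp;".
function extractAllResultUrls(html) {
  var pattern = /\/url\?q=([^&"]+)&(?:amp;)?sa\b/g;
  var rows = [];
  var match;
  while ((match = pattern.exec(html)) !== null) {
    // One-element rows, so a custom function would fill a column in Sheets.
    rows.push([match[1]]);
  }
  return rows;
}

// Made-up sample markup, not real Google output.
var sampleResults =
  '<a href="/url?q=https://a.example/&amp;sa=U">A</a>' +
  '<a href="/url?q=https://b.example/x&sa=U">B</a>';
var urls = extractAllResultUrls(sampleResults);
```

In Apps Script, the HTML string would come from `UrlFetchApp.fetch(url).getContentText()` as in the script above, and a custom function that returns a two-dimensional array fills adjacent cells, so no ARRAYFORMULA wrapper should be needed around it.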

In case it helps, I have provided a link here to a Google Sheet that uses the importRegex script.
