seo - How can I prevent Googlebot from crawling my Underscore client-side templates?

Question

In Google Webmaster Tools, under Crawl Errors/Other, we are seeing 400 errors for URLs like this:

/family-tree/<%=tree.user_url_slug%>/<%=tree.url_slug%>


This is not a real URL, nor one we intended to have crawled. It comes from an Underscore/Backbone template:

<script type="text/template" class="template" id="template-trees-list">
  <% _.each(trees, function(tree) { %>
    <a href="/family-tree/<%=tree.user_url_slug%>/<%=tree.url_slug%>" rel="nofollow">
      <%= tree.title %>
    </a>
  <% }); %>
</script>

Why is Google crawling inside a script block?
Why is Google ignoring the rel="nofollow" attribute?
What else can we do to keep Googlebot away from our Underscore templates?

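Update: I am open to using robots.txt if I can find the right pattern to keep the good pages and block the bad ones. For example, I want to keep /surnames/Jones/queries while blocking /surnames/Jones/queries/<%=url_slug%>. I have thousands of these. It looks like Googlebot may support basic patterns, but not full regular expressions.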

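Update 2: Hmm, this does not address the root cause, and it seems a bit brittle as a long-term solution, but I tested in GWT that the following robots.txt patterns work: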

User-agent: Googlebot
Disallow: /*url_slug%%3E$
Disallow: /*url_slug%%3E/$
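To sanity-check rules like these outside of GWT, here is a minimal sketch of the wildcard matching Google documents for robots.txt, where `*` matches any run of characters and a trailing `$` anchors the end of the path. It is a simplified approximation, not Google's actual matcher, and it ignores Allow rules and rule precedence. The helper names `robotsPatternToRegExp` and `isDisallowed` are hypothetical, and the encoded test path is an assumption about how the template placeholder shows up in the crawled URL.

```javascript
// Simplified sketch of Googlebot-style robots.txt path matching.
// Assumption: "*" matches any run of characters, a trailing "$" anchors the
// end of the path, and everything else is a literal prefix match.
function robotsPatternToRegExp(pattern) {
  const anchored = pattern.endsWith("$");
  const body = anchored ? pattern.slice(0, -1) : pattern;
  const source = body
    .split("*")
    // Escape regex metacharacters so the literal parts match as plain text.
    .map((part) => part.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  return new RegExp("^" + source + (anchored ? "$" : ""));
}

// Returns true if any Disallow pattern matches the given URL path.
function isDisallowed(path, disallowPatterns) {
  return disallowPatterns.some((p) => robotsPatternToRegExp(p).test(path));
}

// The two Disallow patterns from the question.
const disallow = ["/*url_slug%%3E$", "/*url_slug%%3E/$"];

// Hypothetical paths: one real page, and one unrendered template URL
// (assumed encoding, with and without a trailing slash).
const goodPath = "/surnames/Jones/queries";
const badPath = "/surnames/Jones/queries/%3C%=url_slug%%3E";

console.log(isDisallowed(goodPath, disallow));      // false
console.log(isDisallowed(badPath, disallow));       // true
console.log(isDisallowed(badPath + "/", disallow)); // true
```

Under these assumptions the real page stays crawlable, while both forms of the template URL are blocked, which is the behaviour the two rules are meant to produce.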

score 1 · Accepted Answer


Just block these via robots.txt and you will be fine.

Answered 2014-04-23T17:52:52.800
