javascript - Why does Google Webmaster Tools see the dynamic template of my site instead of the static version?

Question

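I have added the spiderable package to my Meteor application, and the HTML version of a page is returned when it is requested with ?_escaped_fragment_= in the URL, but I cannot get Google to crawl the site.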

Details

When I use Fetch as Google in Google Webmaster Tools and request the root page, "http://example.com/", the page returned is the JavaScript version:

HTTP/1.1 200 OK
content-type: text/html; charset=utf-8
date: Fri, 30 Nov 2012 05:39:36 GMT
connection: Keep-alive
transfer-encoding: chunked

<!DOCTYPE html>
<html>
  <head>
    <link rel="stylesheet" href="/e83157bdc4ff057fa3a20b82af4c11b4ebe776e7.css">
    <script type="text/javascript">
      __meteor_runtime_config__ = {"ROOT_URL":"http://www.example.com","DEFAULT_DDP_ENDPOINT":"https://www-example-com-ddp.meteor.com/"};
    </script>
    <script type="text/javascript" src="/13cf3d21ce1c4a88407ca5f3c250f186ab1738f9.js"></script>
    <meta name="fragment" content="!">
    <title>example.com</title>
  </head>
<body>
</body>
</html>

If I instead request http://example.com/?_escaped_fragment_=, the HTML version is returned:

HTTP/1.1 200 OK
content-type: text/html; charset=UTF-8
date: Wed, 05 Dec 2012 02:44:09 GMT
connection: Keep-alive
transfer-encoding: chunked

<!DOCTYPE html>
<html>
  <head>
    <link rel="stylesheet" href="/e83157bdc4ff057fa3a20b82af4c11b4ebe776e7.css">
    <title>example.com</title>
    <meta name="viewport" content="initial-scale=1.0">
  </head>
  <body>
    <ul>
      <li><a href="/">Home</a></li>
      <li><a href="/one">One</a></li>
      <li><a href="/two">Two</a></li>
    </ul>
  </body>
</html>

Questions

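How do you tell Google to add ?_escaped_fragment_= to the URL so that it gets the HTML version?
Will Google still add ?_escaped_fragment_= to the URL if the URLs do not have hashbangs (#!)? That is, /home and /products/1 instead of /#!home and /#!products/1?
How do you make Google follow the linked pages and append ?_escaped_fragment_= to them? All JavaScript versions of the pages have <meta name="fragment" content="!"> in the head. I thought that was all that was needed.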

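The simplest solution seems to be to update the spiderable package so that it returns the HTML version to Googlebot instead of requiring ?_escaped_fragment_=, but in case this is useful to others, I am curious what I am doing wrong.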

Additional information

Meteor's spiderable package is a temporary solution that allows web search engines to index Meteor applications.

According to the source, it does a few things:

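It adds the following tag to the head section of the JavaScript version of the page:

<head><meta name="fragment" content="!"></head>

It uses PhantomJS to render the JavaScript application and returns the HTML version when either of the following conditions is met:

a. The requesting user agent is "facebookexternalhit"

b. The requested URL contains the string ?_escaped_fragment_=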
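
The two conditions above can be sketched as a single check. This is only an illustration of the rule as described here, not the package's actual code: the function name shouldServeSnapshot is hypothetical, and matching the user agent by substring is my assumption.

```javascript
// Hypothetical helper illustrating the rule described above.
// It is NOT taken from the spiderable source.
function shouldServeSnapshot(userAgent, url) {
  // Condition a: the request comes from Facebook's link scraper.
  // Substring matching is an assumption; real user agent strings
  // usually carry a version suffix after the name.
  const isFacebookScraper = (userAgent || "").includes("facebookexternalhit");

  // Condition b: the crawler asked for the "ugly" URL.
  const hasEscapedFragment = url.includes("?_escaped_fragment_=");

  return isFacebookScraper || hasEscapedFragment;
}

console.log(shouldServeSnapshot("Mozilla/5.0", "/"));
console.log(shouldServeSnapshot("Mozilla/5.0", "/?_escaped_fragment_="));
```

Under this rule a plain request for "/" from any other user agent gets the JavaScript version, which matches what Fetch as Google shows for the root page.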

score 6 · Accepted Answer

I believe this is a Google Webmaster Tools bug.

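Google does seem to be crawling the site: the pages show up in Google search results. However, Google Webmaster Tools still lists the total number of indexed pages as 1. Bing, on the other hand, still has not crawled the pages.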

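EDIT: In Google Webmaster Tools the pages are listed as

Not selected: Pages that are not indexed because they are substantially similar to other pages, or that have been redirected to another URL.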

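EDIT2: In response to Jonatan's question:

Will Google still add `?_escaped_fragment_=` to the URL if the URLs do not have hashbangs (#!)?

Yes. My application does not use hashbangs (#!) in its URLs, and Googlebot still appends ?_escaped_fragment_= when crawling. Here is a sample from the logs: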

INFO HIT /url/2/01 66.249.72.42
INFO HIT /url/2/01?_escaped_fragment_= 66.249.72.142
INFO HIT /url/2/01 108.162.222.82
INFO HIT /url/2/01?_escaped_fragment_= 108.162.222.82
INFO HIT /url/2/05 108.162.222.82
INFO HIT /url/2/05?_escaped_fragment_= 108.162.222.214

Googlebot seems to try each URL both with and without ?_escaped_fragment_=.
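
To check the same thing in your own logs, something like the following might help. It is a rough sketch that assumes your log lines have the same "INFO HIT <url> <ip>" shape as the sample above; the function name summarizeHits is made up for this example.

```javascript
// Sample lines copied from the log excerpt above.
const logLines = [
  "INFO HIT /url/2/01 66.249.72.42",
  "INFO HIT /url/2/01?_escaped_fragment_= 66.249.72.142",
  "INFO HIT /url/2/01 108.162.222.82",
  "INFO HIT /url/2/01?_escaped_fragment_= 108.162.222.82",
  "INFO HIT /url/2/05 108.162.222.82",
  "INFO HIT /url/2/05?_escaped_fragment_= 108.162.222.214",
];

const ESCAPED_SUFFIX = "?_escaped_fragment_=";

// Hypothetical helper: counts, per path, how many hits arrived with and
// without the escaped-fragment suffix. Assumes the log format shown above.
function summarizeHits(lines) {
  const byPath = new Map();
  for (const line of lines) {
    const match = /^INFO HIT (\S+) (\S+)$/.exec(line);
    if (!match) continue; // skip lines in any other format
    const requestUrl = match[1];
    const escaped = requestUrl.endsWith(ESCAPED_SUFFIX);
    const path = escaped
      ? requestUrl.slice(0, requestUrl.length - ESCAPED_SUFFIX.length)
      : requestUrl;
    const entry = byPath.get(path) || { plain: 0, escaped: 0 };
    if (escaped) {
      entry.escaped += 1;
    } else {
      entry.plain += 1;
    }
    byPath.set(path, entry);
  }
  return byPath;
}

for (const [path, counts] of summarizeHits(logLines)) {
  console.log(path, counts);
}
```

A path that has a non-zero count in both columns was requested in both forms.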

score 2 · Accepted Answer

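Any page whose URL does not have a hash fragment starting with #!, such as the home page, needs this:

 <meta name="fragment" content="!">

to tell the crawler to fetch the ugly URL (the one with _escaped_fragment_=). It goes in the <head> section.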

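Update: I noticed that, according to the plugin description given at the end of your question, the meta tag above is added for you. You can check that it is included in your page by viewing the page source.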

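Normally, every page other than the home page should have something like www.yoursite.com/#!hashfragment as its pretty URL, where the ! after the hash (#) acts as the notifier for the crawler, so those pages do not need the meta tag mentioned above.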
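
As a rough illustration of how a pretty URL maps to the ugly one, here is a sketch based on my reading of Google's AJAX crawling scheme. The function name toCrawlerUrl is made up, and the set of characters escaped in the fragment (%, #, &, + and space) is a simplification I am assuming, not the full rule.

```javascript
// Hypothetical helper: maps a "pretty" URL to the "ugly" URL a crawler
// would request. The no-hashbang branch only applies to pages that carry
// <meta name="fragment" content="!"> in their head.
function toCrawlerUrl(prettyUrl) {
  const hashIndex = prettyUrl.indexOf("#!");
  const base = hashIndex === -1 ? prettyUrl : prettyUrl.slice(0, hashIndex);
  const fragment = hashIndex === -1 ? "" : prettyUrl.slice(hashIndex + 2);

  // Assumed, simplified escaping: percent-encode only %, #, &, + and space.
  const escapedFragment = fragment.replace(/[%#&+ ]/g, (ch) =>
    "%" + ch.charCodeAt(0).toString(16).toUpperCase().padStart(2, "0")
  );

  // Append to an existing query string if there is one.
  const separator = base.includes("?") ? "&" : "?";
  return base + separator + "_escaped_fragment_=" + escapedFragment;
}

console.log(toCrawlerUrl("http://www.yoursite.com/#!hashfragment"));
console.log(toCrawlerUrl("http://example.com/one"));
```

The second call shows the case from the question: a URL with no hashbang simply gets an empty ?_escaped_fragment_= appended.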

score 2 · Accepted Answer

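I know this question has already been answered, but for people who come to it from Google, I want to include this screencast on the topic.

It helped me understand Meteor's spiderable package: https://www.eventedmind.com/tracks/feed-archive/meteor-the-spiderable-package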
