ajax - Google indexing: _escaped_fragment_ does not work for the home page

Question

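I have set up my website (GWT) to be crawlable by Google. When I use the "Fetch as Google" page in Google Webmaster Tools, I see the following pattern:

A request for "http://www.mysite.com/#!AJAX_URL" is correctly redirected to the snapshot.
But Google does not request the snapshot for "http://www.mysite.com", even though I did set the fragment meta tag in web.xml.

==> Two questions about this:

Is it just that Google Webmaster Tools is not smart enough, and the real bot will request the snapshot correctly?
Should I add something to web.xml or anywhere else?

Thanks,

Hugo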

score 4 · Accepted Answer

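After a lot of searching, I found the answer. This is just how the Fetch as Googlebot feature behaves: it does not check the meta tag and simply returns the raw content. When Google actually crawls and indexes the page, it notices the meta tag and acts accordingly.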
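To make the difference concrete, here is a minimal Java sketch of the URL mapping that the AJAX crawling scheme describes: a "#!" URL is rewritten to a "?_escaped_fragment_=" URL, and a page without a fragment is rewritten only when it declares the fragment meta tag. The class and method names (CrawlerUrls, toCrawlerUrl, escape) are made up for illustration and are not part of GWT or of any Google library. The escaping is a simplified reading of the scheme's rules.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for illustration only; not an API of GWT or Google. */
final class CrawlerUrls {
    private CrawlerUrls() {}

    /**
     * Returns the URL a crawler following the AJAX crawling scheme would request.
     *
     * @param url             the page URL, with or without a "#!" fragment
     * @param hasFragmentMeta assumed flag: the page declares the fragment meta tag
     */
    static String toCrawlerUrl(String url, boolean hasFragmentMeta) {
        int hash = url.indexOf('#');
        boolean hashBang = hash >= 0 && url.startsWith("#!", hash);
        if (!hashBang && !hasFragmentMeta) {
            return url; // ordinary page: requested as-is
        }
        String base = hash >= 0 ? url.substring(0, hash) : url;
        // A page that only carries the meta tag maps to an empty fragment.
        String fragment = hashBang ? url.substring(hash + 2) : "";
        String separator = base.indexOf('?') >= 0 ? "&" : "?";
        return base + separator + "_escaped_fragment_=" + escape(fragment);
    }

    /** Percent-encodes control characters, space, '#', '%', '&', '+' and non-ASCII bytes. */
    static String escape(String fragment) {
        StringBuilder out = new StringBuilder();
        for (byte raw : fragment.getBytes(StandardCharsets.UTF_8)) {
            int b = raw & 0xFF;
            boolean special = b <= 0x20 || b == 0x23 || b == 0x25
                    || b == 0x26 || b == 0x2B || b >= 0x7F;
            if (special) {
                out.append(String.format("%%%02X", b));
            } else {
                out.append((char) b);
            }
        }
        return out.toString();
    }
}
```

With this sketch, toCrawlerUrl("http://www.mysite.com", true) yields a URL ending in "?_escaped_fragment_=" with an empty value, which is the request the server has to answer with the home page snapshot. A tool that ignores the meta tag corresponds to passing false, and the home page URL comes back unchanged.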

The answer is linked here (see JohnMu's comment):

score 0 · Accepted Answer

Make sure your "robots.txt" allows crawlers access:

User-agent: *
Allow: /

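Also, you may want to submit a sitemap to Webmaster Tools.

It sounds like your snapshots are being served correctly. Just in case, here is the relevant part of a working "index.php". The static pages are located at 'static/${TOKEN}.html'.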

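<?php

// Map a snapshot token to its static file, e.g. static/${TOKEN}.html
function static_url ($token) { return 'static/' . $token . '.html'; }

// null when the parameter is absent (avoids an undefined-index notice)
$escaped_fragment = isset($_GET['_escaped_fragment_']) ? $_GET['_escaped_fragment_'] : null;

if ($escaped_fragment !== null) {
  // Drop slashes so that '' and '/' both become the empty token
  $fragment = preg_replace('/\//', '', $escaped_fragment);
  $file = static_url($fragment);

  // Home page (empty fragment) or unknown token: fall back to the default place
  if ($fragment == '' || (! file_exists($file))) {
    $fragment = '${DEFAULT_PLACE}:${DEFAULT_STATE}'; // your default place
    $file = static_url($fragment);
  }
  // Strips the snapshot's leading tag (its own doctype) and runs of whitespace
  $re = '/(^<[^>]*>)|(\n|\r\n|\t|\s{2,4})*/';

  $raw = is_readable($file) ? file_get_contents($file) : false;
  if ($raw !== false) {
    $content = preg_replace($re, '', $raw);
  }
  else {
    $content = 'Page not found!';
    // The status header has to be sent before any output
    header(php_sapi_name() == 'cgi' ? 'Status: 404 Not Found' : 'HTTP/1.1 404 Not Found');
  }
  echo "<!doctype html>\n" . $content;
} else { ?>
<!doctype html>
<html> ... Your GWT host page ... </html>

<?php } ?>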
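Since the site in the question is a GWT application, the same fallback can also be sketched in Java. This is only an illustration of the lookup rule in the PHP above: an empty or unknown fragment resolves to the default place, so the home page request still gets a snapshot. The names SnapshotTokens, resolve and snapshotPath are hypothetical, and the set of known tokens stands in for the file_exists() check.

```java
import java.util.Set;

/** Hypothetical helper mirroring the PHP lookup; not taken from any library. */
final class SnapshotTokens {
    private SnapshotTokens() {}

    /**
     * Picks the snapshot token for a value of the _escaped_fragment_ parameter.
     *
     * @param escapedFragment parameter value; may be null or empty for the home page
     * @param knownTokens     tokens that have a static snapshot (assumed to be known up front)
     * @param defaultToken    token of the default place
     */
    static String resolve(String escapedFragment, Set<String> knownTokens, String defaultToken) {
        String token = escapedFragment == null
                ? ""
                : escapedFragment.replace("/", "").replace("\\", "");
        boolean usable = !token.isEmpty() && knownTokens.contains(token);
        return usable ? token : defaultToken;
    }

    /** Same layout as the PHP version: static/${TOKEN}.html */
    static String snapshotPath(String token) {
        return "static/" + token + ".html";
    }
}
```

A servlet or filter that sees the _escaped_fragment_ parameter could call resolve() and then stream the file named by snapshotPath(). How that is wired into web.xml depends on the application and is left out here.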
