php - How to stop Google from crawling pages that do not exist

Question

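While developing my site, I made a typo in one place. All of my pages have the form dir1/dir2/page.htm/par1-par2, but the typo produced dir1/dir2/page/par1-par2 (note: no .htm).

It was in production for only one day, but Google keeps crawling those links. How can I stop Google from doing this?

By the way, it is not one page but hundreds or thousands of pages.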

score 2 · Accepted Answer

Try using robots.txt to deny access to these pages (URLs):

http://www.robotstxt.org/robotstxt.html

http://support.google.com/webmasters/bin/answer.py?hl=en&answer=156449

Test your robots.txt here: http://www.frobee.com/robots-txt-check/

Patterns must begin with / because robots.txt patterns always match against the URL path, starting from the site root.
* matches zero or more of any character. 
$ at the end of a pattern matches the end of the URL; elsewhere $ matches itself. 
* at the end of a pattern is redundant, because robots.txt patterns always match any URL which begins with the pattern.
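
The four rules above can be sketched as a small PHP matcher. This is only an illustration of the rules as quoted, not Google's actual implementation; the function name `robots_pattern_matches` is hypothetical.

```php
<?php
// Hypothetical helper: checks a URL path against one robots.txt pattern
// using the rules quoted above (prefix match, * wildcard, trailing $ anchor).
function robots_pattern_matches(string $pattern, string $path): bool
{
    // A $ at the very end of the pattern anchors the match at the end of the URL.
    $anchored = substr($pattern, -1) === '$';
    if ($anchored) {
        $pattern = substr($pattern, 0, -1);
    }

    // Quote every literal piece (so a $ elsewhere matches itself),
    // then join the pieces with ".*" so that * matches zero or more characters.
    $parts = [];
    foreach (explode('*', $pattern) as $part) {
        $parts[] = preg_quote($part, '#');
    }

    // Without the anchor this is a pure prefix match, which is why a
    // trailing * is redundant.
    $regex = '#^' . implode('.*', $parts) . ($anchored ? '$' : '') . '#';

    return preg_match($regex, $path) === 1;
}
```

Under these rules, a pattern such as `/dir1/dir2/page/` would cover the mistyped URLs from the question while leaving the correct `page.htm` URLs alone.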

score 1 · Accepted Answer

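If the page still exists (perhaps because you use mod_rewrite) and renders a custom "not found" page without sending an HTTP 410 Gone header, header("HTTP/1.0 410 Gone");, then Google will not know the page has been removed and will index it.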

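You need to send the correct header, delete the page, or stop rendering your own 404 so that the request falls through to your server's 404. Google will then remove the page from its index, although the removal will not happen overnight.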
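
A minimal sketch of sending the 410 status from PHP, assuming every request is routed through one front script (for example via mod_rewrite). The function name `is_retired_path` and the URL shape it tests are assumptions based on the placeholder paths in the question; adjust the pattern to the real site.

```php
<?php
// Hypothetical rule: a path counts as "retired" when it has the mistyped
// shape /dir1/dir2/<page>/<params>, where <page> lacks its .htm suffix.
function is_retired_path(string $path): bool
{
    return preg_match('#^/dir1/dir2/[^/.]+/[^/]+$#', $path) === 1;
}

// Only runs under a web server, where REQUEST_URI is set.
if (isset($_SERVER['REQUEST_URI'])) {
    $path = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);
    if (is_string($path) && is_retired_path($path)) {
        // Must be sent before any output; sets the status to "410 Gone".
        http_response_code(410);
        echo 'This page has been removed.';
        exit;
    }
}
```

The custom error text can still be shown to visitors; what matters to the crawler is the status code, not the body.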

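You can also add the URL to your robots.txt file, although this does not guarantee removal from the index either. You can contact Google as others have suggested, but there is no guarantee of a response or a removal.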

User-agent: *
Disallow: /dir1/dir2/page/par1-par2

Good luck.

score -1 · Accepted Answer

Google has a form where you can ask it to remove a page from its index.

See the information at this link:

http://support.google.com/webmasters/bin/answer.py?hl=en&answer=164734
