url - Block URLs containing numbers in robots.txt

Question

My site lets search engines index the same page in two formats, for example:

Every page on my site works this way. How can I block the first format in the robots.txt file? I mean something like this:

Disallow: /page-(numbers).html

score 1 · Accepted Answer


Answered 2013-06-12T15:17:31.613

score 0 · Accepted Answer

robots.txt has no regular-expression option like that. You have a few options:

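1) Put the robots exclusion information (a robots meta tag) in the head element of each HTML file.
2) Write a script that adds every HTML file you want to block to robots.txt as a separate line.
3) Move the content pages into a separate directory and disallow access to that directory.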
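Option 2 above could look roughly like the following Python sketch. It assumes the pages are flat files in one directory and that the format to block is named `page-<digits>.html`; that pattern, the function names, and the flat layout are hypothetical, so adjust them to your actual URLs.

```python
import re
from pathlib import Path

# Hypothetical naming scheme: "page-<digits>.html" is the duplicate format to
# block; "page-<digits>-<title>.html" is the format that should stay indexed.
BLOCK_PATTERN = re.compile(r"page-\d+\.html")


def build_robots_txt(filenames, url_prefix="/"):
    """Return robots.txt text with one Disallow line per file matching BLOCK_PATTERN."""
    lines = ["User-agent: *"]
    for name in sorted(filenames):
        # fullmatch requires the whole name to fit, so e.g. "page-12.html.bak" is skipped
        if BLOCK_PATTERN.fullmatch(name):
            lines.append(f"Disallow: {url_prefix}{name}")
    return "\n".join(lines) + "\n"


def main(site_root="."):
    # Assumes every page is a flat .html file directly under site_root.
    names = [p.name for p in Path(site_root).glob("*.html")]
    print(build_robots_txt(names), end="")


if __name__ == "__main__":
    main()
```

Run from the site root, it prints a robots.txt you can redirect into the file; it has to be rerun whenever pages are added. Because a Disallow value is matched as a prefix, listing the full file name blocks `/page-1.html` without touching `/page-12.html`.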

Some search engines, such as Google (but not all), support pattern matching: http://support.google.com/webmasters/bin/answer.py?hl=en&answer=156449&from=35237&rd=1

User-agent: *
Disallow: /page-*.html
Allow: /page-*-page-title.html

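Here the Allow line overrides the Disallow line, which is also not supported by every search engine. The simplest approach is to restructure your files (or use URL rewriting), or else to put the robots information in the HTML files themselves.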
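To make the precedence concrete, here is a minimal Python sketch of how a crawler following Google's documented rules might evaluate the example above: `*` matches any run of characters, a trailing `$` anchors the end of the URL, the longest matching pattern wins, and Allow wins a tie. It is an illustration under those assumptions, not any crawler's actual code; the function names are hypothetical, and engines that ignore wildcards or Allow will behave differently.

```python
import re

# Rules from the answer above (Google-style syntax).
RULES = [
    ("disallow", "/page-*.html"),
    ("allow", "/page-*-page-title.html"),
]


def pattern_to_regex(pattern):
    """Translate a path pattern: '*' = any run of characters, trailing '$' = end of URL."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_allowed(path, rules):
    """Longest matching pattern wins; on a tie Allow wins; no match means allowed."""
    best_len = -1
    allowed = True
    for directive, pattern in rules:
        if not pattern:
            continue  # an empty Disallow value blocks nothing
        if pattern_to_regex(pattern).match(path):  # match() anchors at the start
            if len(pattern) > best_len or (len(pattern) == best_len and directive == "allow"):
                best_len = len(pattern)
                allowed = directive == "allow"
    return allowed


if __name__ == "__main__":
    for url in ["/page-12.html", "/page-12-page-title.html", "/about.html"]:
        print(url, is_allowed(url, RULES))
```

With these rules `/page-12.html` is blocked while `/page-12-page-title.html` and `/about.html` stay allowed. Note that `*` matches any characters, not just digits, so `/page-*.html` would also block a page such as `/page-about.html`.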
