
I need a pattern-matching rule that produces these results:

allow /dir/path_name.htm/something
disallow /dir/path_name/something
disallow /dir/path_name.htm

Actually, those two disallowed URLs are typos that accumulated over time; the pages never existed. How can I stop Google from ever crawling them again?

I tested the following at http://www.frobee.com/robots-txt-check/, but nothing seems to work:

Allow: /dir/*.htm/?*
Disallow: /dir/*

What went wrong? Thank you.


1 Answer


According to the specification:

http://www.robotstxt.org/norobots-rfc.txt

Wildcards (*) are not allowed; paths are matched only as literal prefixes. My guess is that you are using some form of URL rewriting and you don't want multiple URLs with the same content to be indexed. In that case, this may be a better solution:
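To illustrate, Python's standard-library parser follows that original draft, so the wildcard lines from the question are treated as literal text and match nothing. A minimal sketch (the host example.com and the paths are placeholders):

```python
# Demonstrates prefix-only matching per the original robots.txt draft,
# using Python's stdlib parser (which has no wildcard support).
import urllib.robotparser

rp = urllib.robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Allow: /dir/*.htm/?*",   # '*' and '?' are literal characters here
    "Disallow: /dir/*",
])

# Neither rule matches a real path, so the default (allow) applies:
print(rp.can_fetch("*", "http://example.com/dir/page.htm/x"))  # True

# A plain prefix rule, by contrast, does work:
rp2 = urllib.robotparser.RobotFileParser()
rp2.parse(["User-agent: *", "Disallow: /dir/"])
print(rp2.can_fetch("*", "http://example.com/dir/page"))  # False
```

Note that Google's own crawler extends the spec and does honor `*` wildcards, which is why online checkers and the standard give different answers.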

http://googlewebmastercentral.blogspot.de/2009/02/specify-your-canonical.html
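The canonical hint described in that post is a single tag in each duplicate page's head, pointing to the preferred URL (the URL below is a placeholder):

```html
<!-- In the <head> of each duplicate page -->
<link rel="canonical" href="http://example.com/dir/path_name.htm" />
```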

Answered 2012-07-16T13:56:55.207