4

I have a website at:

http://domain.com/

with a mirror site at:

http://cdn.domain.com/

I don't want the CDN to be indexed. How can I write a robots.txt rule that keeps the CDN from being indexed without disturbing my present robots.txt excludes?

My present robots.txt excludes:

User-agent: *
Disallow: /abc.php

How can I prevent cdn.domain.com from being indexed?


2 Answers

10

Add the following to your root .htaccess file:

RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} ^Amazon.CloudFront$
RewriteRule ^robots\.txt$ robots-cdn.txt

Then create a separate robots-cdn.txt:

User-agent: *
Disallow: /

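When robots.txt is requested via http://cdn.domain.com/robots.txt, the contents of robots-cdn.txt are returned. Otherwise the rewrite does not fire and the real robots.txt is served.

This way, you are free to mirror the entire site (including the .htaccess file) and still get the intended behavior.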
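
As a quick sanity check of the two rule sets, here is a minimal Python sketch using the standard library's `urllib.robotparser`. It only evaluates the robots.txt contents shown in this thread; it does not test the Apache rewrite itself. The helper name `is_allowed` and the sample URLs are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

# Contents of the real robots.txt from the question.
MAIN_ROBOTS = """User-agent: *
Disallow: /abc.php
"""

# Contents of robots-cdn.txt from this answer.
CDN_ROBOTS = """User-agent: *
Disallow: /
"""


def is_allowed(robots_body: str, url: str, agent: str = "Googlebot") -> bool:
    """Hypothetical helper: would `agent` be allowed to fetch `url`
    under the given robots.txt body?"""
    parser = RobotFileParser()
    parser.parse(robots_body.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    print(is_allowed(MAIN_ROBOTS, "http://domain.com/index.html"))
    print(is_allowed(MAIN_ROBOTS, "http://domain.com/abc.php"))
    print(is_allowed(CDN_ROBOTS, "http://cdn.domain.com/index.html"))
```

The main site should stay crawlable apart from /abc.php, while every path on the CDN host should be blocked.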

Update:

  • HTTP_USER_AGENT is used for this because Amazon sends that user agent when querying from any location.
  • I have verified this and it works.
answered 2013-06-05T19:13:10.003
0

If the codebase is the same, you can generate your robots.txt dynamically and change its content depending on the requested (sub)domain.
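
A minimal sketch of that idea in Python, assuming the application can see the requested hostname (whether the origin actually receives the CDN hostname depends on how the CDN is configured). The function name `robots_txt_for_host` and the `CDN_HOSTS` set are hypothetical; the rule bodies are the ones from the question.

```python
# Hostnames that should not be indexed (assumption: only the CDN mirror).
CDN_HOSTS = {"cdn.domain.com"}

# Rules for the main site, taken from the question.
MAIN_RULES = "User-agent: *\nDisallow: /abc.php\n"

# Rules for the mirror: block everything.
CDN_RULES = "User-agent: *\nDisallow: /\n"


def robots_txt_for_host(host_header: str) -> str:
    """Return the robots.txt body to serve for the given Host header."""
    # Drop an optional ":port" suffix and normalise case before comparing.
    host = host_header.split(":", 1)[0].strip().lower()
    if host in CDN_HOSTS:
        return CDN_RULES
    return MAIN_RULES


if __name__ == "__main__":
    print(robots_txt_for_host("cdn.domain.com"))
    print(robots_txt_for_host("domain.com"))
```

In a real application this function would be called from whatever handler serves /robots.txt, with the response sent as text/plain.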

answered 2013-06-05T18:23:34.293