seo - How can I use robots.txt to allow crawlers to access only index.php?

Question

If I want to allow crawlers to access only index.php, will this work?

User-agent: *
Disallow: /
Allow: /index.php

score 20 · Accepted Answer

Yes, it will work. Here is the test result from Google Webmaster Tools.

Url
http://www.example.org/index.php

Googlebot
Allowed by line 3: Allow: /index.php

Googlebot-Mobile
Allowed by line 3: Allow: /index.php

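Keep in mind, however, that with this configuration your site's home page will not be crawled unless it is requested by its fully qualified path. In other words, http://www.example.org/ is disallowed while http://www.example.org/index.php is allowed.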

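If you want your home page to be accessible as well, here is a better version of the file. The `$` anchors the end of the URL, so `Allow: /$` matches only the bare root path.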

User-agent: *
Disallow: /
Allow: /index.php
Allow: /$
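To see why the extra `Allow: /$` line opens up the home page, here is a minimal sketch of the rule evaluation that Google documents for its crawlers: the longest matching pattern wins, and `Allow` wins a tie. The helper names `pattern_to_regex` and `is_allowed` are made up for this illustration, and other crawlers may evaluate rules differently.

```python
import re

def pattern_to_regex(pattern):
    # '*' matches any run of characters; a trailing '$' anchors the end of the path.
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))

def is_allowed(rules, path):
    """rules is a list of (directive, pattern) pairs, e.g. ("Allow", "/$")."""
    best = None  # (pattern length, is_allow); the longest pattern wins, Allow wins ties
    for directive, pattern in rules:
        if not pattern:
            continue  # an empty pattern matches nothing in this sketch
        if pattern_to_regex(pattern).match(path):
            candidate = (len(pattern), directive.lower() == "allow")
            if best is None or candidate > best:
                best = candidate
    # A path that no rule matches is allowed by default.
    return True if best is None else best[1]

# Hypothetical rule lists mirroring the two robots.txt versions in this answer.
WITHOUT_ROOT = [("Disallow", "/"), ("Allow", "/index.php")]
WITH_ROOT = WITHOUT_ROOT + [("Allow", "/$")]

for path in ("/", "/index.php", "/about.html"):
    print(path, is_allowed(WITHOUT_ROOT, path), is_allowed(WITH_ROOT, path))
```

Under these assumptions only the second rule list lets `/` through, while `/about.html` stays blocked in both.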

score 3 · Accepted Answer

User-agent: *
Allow: /index.php
Disallow: /

Answered 2011-03-02T11:42:17.237

score 3 · Accepted Answer

Try swapping the order of the Disallow and Allow lines:

User-agent: *
Allow: /index.php
Disallow: /

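See this note from Wikipedia:

"However, to be compatible with all robots, if you want to allow a single file inside an otherwise disallowed directory, you need to place the Allow directive first, followed by the Disallow, for example:"

Even so, I wouldn't expect it to work very consistently across crawlers.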
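Order matters for parsers that stop at the first matching rule. As a rough illustration, Python's standard-library `urllib.robotparser` appears to behave this way in the CPython versions I am aware of, so it can be used to compare the two orderings. The `can_fetch` wrapper and the constant names below are just for this sketch.

```python
from urllib.robotparser import RobotFileParser

def can_fetch(robots_txt, url, agent="*"):
    # Parse the robots.txt text directly instead of fetching it over HTTP.
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

DISALLOW_FIRST = """User-agent: *
Disallow: /
Allow: /index.php
"""

ALLOW_FIRST = """User-agent: *
Allow: /index.php
Disallow: /
"""

URL = "http://www.example.org/index.php"

print("Disallow first:", can_fetch(DISALLOW_FIRST, URL))
print("Allow first:", can_fetch(ALLOW_FIRST, URL))
```

With a first-match parser, the Disallow-first file blocks index.php because `Disallow: /` matches before the Allow line is reached, while the Allow-first file lets it through. Longest-match crawlers such as Googlebot allow it with either ordering.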

score 2 · Accepted Answer

User-agent: *
Allow: /$
Allow: /index.php
Allow: /sitemap.xml
Allow: /robots.txt
Disallow: /

Sitemap: http://www.your-site-name.com/sitemap.xml

score 2 · Accepted Answer

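You can check it with Google's robots tool. I would never list a secret directory in the robots file, though, because I suspect a line like the one below is like honey to some spiders.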

Disallow: /secret
