php - Need to block certain URLs in the robots file

Question

I want to disallow certain URLs in my website's robots file, and I'm having some difficulty.

Right now my robots file contains the following:

User-agent: *

Allow: /
Disallow: /cgi-bin/

Sitemap: http://seriesgate.tv/sitemap.xml

I don't want Google to index URLs like this one:

http://seriesgate.tv/watch-breakingbad-online/season5/episode8/searchresult/

There are 8000 URLs like this, so I need a rule in the robots file that blocks all of them.

I also want to disallow the search box in the robots file, so that Google does not crawl search pages such as this URL:

seriesgate.tv/search/indv_episodes/friends/

Any ideas?

score 0 · Accepted Answer

Add Disallow: /name_of_folder/ to stop Google from crawling into a folder, and add Disallow: /file_name to stop Google from crawling a specific file.

score 0 · Accepted Answer

First, your robots.txt (as included in your question) is invalid. There must not be a line break after the User-agent line.

Second, you don't need the Allow line, because everything that is not explicitly blocked is allowed anyway.

If all 8000 URLs you want to block start with "watch-", you can use:

Disallow: /watch-

To block the search results, you can use:

Disallow: /search/

Note that you have to check that no other pages, ones you do not want to block, match these Disallow values.
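One way to run that check is to test the site's known paths against the Disallow prefixes and review what matches. The following is only a minimal Python sketch, assuming plain prefix matching with no wildcards. The helper name matches_by_rule is made up for illustration, and /watchlist/, /searching-tips/ and /cgi-bin/form.cgi are hypothetical pages, not paths known to exist on the site.

```python
# Minimal sketch: group site paths by the Disallow prefix they match.
# Assumes plain prefix matching (no wildcard support).

def matches_by_rule(paths, disallow_prefixes):
    """Return {prefix: [paths that start with that prefix]}."""
    result = {prefix: [] for prefix in disallow_prefixes}
    for path in paths:
        for prefix in disallow_prefixes:
            if path.startswith(prefix):
                result[prefix].append(path)
                break  # one matching rule is enough to block the path
    return result


# Sample paths; in practice the list could be taken from the sitemap.
paths = [
    "/watch-breakingbad-online/season5/episode8/searchresult/",
    "/search/indv_episodes/friends/",
    "/watchlist/",        # hypothetical page, does not start with "/watch-"
    "/searching-tips/",   # hypothetical page, does not start with "/search/"
    "/cgi-bin/form.cgi",  # hypothetical script
]
rules = ["/cgi-bin/", "/watch-", "/search/"]

blocked = matches_by_rule(paths, rules)
for prefix, hits in blocked.items():
    print(prefix, "->", hits)
```

Any path that shows up under a prefix but should stay crawlable is a sign that the Disallow value is too broad.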

So your robots.txt could look like:

User-agent: *
Disallow: /cgi-bin/
Disallow: /watch-
Disallow: /search/

Sitemap: http://seriesgate.tv/sitemap.xml

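It would block any URL whose path starts with /cgi-bin/, /watch- or /search/, including the two URLs from the question.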
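To confirm what the suggested file blocks, it can be fed to a robots.txt parser. The sketch below uses Python's standard urllib.robotparser module; the helper name is_blocked is made up for illustration. Python's parser is not Google's, so treat the result as an indication rather than a guarantee of how Googlebot behaves.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt suggested in this answer, parsed from a string
# instead of being fetched from the live site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /watch-
Disallow: /search/

Sitemap: http://seriesgate.tv/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_blocked(url, agent="Googlebot"):
    """Return True if the parsed rules forbid `agent` from fetching `url`."""
    return not parser.can_fetch(agent, url)


# URLs need a scheme here; without "http://" the host name would be
# read as part of the path and no rule would match.
urls = [
    "http://seriesgate.tv/watch-breakingbad-online/season5/episode8/searchresult/",
    "http://seriesgate.tv/search/indv_episodes/friends/",
    "http://seriesgate.tv/",
]
for url in urls:
    print("blocked" if is_blocked(url) else "allowed", url)
```

With these rules the two URLs from the question are reported as blocked and the site root as allowed.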
