Google supports wildcards in robots.txt. The following directive in robots.txt will prevent Googlebot from crawling any page that has any parameters:
Disallow: /*?
That won't stop many other spiders from crawling those URLs, because wildcards are not part of the standard robots.txt specification.
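For reference, a complete robots.txt using that rule might look something like this (a minimal sketch; the example URLs below reuse your maxi-dress page and the product_type parameter, and the parameter value is only illustrative):

User-agent: Googlebot
Disallow: /*?

# Blocked:  http://www.site.com/shop/maxi-dress?product_type=sale
# Allowed:  http://www.site.com/shop/maxi-dress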
Google may take a while to remove URLs you have blocked from its search index, and the extra URLs could remain indexed for months. You can speed that up by using the "Remove URLs" feature in Webmaster Tools once the URLs are blocked, but it is a manual process in which you have to paste in each individual URL that you want removed.
Using this robots.txt rule may also hurt your site's Google rankings if Googlebot doesn't find the version of the URL without parameters. If you commonly link to the versions with parameters, you probably don't want to block them in robots.txt. It would be better to use one of the other options below.
A better option is to use the rel="canonical" link tag on each of your pages.
So both your example URLs would have the following in the head section:
<link rel="canonical" href="http://www.site.com/shop/maxi-dress">
That tells Googlebot not to index all the variations of the page, but only the "canonical" version of the URL that you choose. Unlike blocking with robots.txt, Googlebot will still be able to crawl all your pages and assign value to them, even when they use a variety of URL parameters.
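To make that concrete, here is a sketch assuming one of your URLs is a parameterized version of the maxi-dress page (the product_type value is only an example):

<!-- In the head of http://www.site.com/shop/maxi-dress?product_type=sale -->
<link rel="canonical" href="http://www.site.com/shop/maxi-dress">

Google then treats the parameterized URL as a duplicate and consolidates its indexing and ranking signals onto the clean URL.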
Another option is to log into Google Webmaster Tools and use the "URL Parameters" feature that is in the "Crawl" section.
Once there, click on "Add parameter". You can set "product_type" to "Does not affect page content" so that Google doesn't crawl and index pages with that parameter.
Do the same for each of the parameters that you use that don't change the page.