django - Robots.txt unreachable in a Django application

Question

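I received a notice from Google Webmaster Tools that Google's crawler has stopped crawling one particular site because robots.txt is "unreachable". Unfortunately, Google gave no additional details about the crawl error.

I have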

<meta name="robots" content="index, follow">

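included as one of the meta tags in my base.html template. I do this for every Django application, and none of my other sites have this problem. Correct me if I'm wrong, but I also believe a robots.txt is not required for Google to index a site.

I tried to fix this by installing and configuring django-robots (https://github.com/jezdez/django-robots) and adding it to my URL conf: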

(r'^robots\.txt$', include('robots.urls')),

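My latest Fetch as Google attempt (after pushing django-robots to prod) still returns the same error.

I don't have any special crawling rules, and I'd be fine with having no robots.txt file at all so that Google indexes the whole site. Does anyone have ideas for a quick fix before I try the other two methods mentioned here: http://fredericiana.com/2010/06/09/three-ways-to-add-a-robots-txt-to-your-django-project/ ?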

score 0 · Accepted Answer

I tried removing the robots.txt line from urls.py entirely and fetching as Google, but that did not fix the problem.

(r'^robots\.txt$', include('robots.urls')),
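Removing the route would normally turn the response into a 404, which Google documents as equivalent to having no robots.txt, whereas a 5xx response or a failed connection is what gets reported as unreachable. Here is a minimal standard-library sketch for checking which case applies. `describe_status`, `fetch_robots_status`, and the label strings are hypothetical names, and the mapping is a simplified reading of Google's documented behaviour rather than an exhaustive one.

```python
import urllib.error
import urllib.request


def describe_status(status):
    """Map an HTTP status for /robots.txt to how a crawler is likely to react."""
    if status is None:
        return "unreachable"      # DNS failure, timeout, refused connection
    if 200 <= status < 300:
        return "parsed"           # the rules in the file are applied
    if status == 429 or status >= 500:
        return "unreachable"      # server-side failure; crawling may pause
    if 400 <= status < 500:
        return "no-robots"        # treated as if no robots.txt exists
    return "other"                # anything else, e.g. an unfollowed redirect


def fetch_robots_status(base_url, user_agent="Googlebot"):
    """Return the HTTP status of <base_url>/robots.txt, or None on network errors."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/robots.txt",
        headers={"User-Agent": user_agent},
    )
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code         # 4xx / 5xx responses arrive as exceptions
    except (urllib.error.URLError, OSError):
        return None
```

Calling `describe_status(fetch_robots_status("https://example.com"))` with your own domain in place of the placeholder shows whether the server answers with an error page (for example a Django 500) instead of the file.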

I solved this by slightly modifying my root urlconf:

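from django.http import HttpResponse

# content_type is used instead of the older mimetype keyword, which newer Django releases no longer accept
(r'^robots\.txt$', lambda r: HttpResponse("User-agent: *\nDisallow: /*", content_type="text/plain")),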

Now Googlebot can fetch it. I wish I understood better why this particular solution worked for me, but it does work.

Thanks to Ludwik for the help.
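For reference, here is a sketch of the same idea as a named view in current Django style. Two things are assumptions rather than part of the original fix: the helper names `robots_txt_body` and `robots_txt` are hypothetical, and the default body uses an empty `Disallow:` line. As Google interprets wildcards, `Disallow: /*` asks crawlers to stay away from every path, while an empty `Disallow:` permits the whole site, which is what the question asks for.

```python
def robots_txt_body(disallow=()):
    """Build a robots.txt body that applies to all user agents.

    With no paths given, an empty Disallow line is emitted, which permits
    crawling of the whole site.
    """
    lines = ["User-agent: *"]
    if disallow:
        lines.extend("Disallow: " + path for path in disallow)
    else:
        lines.append("Disallow:")
    return "\n".join(lines) + "\n"


def robots_txt(request):
    """Django view returning the rules as plain text."""
    # Imported here so robots_txt_body() can be used without Django installed.
    from django.http import HttpResponse
    return HttpResponse(robots_txt_body(), content_type="text/plain")


# Hypothetical wiring in the root urls.py (Django 2.0+ syntax):
#   from django.urls import path
#   urlpatterns = [path("robots.txt", robots_txt)]
```

Passing paths, as in `robots_txt_body(["/admin/"])`, produces one `Disallow:` line per entry instead of the allow-all default.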

score 0 · Accepted Answer

If you have permission, then

Alias /robots.txt /var/www/---your path ---/PyBot/robots.txt

add the alias to your virtual host (in the Apache configuration file). The same applies to the favicon:

Alias /favicon.ico /var/www/aktel/workspace1/PyBot/PyBot/static/favicon.ico
