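I have a problem caused by 360Spider: this bot sends far too many requests per second to my VPS and slows it down (CPU usage climbs to 10-70%, while normally it sits at 1-2%). I looked at the httpd log and saw lines like these: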
182.118.25.209 - - [06/Sep/2012:19:39:08 +0300] "GET /slovar/znachenie-slova/42957-polovity.html HTTP/1.1" 200 96809 "http://www.hrinchenko.com/slovar/znachenie-slova/42957-polovity.html" "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.8.0.11) Gecko/20070312 Firefox/1.5.0.11; 360Spider"
182.118.25.208 - - [06/Sep/2012:19:39:08 +0300] "GET /slovar/znachenie-slova/52614-rospryskaty.html HTTP/1.1" 200 100239 "http://www.hrinchenko.com/slovar/znachenie-slova/52614-rospryskaty.html" "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.8.0.11) Gecko/20070312 Firefox/1.5.0.11; 360Spider"
and so on.
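To put a number on "too many requests per second", log lines like the two above can be parsed as Apache's combined log format and counted per second for any user agent containing 360Spider. This is a minimal Python sketch that assumes the httpd log uses the standard combined format; the regex and the function name `requests_per_second` are illustrative, not part of any existing tool.

```python
import re
from collections import Counter

# Combined Log Format: host ident user [time] "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def requests_per_second(lines, agent_substring):
    """Count requests per timestamp (the log has 1-second resolution)
    whose User-Agent contains agent_substring, case-insensitively."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if m and agent_substring.lower() in m.group("agent").lower():
            counts[m.group("time")] += 1
    return counts

# The two log lines quoted above
log_lines = [
    '182.118.25.209 - - [06/Sep/2012:19:39:08 +0300] "GET /slovar/znachenie-slova/42957-polovity.html HTTP/1.1" 200 96809 '
    '"http://www.hrinchenko.com/slovar/znachenie-slova/42957-polovity.html" '
    '"Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.8.0.11) Gecko/20070312 Firefox/1.5.0.11; 360Spider"',
    '182.118.25.208 - - [06/Sep/2012:19:39:08 +0300] "GET /slovar/znachenie-slova/52614-rospryskaty.html HTTP/1.1" 200 100239 '
    '"http://www.hrinchenko.com/slovar/znachenie-slova/52614-rospryskaty.html" '
    '"Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.8.0.11) Gecko/20070312 Firefox/1.5.0.11; 360Spider"',
]

if __name__ == "__main__":
    for second, n in sorted(requests_per_second(log_lines, "360Spider").items()):
        print(second, n)
```

Run against a full access log (for example the lines read from the log file), the busiest seconds show how hard the spider is hitting the server; note that the two sample requests already come from two different IP addresses.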
How can I block this spider completely with robots.txt? Right now my robots.txt looks like this:
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
User-agent: YoudaoBot
Disallow: /
User-agent: sogou spider
Disallow: /
I added these lines:
User-agent: 360Spider
Disallow: /
But this doesn't seem to work. How can I block this angry bot?
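One way to rule out a syntax problem is to feed the robots.txt above, including the added 360Spider group, to a standard parser. This sketch uses Python's `urllib.robotparser`; it only shows how a parser that follows the robots.txt conventions reads the file. It says nothing about whether 360Spider itself obeys robots.txt or how often it re-fetches it, which would have to be checked separately in the access log. The helper name `build_parser` is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the question, with the added 360Spider group
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
User-agent: YoudaoBot
Disallow: /
User-agent: sogou spider
Disallow: /
User-agent: 360Spider
Disallow: /
"""

def build_parser(text):
    rp = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed
    rp.parse(text.splitlines())
    return rp

if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    page = "/slovar/znachenie-slova/42957-polovity.html"
    # Pass the product token, not the full "Mozilla/5.0 ..." header:
    # robotparser only compares the part of the agent string before the first "/".
    for agent in ("360Spider", "Googlebot"):
        print(agent, rp.can_fetch(agent, page))
```

The parser denies 360Spider every page while other crawlers fall through to the `*` group, which suggests the file itself is well formed and the problem more likely lies in how (or whether) the crawler applies it.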
If you suggest blocking it via .htaccess, note that mine currently looks like this:
# Turn on URL rewriting
RewriteEngine On
# Installation directory
RewriteBase /
SetEnvIfNoCase Referer ^360Spider$ block_them
Deny from env=block_them
# Protect hidden files from being viewed
<Files .*>
Order Deny,Allow
Deny From All
</Files>
# Protect application and system files from being viewed
RewriteRule ^(?:application|modules|system)\b.* index.php/$0 [L]
# Allow any files or directories that exist to be displayed directly
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# Rewrite all other URLs to index.php/URL
RewriteRule .* index.php/$0 [PT]
And even though it contains
SetEnvIfNoCase Referer ^360Spider$ block_them
Deny from env=block_them
this bot still keeps trying to kill my VPS and keeps showing up in the access log.
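The two directives quoted above test the Referer header, while the log shows 360Spider only in the User-Agent field, and the `^...$` anchors require the entire header value to equal "360Spider". The sketch below reproduces that match with Python's `re` module standing in for Apache's regex engine (for a pattern this simple the two should behave the same), using the header values from the first log line. `SetEnvIfNoCase` matches case-insensitively and unanchored unless the pattern adds anchors, which is what `set_env_if_nocase` (a hypothetical helper) emulates. The unanchored User-Agent pattern at the end is an illustrative alternative, not part of the original configuration.

```python
import re

# Header values taken from the first log line above
REFERER = "http://www.hrinchenko.com/slovar/znachenie-slova/42957-polovity.html"
USER_AGENT = ("Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.8.0.11) "
              "Gecko/20070312 Firefox/1.5.0.11; 360Spider")

def set_env_if_nocase(header_value, pattern):
    """Emulate SetEnvIfNoCase: case-insensitive regex search over the header value."""
    return re.search(pattern, header_value, re.IGNORECASE) is not None

if __name__ == "__main__":
    # The directive from the .htaccess: wrong header, anchored to the whole value
    print("Referer    ^360Spider$:", set_env_if_nocase(REFERER, r"^360Spider$"))
    # The same anchored pattern against the right header still fails
    print("User-Agent ^360Spider$:", set_env_if_nocase(USER_AGENT, r"^360Spider$"))
    # Hypothetical alternative: unanchored pattern on the User-Agent header
    print("User-Agent 360Spider:  ", set_env_if_nocase(USER_AGENT, r"360Spider"))
```

In Apache terms the last case corresponds to something like `SetEnvIfNoCase User-Agent 360Spider block_them`; treat that as a variant worth testing on the server rather than a verified fix.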