I am using the following to block bad and useless bots:

RewriteCond %{HTTP_USER_AGENT} ^$ [OR]
RewriteCond %{HTTP_USER_AGENT} 360Spider [OR]
RewriteCond %{HTTP_USER_AGENT} A(?:ccess|ppid) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} C(?:apture|lient|opy|rawl|url) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} D(?:ata|evSoft|o(?:main|wnload)) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} E(?:ngine|zooms) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} f(?:etch|ilter) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} genieo [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Ja(?:karta|va) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Li(?:brary|nk|bww) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} MJ12bot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} nutch [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Pr(?:oxy|ublish) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} robot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} s(?:craper|istrix|pider) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} W(?:get|(?:in(32|Http))) [NC]
RewriteRule .? - [F]

The complete .htaccess file:

AddDefaultCharset UTF-8

RewriteEngine on

#inherit from root htaccess and append at last, necessary in root too
RewriteOptions inherit

#block bad bots
RewriteCond %{HTTP_USER_AGENT} ^$ [OR]
RewriteCond %{HTTP_USER_AGENT} 360Spider [OR]
RewriteCond %{HTTP_USER_AGENT} A(?:ccess|ppid) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} C(?:apture|lient|opy|rawl|url) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} D(?:ata|evSoft|o(?:main|wnload)) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} E(?:ngine|zooms) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} f(?:etch|ilter) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} genieo [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Ja(?:karta|va) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Li(?:brary|nk|bww) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} MJ12bot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} nutch [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Pr(?:oxy|ublish) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} robot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} s(?:craper|istrix|pider) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} W(?:get|(?:in(32|Http))) [NC]
RewriteRule .? - [F]

#include caching for images
<IfModule mod_expires.c>
    ExpiresActive On
    ExpiresByType image/gif "access plus 1 month"
    ExpiresByType image/jpeg "access plus 1 month"
    ExpiresByType image/png "access plus 1 month"
    ExpiresByType image/x-icon "access plus 360 days"
    ExpiresByType text/css "access plus 1 day"
    ExpiresByType text/html "access plus 1 week"
    ExpiresByType text/javascript "access plus 1 week"
    ExpiresByType text/x-javascript "access plus 1 week"
    ExpiresByType application/javascript "access plus 1 week"
    ExpiresByType application/x-javascript "access plus 1 week"
    ExpiresByType application/x-shockwave-flash "access plus 1 week"
    ExpiresByType font/truetype "access plus 1 month"
    ExpiresByType font/opentype "access plus 1 month"
    ExpiresByType application/x-font-otf "access plus 1 month"
</IfModule>

RewriteCond %{HTTP_HOST} ^nix.foo.com$ [OR]
RewriteCond %{HTTP_HOST} ^www.nix.foo.com$
RewriteRule ^(.*)$ "http\:\/\/www\.foo\.com\/nix\.php" [R=301,L]

RewriteCond %{HTTP_HOST} ^gallery.foo.com$ [OR]
RewriteCond %{HTTP_HOST} ^www.gallery.foo.com$
RewriteRule ^(.*)$ "http\:\/\/www\.foo\.com\/gallery\.php" [R=301,L]

RewriteCond %{HTTP_HOST} ^blog.foo.com$ [OR]
RewriteCond %{HTTP_HOST} ^www.blog.foo.com$
RewriteRule ^(.*)$ "http\:\/\/www\.foo\.com\/blog" [R=301,L]

RewriteCond %{HTTP_HOST} ^id.foo.com$ [OR]
RewriteCond %{HTTP_HOST} ^www.id.foo.com$
RewriteRule ^/?$ "http\:\/\/foo\.myopenid\.com\/" [R=301,L]

redirect 301 /map.php http://www.foo.com/maps/map.php

RedirectMatch 301 ^/(map(?!pa_area51\.)[^/.]+\.php)$ http://www.foo.com/maps/$1

Options +FollowSymLinks
RewriteCond %{HTTP_HOST} !^www\.
RewriteRule ^(.*)$ http://www.%{HTTP_HOST}/$1 [R=301,L]

This worked fine (the bots received HTTP 403) until I switched from hosting on a LiteSpeed web server to Apache. Both are shared hosting services. Now I get:

Forbidden

You don't have permission to access /robots.txt on this server.

Additionally, a 500 Internal Server Error error was encountered while trying to use an ErrorDocument to handle the request.

Here is a sample from the access log:

208.115.111.68 - - [22/Sep/2013:17:56:48 +0200] "GET /robots.txt HTTP/1.1" 500 576 "-" "Mozilla/5.0 (compatible; Ezooms/1.0; ezooms.bot@gmail.com)"

Any hints about that HTTP 500 error? Thanks in advance.

1 Answer

I also found RewriteCond hard to use for blocking bots. Use SetEnvIfNoCase instead, like this:

SetEnvIfNoCase User-Agent ^.*360Spider.*$ bad_bot
SetEnvIfNoCase User-Agent A(?:ccess|ppid) bad_bot
SetEnvIfNoCase User-Agent C(?:apture|lient|opy|rawl|url) bad_bot

Order deny,allow
Deny from env=bad_bot

Do the same for the remaining patterns. Bad bots will then receive a 403 error.
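For illustration, a slightly fuller sketch of this approach might look like the following. It reuses a few of the patterns from the question and the bad_bot variable name from the snippet above. Order and Deny are the Apache 2.2 access-control directives; on Apache 2.4 they are only available through mod_access_compat, so this sketch assumes Apache 2.4 with mod_authz_core and uses Require instead. The question does not say which Apache version the shared host runs, nor which overrides it permits in .htaccess, so both are assumptions.

```apache
# Sketch only: assumes Apache 2.4 with mod_setenvif and mod_authz_core,
# and that the host permits these directives in .htaccess
# (AllowOverride FileInfo for SetEnvIfNoCase, AuthConfig for Require).
<IfModule mod_setenvif.c>
    # Flag requests whose User-Agent matches a pattern (case-insensitive).
    SetEnvIfNoCase User-Agent "360Spider" bad_bot
    SetEnvIfNoCase User-Agent "A(?:ccess|ppid)" bad_bot
    SetEnvIfNoCase User-Agent "C(?:apture|lient|opy|rawl|url)" bad_bot
    SetEnvIfNoCase User-Agent "E(?:ngine|zooms)" bad_bot
    SetEnvIfNoCase User-Agent "MJ12bot" bad_bot
</IfModule>

<IfModule mod_authz_core.c>
    # Allow everyone except flagged requests, which receive 403 Forbidden.
    <RequireAll>
        Require all granted
        Require not env bad_bot
    </RequireAll>
</IfModule>
```

On Apache 2.2, the Order and Deny lines shown earlier play the role of the RequireAll block.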

answered 2014-06-05T03:19:20.900