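A few things here:
I'm not sure why you would return a 503 error. The bot's request still uses some of the same server resources.
You should consider disabling session management for bots (or at least minimizing the session timeout).
If you are trying to block bots, you should also use robots.txt (see http://www.robotstxt.org/ for details).
You are most likely already using robots.txt, but anyone who visits this page later should be aware of it.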
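For readers who have not set up robots.txt yet, a minimal file might look like the sketch below. The `/admin/` path is a hypothetical placeholder, not something from the question; well-behaved crawlers honor these rules, but nothing forces a bot to obey them.

```text
# robots.txt, served from the site root (sketch only)
# "/admin/" is a hypothetical path; replace it with the areas you want crawlers to skip
User-agent: *
Disallow: /admin/
```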
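The UDFs below are based on Ben Nadel's work. However, the data in them should be kept up to date.
I may eventually handle this with the pattern I use in my own SpamFilter.cfc. For now, though, the following pair of UDFs should help you get started.
Note that my UDF treats CFSCHEDULE as a bot because I don't want to use sessions for it. If you want to block all bots, you should remove it from the list.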
<cffunction name="hasCFCookies" access="public" returntype="boolean">
    <cfreturn ( StructKeyExists(Cookie,"CFID") AND StructKeyExists(Cookie,"CFTOKEN") )>
</cffunction>
<cfset request.hasCFCookies = hasCFCookies>
<cffunction name="isBot" access="public" returntype="boolean">
    <!---
    Based on code by Ben Nadel:
    http://www.bennadel.com/blog/154-ColdFusion-Session-Management-Revisited-User-vs-Spider-III.htm
    --->
    <cfset var UserAgent = "">
    <!--- If the user has cookies, this is at least a second request from a real user --->
    <cfif hasCFCookies()>
        <cfreturn false>
    </cfif>
    <!--- Real users have user-agent strings --->
    <cfset UserAgent = LCase( CGI.http_user_agent )>
    <cfif NOT Len(UserAgent)>
        <cfreturn true>
    </cfif>
    <!---
    High-probability checks
    If the user agent has bot or spider in it, it is a bot
    Some specific high-volume spiders listed individually
    --->
    <cfif
        REFind( "bot\b", UserAgent )
        OR Find( "spider", UserAgent )
        OR REFind( "search\b", UserAgent )
        OR UserAgent EQ "CFSCHEDULE"
    >
        <cfreturn true>
    </cfif>
    <!---
    If we haven't yet tagged it as a bot and it is on Windows or Mac (including iOS devices), call it a real user.
    If this results in a few spiders showing as real users, that is OK
    --->
    <cfif REFind( "\bwindows\b", UserAgent ) OR REFind( "\bmac", UserAgent )>
        <cfreturn false>
    </cfif>
    <!--- If we don't know yet, only figure spiders from a known list of a few --->
    <cfif
        REFind( "\brss", UserAgent )
        OR Find( "slurp", UserAgent )
        OR Find( "xenu", UserAgent )
        OR Find( "mediapartners-google", UserAgent )
        OR Find( "zyborg", UserAgent )
        OR Find( "emonitor", UserAgent )
        OR Find( "jeeves", UserAgent )
        OR Find( "sbider", UserAgent )
        OR Find( "findlinks", UserAgent )
        OR Find( "yahooseeker", UserAgent )
        OR Find( "mmcrawler", UserAgent )
        OR Find( "jbrowser", UserAgent )
        OR Find( "java", UserAgent )
        OR Find( "pmafind", UserAgent )
        OR Find( "blogbeat", UserAgent )
        OR Find( "converacrawler", UserAgent )
        OR Find( "ocelli", UserAgent )
        OR Find( "labhoo", UserAgent )
        OR Find( "validator", UserAgent )
        OR Find( "sproose", UserAgent )
        OR Find( "ia_archiver", UserAgent )
        OR Find( "larbin", UserAgent )
        OR Find( "psycheclone", UserAgent )
        OR Find( "arachmo", UserAgent )
    >
        <cfreturn true>
    </cfif>
    <cfreturn false>
</cffunction>
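To connect this back to the earlier suggestion about minimizing session timeouts for bots, here is a rough sketch of how `isBot()` could be used in an Application.cfc. It assumes the two UDFs are saved in a file named `botUDFs.cfm` (a hypothetical name) next to Application.cfc; the application name and both timeout values are placeholders to adjust for your site.

```cfml
<!--- Application.cfc: a sketch only, not a drop-in implementation.
      "botUDFs.cfm" is a hypothetical file holding hasCFCookies() and isBot(). --->
<cfcomponent output="false">
    <!--- Make the UDFs available before the settings below are evaluated --->
    <cfinclude template="botUDFs.cfm">

    <!--- Placeholder application name --->
    <cfset this.name = "MyApplication">
    <cfset this.sessionManagement = true>

    <cfif isBot()>
        <!--- Bots: let the session expire after 2 seconds (illustrative value) --->
        <cfset this.sessionTimeout = CreateTimeSpan( 0, 0, 0, 2 )>
    <cfelse>
        <!--- Real users: 20-minute session (illustrative value) --->
        <cfset this.sessionTimeout = CreateTimeSpan( 0, 0, 20, 0 )>
    </cfif>
</cfcomponent>
```

Because a first request from a real user carries no CFID/CFTOKEN cookies yet, it is classified by user agent alone; follow-up requests with cookies always count as a real user.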