I apologize if this isn't the right place to ask this; I'd appreciate any help in moving it to a more appropriate forum if necessary. My original question was going to be about what I need to change in the PHP to alter this behavior, but I'm not sure that's really the problem. So this is a pre-question of sorts, to find out whether I even need to ask the programming question I had in mind.

We have a site that's using too much bandwidth. I was told it's being caused by web crawlers, so I checked, and sure enough that seems to be the case. One thing I noticed was that 403 errors were responsible for most of the traffic. I didn't see how that was possible, since I would expect a 403 error to send just a little informational text, but when I purposely went to a URL that didn't exist, it redirected me to the homepage.

So I'm assuming that every time a web crawler hits a non-existent link, the site transfers everything on the homepage. I also wonder whether the crawler treats this as a new starting point from which it needs to branch out to all the links on the homepage all over again, since it was hammering the website for over 24 hours straight before it got taken down.


EDIT: It seems I made a mistake, as halfer pointed out. I saw '403' and immediately thought of the wrong thing. It is 403 (forbidden access) that is the issue, so maybe that means someone is trying to hack into the website?


1 Answer

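Don't confuse 403 and 404 errors. 403 means forbidden; 404 means page not found.

You must have an .htaccess file in your site's root directory that contains something like this (the same goes for the 404 error):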

ErrorDocument 403  index.php

Change index.php to a static page or a message that you add:

ErrorDocument 403 "forbidden"

Do you have a sitemap? Most crawlers use it. Read this interesting article

Check which crawlers make the most requests and, if necessary, block them by IP.
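
To see which clients are behind the 403 traffic, something like the following might help. It is only a rough sketch: it assumes the server writes its access log in Apache's Combined Log Format, the function name `summarize` is made up for illustration, and the three sample lines are invented (they use documentation-only IP addresses), not taken from a real log.

```python
import re
from collections import Counter

# Assumed layout (Combined Log Format):
# IP identity user [time] "request" status bytes "referer" "user agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
    r'(?: "[^"]*" "(?P<agent>[^"]*)")?'
)


def summarize(lines, status="403"):
    """Tally requests and bytes sent per (IP, user agent) for one status code."""
    hits, sent = Counter(), Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m is None or m.group("status") != status:
            continue  # skip unparseable lines and other status codes
        key = (m.group("ip"), m.group("agent") or "-")
        hits[key] += 1
        size = m.group("size")
        sent[key] += 0 if size == "-" else int(size)
    return hits, sent


# Invented sample lines, for illustration only.
sample = [
    '203.0.113.5 - - [14/Nov/2013:20:00:01 +0000] "GET /admin HTTP/1.1" 403 51234 "-" "ExampleBot/1.0"',
    '203.0.113.5 - - [14/Nov/2013:20:00:02 +0000] "GET /private HTTP/1.1" 403 51234 "-" "ExampleBot/1.0"',
    '198.51.100.7 - - [14/Nov/2013:20:00:03 +0000] "GET / HTTP/1.1" 200 51234 "-" "Mozilla/5.0"',
]

hits, sent = summarize(sample)
for (ip, agent), count in hits.most_common(10):
    print(f"{count} requests, {sent[(ip, agent)]} bytes: {ip} ({agent})")
```

On a real server you would pass an open log file to `summarize` instead of the sample list. A large byte count per 403 response would suggest that a full page is being served as the error document.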

Answered 2013-11-14T20:09:34.643