
I couldn't find any useful information on Google about this topic, so I'd appreciate either a link to an article that covers it or a direct answer here; both work for me.

I'm implementing a search system in PHP/MySQL on a site with a lot of visitors, so I'm going to put some limits on the length of what visitors can type into the search field and on the minimum time required between two searches. Since I'm fairly new to these issues and don't really know the "real reasons" why this is usually done, I've only assumed that the minimum character length is implemented to minimize the number of results the database will return, and that the minimum time between searches is there to keep bots from spamming the search system and slowing down the site. Is that correct?

Finally, there's the question of how to implement the minimum time between two searches. The solution I came up with, in pseudo-code, is this (a rough PHP sketch follows the list):

  1. Set a test cookie at the URL where the search form is submitted
  2. Redirect the user to the URL that should output the search results
  3. Check whether the test cookie exists
    • If it doesn't, output a warning that they are not allowed to use the search system (probably a bot)
  4. Check whether a cookie recording the time of the last search exists
    • If that was less than 5 seconds ago, output a warning that they should wait before searching again
  5. Run the search
  6. Set the last-search-time cookie to the current time
  7. Output the search results
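
In PHP, the checks on the results page could look roughly like this (the cookie names and the 5-second threshold are just placeholders; the test cookie is assumed to have been set on the form page before the redirect in steps 1–2):

```php
<?php
// Rough sketch of steps 3-7 on the results page.
$now = time();

if (!isset($_COOKIE['search_test'])) {
    // No test cookie: probably a bot, or cookies are disabled
    exit('You are not allowed to use the search system.');
}

if (isset($_COOKIE['last_search']) && $now - (int) $_COOKIE['last_search'] < 5) {
    exit('Please wait a few seconds before searching again.');
}

// ... run the search query here ...

// Remember the time of this search for the next request
setcookie('last_search', (string) $now, $now + 3600);

// ... output the search results here ...
```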

Is this the best way to do it?

I understand this means visitors with cookies disabled won't be able to use the search system, but is that really a problem nowadays? I couldn't find statistics from 2012, but I did manage to find data suggesting that 3.7% of people had cookies disabled in 2009. That doesn't seem like much, and I imagine it's probably even less these days.


2 Answers


"only my assumptions that the character minimum length is implemented to minimize the number of results the database will return". Your assumption is absolutely correct. It reduces the number of potential results, by forcing the user to think about, what it is they wish to search.

As far as bots spamming your search goes, you could implement a captcha; the most frequently used is reCAPTCHA. If you don't want to show a captcha right away, you can track (via the session) the number of times the user has submitted a search, and if X searches occur within a certain time frame, then render the captcha.

I've seen sites like SO and thechive.com implement this type of strategy, where the captcha isn't rendered right away but appears once a threshold is reached.
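
A minimal sketch of that session-based counter (the limit, the window, and the renderCaptcha() helper are placeholders, not a fixed recipe):

```php
<?php
session_start();

// Example values only: at most 5 searches per 60-second window
$limit  = 5;
$window = 60;
$now    = time();

if (!isset($_SESSION['search_times'])) {
    $_SESSION['search_times'] = array();
}

// Keep only the timestamps of searches made inside the current window
$recent = array();
foreach ($_SESSION['search_times'] as $t) {
    if ($now - $t < $window) {
        $recent[] = $t;
    }
}
$_SESSION['search_times'] = $recent;

if (count($_SESSION['search_times']) >= $limit) {
    // Threshold reached: render the captcha instead of running the search
    renderCaptcha(); // placeholder for your reCAPTCHA integration
    exit;
}

// Record this search and continue with the normal query
$_SESSION['search_times'][] = $now;
```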

Answered 2013-03-21T00:33:20.140

With the cookie approach you're also preventing search engines from indexing your search results. A cleaner way of doing this would be (see the sketch after this list):

  1. Get the IP the search originated from
  2. Store that IP in a cache system such as memcached, along with the time the query was made
  3. If another query comes from the same IP and less than X seconds have passed, simply reject it or make the user wait
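
A minimal sketch of that per-IP throttle using the Memcached extension (the 5-second window and the key prefix are arbitrary examples):

```php
<?php
$memcached = new Memcached();
$memcached->addServer('127.0.0.1', 11211);

$ip  = $_SERVER['REMOTE_ADDR'];
$key = 'search_throttle_' . $ip;

// add() only succeeds if the key does not exist yet, so a failure
// means this IP already searched within the last 5 seconds.
if (!$memcached->add($key, time(), 5)) {
    http_response_code(429);
    exit('Please wait a few seconds before searching again.');
}

// ... run the actual search query here ...
```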

Another thing you can do to increase performance is to look at your analytics, see which queries are made most often, and cache those, so that when a request comes in you serve the cached version instead of doing a full DB query, parsing, and so on.
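
A minimal sketch of such a result cache, again with the Memcached extension (the DSN, table name, and one-hour TTL are placeholders):

```php
<?php
$memcached = new Memcached();
$memcached->addServer('127.0.0.1', 11211);

$term = isset($_GET['q']) ? trim($_GET['q']) : '';
$key  = 'search_results_' . md5($term);

$results = $memcached->get($key);
if ($results === false) {
    // Cache miss: run the real query, then store the result set for an hour
    $pdo  = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');
    $stmt = $pdo->prepare('SELECT id, title FROM articles WHERE title LIKE ?');
    $stmt->execute(array('%' . $term . '%'));
    $results = $stmt->fetchAll(PDO::FETCH_ASSOC);

    $memcached->set($key, $results, 3600);
}

// $results now holds the rows, whether they came from the cache or the DB
```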

Another, more naive, option would be to have a script run once or twice a day that executes all the common queries and creates static HTML files, so that users making those particular searches hit the static files instead of the database.
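
A rough sketch of such a cron script (the query list, DSN, table, and output directory are all placeholders):

```php
<?php
// Pre-render static result pages for a hard-coded list of common queries.
$common = array('php', 'mysql', 'search'); // placeholder list taken from analytics

$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');

foreach ($common as $term) {
    $stmt = $pdo->prepare('SELECT id, title FROM articles WHERE title LIKE ?');
    $stmt->execute(array('%' . $term . '%'));

    $html = '<ul>';
    foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $row) {
        $html .= '<li>' . htmlspecialchars($row['title']) . '</li>';
    }
    $html .= '</ul>';

    // The search front end can serve these files directly for matching terms
    file_put_contents(__DIR__ . '/cache/' . md5($term) . '.html', $html);
}
```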

Answered 2013-03-21T00:33:27.837