ruby - Ruby script to print the total number of results on Google

Question

I want to write a Ruby script that prints the total number of results Google reports when searching with a query such as allinurl: http://www.example.net/Downloads.aspx?Doc=

I looked through the page's source code and wrote the following Ruby script:

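require "rubygems"
require "rest-client"

# Google search URL for the allinurl: query
url="https://www.google.com.np/search?q=allinurl:+http://www.dpsmathuraroad.net/Downloads.aspx%3FDoc%3D&lr=&safe=active&hl=en&noj=1&biw=1366&bih=643&filter=0"
intel=RestClient.get(url)

# Save the raw HTML response to disk
xfile=File.open("dpsmathuraroad.txt","w")
xfile.write(intel.body)
xfile.close

# Re-read the saved page and stop at the line that contains the result count
xfile2=File.open("dpsmathuraroad.txt", "r")
while !xfile2.eof?
    ch=xfile2.readline
    if ch=~ /<div id="resultStats">About /
        break
    end
end
xfile2.close

# The square brackets form a character class, so this splits on any single one of these characters
dat=ch.split(/[<div id="sbfrm_l"><div id="resultStats">About , results<nobr> ]/)
puts dat[1]
gets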

The line dat=ch.split(/[<div id="sbfrm_l"><div id="resultStats">About , results<nobr> ]/) in the code above is pure manipulation of the page's source code.
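As a side note on the extraction step only, a capture group is usually easier to reason about than splitting on a character class. The following is a minimal sketch, not the asker's code: the helper name extract_result_count, the constant RESULT_STATS, and the sample line are made up for illustration, and the markup shape is assumed from the regex in the question, so it may not match what Google actually serves.

```ruby
# Hypothetical helper: pull the number out of a "resultStats" line.
# Assumption: the markup looks like the pattern used in the question.
RESULT_STATS = /<div id="resultStats">About ([\d,.]+) results/

def extract_result_count(html)
  match = RESULT_STATS.match(html)
  return nil unless match       # block absent, e.g. a CAPTCHA page came back
  match[1].delete(",.").to_i    # strip thousands separators, then convert
end

# Made-up sample line, not real Google output
sample = '<div id="sbfrm_l"><div id="resultStats">About 12,300 results<nobr> (0.25 seconds)</nobr></div></div>'

puts extract_result_count(sample)
puts extract_result_count("<html>captcha page</html>").inspect
```

Returning nil when the block is missing lets the caller tell "no count found" apart from a count of zero, which matters when the response is a CAPTCHA page rather than a results page.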

Unfortunately, Google checks whether the client is human, so a CAPTCHA gets in the way.

How can I get past the CAPTCHA and obtain the desired result with a Ruby script like this? Can it be done with some API?

score 1 · Accepted Answer

You can't. That is exactly why CAPTCHAs exist. Any form of scraping violates Google's Terms of Service, and they use CAPTCHAs to enforce that.

Sorry.

score 1 · Accepted Answer

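If you don't mind violating their Terms of Service, there are APIs that solve CAPTCHAs. These are commonly used by result scrapers such as Serposcope.

For example, Anti-Captcha.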
