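I'm trying to write a script that fetches Google's AJAX search results (for example: http://ajax.googleapis.com/ajax/services/search/web?v=1.0&q=filetype:pdf ) and downloads each file. Right now I'm stuck converting the response into a Python dictionary so it's easier to traverse.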
import subprocess
import ast
subprocess.call("curl -G -d 'q=filetype:pdf&v=1.0' http://ajax.googleapis.com/ajax/services/search/web > output",stderr=subprocess.STDOUT,shell=True)
file = open('output','r')
contents = file.read()
output_dict = ast.literal_eval(contents)
print output_dict
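For comparison, here is a minimal Python 3 sketch that skips the curl subprocess and the temporary `output` file. It builds the same GET query that `curl -G -d 'q=...&v=1.0'` sends and parses the body with `json` rather than `ast`. The helper names `build_search_url` and `fetch_results` are hypothetical. The endpoint is the one from the question; Google deprecated this API, so it may no longer respond. Also, `urlencode` percent-encodes `:` as `%3A` where curl's `-d` sends it raw, which servers decode the same way.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the question; it may no longer respond.
SEARCH_ENDPOINT = "http://ajax.googleapis.com/ajax/services/search/web"


def build_search_url(query, version="1.0"):
    """Build the GET URL that `curl -G -d 'q=...&v=...'` would request."""
    params = urllib.parse.urlencode({"q": query, "v": version})
    return SEARCH_ENDPOINT + "?" + params


def fetch_results(query):
    """Fetch the response and parse it as JSON (not as a Python literal)."""
    with urllib.request.urlopen(build_search_url(query)) as resp:
        # json.load accepts the binary response directly (Python 3.6+).
        return json.load(resp)
```

Calling `fetch_results("filetype:pdf")` would return a plain `dict`, with no temporary file involved.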
When I run it, I get:
$ python script.py
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  2643    0  2643    0     0  15926      0 --:--:-- --:--:-- --:--:-- 26696
Traceback (most recent call last):
  File "script.py", line 7, in <module>
    output_dict = ast.literal_eval(contents)
  File "/usr/lib/python2.7/ast.py", line 80, in literal_eval
    return _convert(node_or_string)
  File "/usr/lib/python2.7/ast.py", line 63, in _convert
    in zip(node.keys, node.values))
  File "/usr/lib/python2.7/ast.py", line 62, in <genexpr>
    return dict((_convert(k), _convert(v)) for k, v
  File "/usr/lib/python2.7/ast.py", line 79, in _convert
    raise ValueError('malformed string')
ValueError: malformed string
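The traceback likely comes from the response being JSON, not Python literal syntax. `ast.literal_eval` only accepts Python literals. The response contains `"responseDetails": null`, and JSON's `null` reaches the parser as a bare name (Python spells it `None`), so it is rejected as malformed. JSON's `true`/`false` would fail the same way. The short sketch below reproduces this on a fragment shaped like the end of the response (the fragment itself is made up for illustration) and shows `json.loads` handling it:

```python
import ast
import json

# Fragment shaped like the tail of the response in the question.
sample = '{"responseDetails": null, "responseStatus": 200}'

literal_eval_failed = False
try:
    ast.literal_eval(sample)
except ValueError as exc:
    # `null` is parsed as a bare name, which is not a Python literal.
    literal_eval_failed = True
    print("literal_eval failed:", exc)

# json.loads maps JSON null/true/false to None/True/False.
parsed = json.loads(sample)
print(parsed)  # {'responseDetails': None, 'responseStatus': 200}
```

In the original Python 2.7 script, the same fix would be `import json` and `output_dict = json.loads(contents)`, since `json` has been in the standard library since 2.6.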
The file looks like this:
{"responseData": {"results":[{"GsearchResultClass":"GwebSearch",
"unescapedUrl":"http://www.foundationdb.com/AlphaLicenseAgreement.pdf",
"url":"http://www.foundationdb.com/AlphaLicenseAgreement.pdf",
"visibleUrl":"www.foundationdb.com",
"cacheUrl":"http://www.google.com/search?q\u003dcache:W7zhFlfbm6UJ:www.foundationdb.com",
"title":"FoundationDB Alpha Software Evaluation License Agreement",
"titleNoFormatting":"FoundationDB Alpha Software Evaluation License Agreement",
"content":"FOUNDATIONDB. ALPHA SOFTWARE EVALUATION LICENSE AGREEMENT. PLEASE READ CAREFULLY THE TERMS OF THIS ALPHA SOFTWARE \u003cb\u003e...\u003c/b\u003e",
"fileFormat":"PDF/Adobe Acrobat"
},
{"GsearchResultClass":"GwebSearch",
"unescapedUrl":"https://subreg.cz/registration_agreement.pdf",
"url":"https://subreg.cz/registration_agreement.pdf",
"visibleUrl":"subreg.cz",
"cacheUrl":"http://www.google.com/search?q\u003dcache:ODtRmQsiHD0J:subreg.cz",
"title":"Registration Agreement",
"titleNoFormatting":"Registration Agreement",
"content":"Registration Agreement. In order to complete the registration process you must read and agree to be bound by all terms and conditions herein. TERMS AND \u003cb\u003e...\u003c/b\u003e",
"fileFormat":"PDF/Adobe Acrobat"
},
{"GsearchResultClass":"GwebSearch",
"unescapedUrl":"http://supportdetails.com/export.pdf",
"url":"http://supportdetails.com/export.pdf",
"visibleUrl":"supportdetails.com",
"cacheUrl":"http://www.google.com/search?q\u003dcache:h0LvxrTTKzIJ:supportdetails.com",
"title":"Export PDF - Support Details",
"titleNoFormatting":"Export PDF - Support Details",
"content":"",
"fileFormat":"PDF/Adobe Acrobat"
},
{"GsearchResultClass":"GwebSearch",
"unescapedUrl":"http://www.fws.gov/le/pdf/travelpetbird.pdf",
"url":"http://www.fws.gov/le/pdf/travelpetbird.pdf",
"visibleUrl":"www.fws.gov",
"cacheUrl":"",
"title":"pet bird",
"titleNoFormatting":"pet bird",
"content":"U.S. Fish \u0026amp; Wildlife Service. Traveling Abroad with. Your Pet Bird. The Wild Bird Conservation Act (Act), a significant step in international conservation efforts to \u003cb\u003e...\u003c/b\u003e",
"fileFormat":"PDF/Adobe Acrobat"
}],
"cursor":{"resultCount":"72,800,000",
"pages":[{"start":"0","label":1},
{"start":"4","label":2},
{"start":"8","label":3},
{"start":"12","label":4},
{"start":"16","label":5},
{"start":"20","label":6},
{"start":"24","label":7},
{"start":"28","label":8}],
"estimatedResultCount":"72800000",
"currentPageIndex":0,
"moreResultsUrl":"http://www.google.com/search?oe\u003dutf8\u0026ie\u003dutf8\u0026source\u003duds\u0026start\u003d0\u0026hl\u003den\u0026q\u003dfiletype:pdf","searchResultTime":"0.04"
}
},
"responseDetails": null,
"responseStatus": 200
}
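Once the response is parsed as JSON, the file links sit under `responseData` → `results`, and each result carries an `unescapedUrl`. Below is a minimal sketch of the "download each file" step. The helpers `pdf_urls`, `filename_for` and `download_all` are hypothetical. Naming each local file after the last URL path segment is an assumption. The sketch does not handle duplicate filenames, retries, or non-PDF responses.

```python
import json
import os
import posixpath
import shutil
import urllib.parse
import urllib.request


def pdf_urls(response):
    """Return the unescapedUrl of every result in a parsed search response."""
    data = response.get("responseData") or {}  # responseData may be null
    return [r["unescapedUrl"] for r in data.get("results", []) if "unescapedUrl" in r]


def filename_for(url):
    """Hypothetical naming rule: last path segment, with a fallback."""
    name = posixpath.basename(urllib.parse.urlparse(url).path)
    return name or "download.pdf"


def download_all(response, directory="downloads"):
    """Download each result URL into `directory` and return the saved paths."""
    os.makedirs(directory, exist_ok=True)
    saved = []
    for url in pdf_urls(response):
        path = os.path.join(directory, filename_for(url))
        with urllib.request.urlopen(url) as resp, open(path, "wb") as out:
            shutil.copyfileobj(resp, out)
        saved.append(path)
    return saved


# Trimmed copy of the response above (two results, other fields dropped).
sample = json.loads("""
{"responseData": {"results": [
   {"unescapedUrl": "http://www.foundationdb.com/AlphaLicenseAgreement.pdf"},
   {"unescapedUrl": "https://subreg.cz/registration_agreement.pdf"}]},
 "responseDetails": null, "responseStatus": 200}
""")
print(pdf_urls(sample))
```

`unescapedUrl` is used rather than `url` because the latter may be percent-escaped. In this response the two happen to be identical.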
God, that took forever to format.