I want to crawl "www.naver.com", so I tried to do it through the open API. I wrote the code below:

import urllib.request
import urllib.parse
from bs4 import BeautifulSoup

# Query-string pieces; 'keyvalue' stands in for the real API key.
defaultURL = 'http://openapi.naver.com/search?&'
key = 'key=keyvalue'
target = '&target=news'
sort = '&sort=sim'
start = '&start=1'
display = '&display=100'
# quote_plus() percent-encodes the search term so it is safe inside a URL.
query = '&query=' + urllib.parse.quote_plus(str(input("write:")))

fullURL = defaultURL + key + target + sort + start + display + query

print(fullURL)
file = open("C:\\Users\\kimty\\Desktop\\k\\python\\N\\naver_news.txt", "w", encoding='utf-8')

f = urllib.request.urlopen(fullURL)
resultXML = f.read()
xmlsoup = BeautifulSoup(resultXML, 'html.parser')

# find_all() returns every <item> element in the response.
items = xmlsoup.find_all('item')

for item in items:
    file.write('---------------------------------------\n')
    file.write('title :' + item.title.get_text(strip=True) + '\n')
    file.write('contents : ' + item.description.get_text(strip=True) + '\n')
    file.write('\n')

file.close()

But the Python shell only shows this:

============= RESTART: C:\Users\kimty\Desktop\kpython\N\N.py =============
write:lee
http://openapi.naver.com/search?&key=keyvalue&target=news&sort=sim&start=1&display=100&query=lee
Traceback (most recent call last):
  File "C:\Users\kimty\Desktop\k\python\N\N.py", line 19, in <module>
    f=urllib.request.urlopen(fullURL)
  File "C:\Python34\lib\urllib\request.py", line 161, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Python34\lib\urllib\request.py", line 464, in open
    response = self._open(req, data)
  File "C:\Python34\lib\urllib\request.py", line 482, in _open
    '_open', req)
  File "C:\Python34\lib\urllib\request.py", line 442, in _call_chain
    result = func(*args)
  File "C:\Python34\lib\urllib\request.py", line 1211, in http_open
    return self.do_open(http.client.HTTPConnection, req)
  File "C:\Python34\lib\urllib\request.py", line 1186, in do_open
    r = h.getresponse()
  File "C:\Python34\lib\http\client.py", line 1227, in getresponse
    response.begin()
  File "C:\Python34\lib\http\client.py", line 386, in begin
    version, status, reason = self._read_status()
  File "C:\Python34\lib\http\client.py", line 356, in _read_status
    raise BadStatusLine(line)
http.client.BadStatusLine: ''

Why does this happen? What is the Python shell telling me? I'm using Windows 8.1 64-bit and Python 3.4.4.

1 Answer

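http.client.BadStatusLine is a subclass of http.client.HTTPException. It is raised when the server's reply does not start with a status line that Python can parse; the empty string ('') in your traceback means the server closed the connection without sending any response at all. So the problem is on the HTTP level, and perhaps your API key is wrong. If I try to open the link in my browser, it gives me an error as well.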

The link I opened is the exact address your script is trying to request.
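To make it easier to check the address being requested, here is a minimal sketch that builds the same query string with urllib.parse.urlencode instead of string concatenation. The parameter names and the 'keyvalue' placeholder are copied from the question; I have not verified that the service accepts them. build_search_url is a hypothetical helper name, not part of any library.

```python
import urllib.parse

# Base address copied from the question (without the stray '&' after '?').
BASE_URL = 'http://openapi.naver.com/search'


def build_search_url(term, api_key='keyvalue'):
    """Return the full request URL for a search term.

    'keyvalue' is the placeholder from the question, not a working key.
    """
    params = [
        ('key', api_key),
        ('target', 'news'),
        ('sort', 'sim'),
        ('start', 1),
        ('display', 100),
        ('query', term),
    ]
    # urlencode() percent-encodes every value and joins the pairs with '&'.
    return BASE_URL + '?' + urllib.parse.urlencode(params)
```

Calling print(build_search_url('lee')) shows the address before any request is made, so you can paste it into a browser and compare the results.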

Edit

Some people have fixed this error by importing the http library.
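Importing http.client also lets you catch the exception by name and see which kind of failure you are hitting. The following is only a sketch: fetch and its opener parameter are names I made up (the opener argument exists so the function can be tried without a network connection), and the error messages are illustrative.

```python
import http.client
import urllib.error
import urllib.request


def fetch(url, opener=urllib.request.urlopen, timeout=10):
    """Return (body, error); exactly one of the two is None."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.read(), None
    except urllib.error.HTTPError as exc:
        # The server answered, but with a 4xx/5xx status code.
        return None, 'HTTP error {}: {}'.format(exc.code, exc.reason)
    except urllib.error.URLError as exc:
        # The request could not be sent (DNS failure, refused connection, ...).
        return None, 'connection problem: {}'.format(exc.reason)
    except http.client.BadStatusLine as exc:
        # The reply did not begin with a parsable status line.
        return None, 'server sent no valid status line: {}'.format(exc.line)
    except http.client.HTTPException as exc:
        # Any other protocol-level failure reported by http.client.
        return None, 'HTTP protocol error: {!r}'.format(exc)
```

With this in place, body, error = fetch(fullURL) reports the problem as a message instead of ending the script with a traceback.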

Answered 2016-05-17T16:33:57.027