python - Python Beautiful Soup 'ascii' codec can't encode character u'\xa5'

Question

While scraping certain elements from a web page, I ran into some strange characters. The characters that seem to trigger the error are:

? ?? ?????? /?? />? /？？？?/¢¥Á ??%% ?Á ? ？？？？一个？?> /???¥??> ¥? ¥©Á ?>¢¥/%%/¥??> ?Â >Á? 一个？Á ©???¢ ñ%Á?¥???/% Á%Á?¥??>?? />? ???? ??¥?? ??¢¥????¥??> ¢`¢¥Á ¢ ??%% ?Á ??À?/?Á? 日元？_ÁÁ¥ ?>??Á/¢?>À Á???? Á>¥ ?? ??¥Á? />? ??__?>??/¥??>¢ ?Á

My code is as follows:

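import time
import requests
from bs4 import BeautifulSoup

# Excerpt from a larger script; flagcnt is a counter defined earlier.
url = "http://www.nsf.gov#######@#@#@##"
# webbrowser.open(url, new=new)
flagcnt += 1
if flagcnt % 20 == 0:  # sleep every 20 requests to avoid being shut out
    print "flagcount: "
    print flagcnt
    time.sleep(5)
# Program code extraction
r = requests.get(url)
sp = BeautifulSoup(r.content)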

Page: http://www.nsf.gov/awardsearch

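I have read every page I could find about this error. Some of them suggest decoding and encoding, but that doesn't seem to help. I don't know which encoding is used here. I tried downgrading the BS version, but that didn't help either. Any help is appreciated. Python 2.7, BS 4.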

score 12 · Accepted Answer

This worked for me:

page_text = r.text.encode('utf-8').decode('ascii', 'ignore')
page_soupy = BeautifulSoup.BeautifulSoup(page_text)
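As far as I can tell, the one-liner above works by throwing away every non-ASCII character before the text reaches the parser, so nothing is left for the 'ascii' codec to choke on. Here is a minimal sketch of that effect without any network access. The sample string stands in for r.text and the helper name strip_non_ascii is made up for illustration; neither comes from the original code or from Beautiful Soup.

```python
# -*- coding: utf-8 -*-
# Hypothetical sample standing in for r.text; u'\xa5' is the yen sign.
sample = u"Award amount: \xa5100"

# Encoding non-ASCII text with the 'ascii' codec raises the error in the title.
try:
    sample.encode("ascii")
    raised = False
except UnicodeEncodeError:
    raised = True


def strip_non_ascii(text):
    """Drop every non-ASCII character (illustrative helper, not a bs4 API)."""
    # UTF-8 turns each non-ASCII character into bytes >= 0x80; decoding as
    # ASCII with errors='ignore' then silently discards those bytes.
    return text.encode("utf-8").decode("ascii", "ignore")


cleaned = strip_non_ascii(sample)
print(repr(cleaned))
```

Keep in mind that this is lossy: the yen sign and any other non-ASCII characters are gone from the result. If those characters matter, this approach is probably not what you want.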
