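I'm trying to write a script that searches the source code of a website for text. So far it successfully fetches the source and prints it out, which looks like:
b'<?xml version="1.0" encoding="UTF-8" ?>\n<!DOCTYPE html
...and so on.
However, when I try to find the "div" tags in the source with print(page.find('div')), I get an error: TypeError: Type str doesn't support the buffer API
I think this has to do with the fact that I'm getting back a bytes literal. How can I decode it as UTF-8 or ASCII so that I can search it as a string?
In case it's needed, here is the simple code I'm running: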
import urllib.request
from urllib.error import URLError

def get_page(url):
    # make the request
    req = urllib.request.Request(url)
    the_page = urllib.request.urlopen(req)

    # get the results of the request
    try:
        # read the page (read() returns bytes, not str)
        page = the_page.read()
        print(page)
        # this is the line that raises the TypeError
        print(page.find('div'))

    # handle request errors
    except URLError as e:
        # if the error has a reason (thus is a URL error), print the reason
        if hasattr(e, 'reason'):
            print(e.reason)
        # if the error has a code (thus is an HTTP error), print the code and the error body
        if hasattr(e, 'code'):
            print(e.code)
            print(e.read())
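
For reference, here is a minimal sketch of two ways around the TypeError. It assumes the page is UTF-8 encoded, as the XML declaration in the printed output suggests. Since read() returns bytes, the data can either be decoded to str before searching, or searched with a bytes pattern. The helper name find_in_page and the sample bytes are hypothetical and only stand in for the real response.

```python
def find_in_page(raw: bytes, needle: str, encoding: str = "utf-8") -> int:
    """Return the index of needle in the decoded page text, or -1 if absent."""
    # Decode the raw bytes to str so that str.find accepts a str needle.
    # errors="replace" keeps going if a byte sequence is invalid (an assumption;
    # a stricter script might prefer the default errors="strict").
    text = raw.decode(encoding, errors="replace")
    return text.find(needle)


# Hypothetical sample standing in for the_page.read(); not a real response.
sample = b'<?xml version="1.0" encoding="UTF-8" ?>\n<html><div>hi</div></html>'

# Option 1: decode first, then search with a str.
print(find_in_page(sample, "div"))

# Option 2: keep the data as bytes and search with a bytes pattern.
print(sample.find(b"div"))
```

In get_page, the first option would amount to something like page = the_page.read().decode('utf-8') before calling find. If the server reports a different charset, the_page.headers.get_content_charset() may return it, though it can also return None when no charset is declared.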