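MissingSchema error even though the link contains https:
I am trying to scrape multiple wiki pages with Python. I have a list of wiki URLs in an Excel file.
I created a Python class that scrapes a wiki page, and I run it through a for loop. When I run the code without the for loop I get output, but when I put the code below inside the for loop I get a MissingSchema error.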
import re
from bs4 import BeautifulSoup
import requests
import xlrd

# Open the workbook that holds the list of wiki URLs
wb = xlrd.open_workbook('list.xls')
sheet = wb.sheet_by_index(0)


class wiki:
    def __init__(self, url):
        # self.name = name
        self.url = url
        # The page is requested as soon as the object is created
        cont = requests.get(self.url, timeout=5)
        soup = BeautifulSoup(cont.content, "html.parser")

    def urlcont(self):
        # Print the full HTML of the page
        cont = requests.get(self.url, timeout=5)
        soup = BeautifulSoup(cont.content, "html.parser")
        print(soup.prettify())

    def head(self):
        # Return the text of the italic element inside the page heading
        cont = requests.get(self.url, timeout=5)
        soup = BeautifulSoup(cont.content, "html.parser")
        title = soup.find(class_='firstHeading').i.text
        return title


# Read the URL from the third column of every row and scrape it
for i in range(sheet.nrows):
    url = sheet.cell_value(i, 2)
    print(url)
    data = wiki(url)
    head = data.head()
    print(head)
Error after running this code:
Traceback (most recent call last):
  File "D:\PYTHON\1click\final\alex.py", line 177, in <module>
    movie = wikimovie(movieurl)
  File "D:\PYTHON\1click\final\alex.py", line 69, in __init__
    cont = requests.get(self.url, timeout=5)
  File "C:\Users\acer\AppData\Local\Programs\Python\Python38-32\lib\site-packages\requests\api.py", line 75, in get
    return request('get', url, params=params, **kwargs)
  File "C:\Users\acer\AppData\Local\Programs\Python\Python38-32\lib\site-packages\requests\api.py", line 60, in request
    return session.request(method=method, url=url, **kwargs)
  File "C:\Users\acer\AppData\Local\Programs\Python\Python38-32\lib\site-packages\requests\sessions.py", line 519, in request
    prep = self.prepare_request(req)
  File "C:\Users\acer\AppData\Local\Programs\Python\Python38-32\lib\site-packages\requests\sessions.py", line 452, in prepare_request
    p.prepare(
  File "C:\Users\acer\AppData\Local\Programs\Python\Python38-32\lib\site-packages\requests\models.py", line 313, in prepare
    self.prepare_url(url, params)
  File "C:\Users\acer\AppData\Local\Programs\Python\Python38-32\lib\site-packages\requests\models.py", line 387, in prepare_url
    raise MissingSchema(error)
requests.exceptions.MissingSchema: Invalid URL '': No schema supplied. Perhaps you meant http://?
Output when the for loop is removed:
https://en.wikipedia.org/wiki/######
######
Output when printing all the URLs (with the for loop) without calling the class:
https://en.wikipedia.org/wiki/######
https://en.wikipedia.org/wiki/######
https://en.wikipedia.org/wiki/######
Without the for loop, I can get output if I replace the variable i with any other arbitrary value in this line: "url = sheet.cell_value(i,2)"
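The error message reports the invalid URL as '' (an empty string), which suggests that at least one row visited by the loop has nothing usable in column 2, for example a blank or trailing row. Below is a minimal sketch of a guard that could run before the request is made. The helper name `clean_url` and the sample `rows` list are hypothetical stand-ins for the values returned by `sheet.cell_value(i, 2)`; the sketch assumes empty cells arrive as empty strings and that only http and https URLs should be fetched.

```python
from urllib.parse import urlparse


def clean_url(cell_value):
    """Return a stripped URL string, or None if the cell holds no usable URL."""
    # Assumption: empty cells come back as '' and non-text cells as non-str values.
    if not isinstance(cell_value, str):
        return None
    url = cell_value.strip()
    parts = urlparse(url)
    # requests raises MissingSchema when the scheme is absent, so require one here.
    if parts.scheme not in ("http", "https") or not parts.netloc:
        return None
    return url


# Hypothetical sample data standing in for sheet.cell_value(i, 2) on each row.
rows = [
    "https://en.wikipedia.org/wiki/Example",
    "",
    "  https://en.wikipedia.org/wiki/Python_(programming_language)  ",
    "en.wikipedia.org/wiki/Example",
]

usable = []
for i, value in enumerate(rows):
    url = clean_url(value)
    if url is None:
        # Report the skipped row instead of passing a bad value to requests.get
        print(f"row {i}: skipped {value!r}")
        continue
    usable.append(url)

print(usable)
```

In the loop from the question, the same check would go between reading the cell and calling `wiki(url)`, so that rows without a valid URL are skipped rather than raising MissingSchema.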