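I am trying to download all the images referenced in a dataset.
The dataset is available for download at https://www.kaggle.com/crowdflower/twitter-user-gender-classification. It is a CSV file with 20,000 rows and 26 columns.
I ran this script: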
    import requests
    import pandas as pd
    import os
    import imageio
    from pandas import DataFrame
    df=pd.read_csv('E:/gender-classifier-DFE-791531.csv',encoding='latin1')
    print(df.shape)
    imgURL=df['profileimage']
    uniID=df['_unit_id']
    gender=df['gender']
    dict={'images':[0],'gender':''}
    global jk
    jk=DataFrame(dict)
    def get_images(image_url,ID,gender,i):
        print(i)
        response=requests.get(image_url,stream=True)
        if not response.ok:
            print(response)
            return
        k=imageio.imread(image_url)
        k=k.flatten()
        dict1={'ID':ID,'images':[k],'gender':gender}
        df=pd.DataFrame(dict1)
        global jk
        jk=pd.concat([jk,df],axis=0)

    jk.set_index('ID')
    for i in range(187,len(imgURL)+1):
        get_images(imgURL[i],uniID[i],gender[i],i)
    jk.to_csv('C:\\Users\\prabhu\\Desktop\\jk.csv',sep=',')
However, after processing about 150 records (a small part of the 20k-row dataset), I ran into this error:
    Traceback (most recent call last):
      File "E:/image_extraction.py", line 29, in <module>
        get_images(imgURL[i],uniID[i],gender[i],i)
      File "E:/image_extraction.py", line 20, in get_images
        k=imageio.imread(image_url)
      File "C:\Users\prabhu\AppData\Local\Programs\Python\Python37\lib\site-packages\imageio\core\functions.py", line 221, in imread
        reader = read(uri, format, "i", **kwargs)
      File "C:\Users\prabhu\AppData\Local\Programs\Python\Python37\lib\site-packages\imageio\core\functions.py", line 143, in get_reader
        return format.get_reader(request)
      File "C:\Users\prabhu\AppData\Local\Programs\Python\Python37\lib\site-packages\imageio\core\format.py", line 174, in get_reader
        return self.Reader(self, request)
      File "C:\Users\prabhu\AppData\Local\Programs\Python\Python37\lib\site-packages\imageio\core\format.py", line 224, in __init__
        self._open(**self.request.kwargs.copy())
      File "C:\Users\prabhu\AppData\Local\Programs\Python\Python37\lib\site-packages\imageio\plugins\pillowmulti.py", line 57, in _open
        return PillowFormat.Reader._open(self)
      File "C:\Users\prabhu\AppData\Local\Programs\Python\Python37\lib\site-packages\imageio\plugins\pillow.py", line 132, in _open
        if hasattr(self._im, "n_frames"):
      File "C:\Users\prabhu\AppData\Local\Programs\Python\Python37\lib\site-packages\PIL\GifImagePlugin.py", line 96, in n_frames
        self.seek(self.tell() + 1)
      File "C:\Users\prabhu\AppData\Local\Programs\Python\Python37\lib\site-packages\PIL\GifImagePlugin.py", line 128, in seek
        self._seek(f)
      File "C:\Users\prabhu\AppData\Local\Programs\Python\Python37\lib\site-packages\PIL\GifImagePlugin.py", line 158, in _seek
        self.fp.seek(self.__offset)
      File "C:\Users\prabhu\AppData\Local\Programs\Python\Python37\lib\site-packages\imageio\core\request.py", line 513, in seek
        ori_seek(i, mode)
    io.UnsupportedOperation: seek
I need help resolving this error.
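One direction that might be worth trying, offered as a sketch and not a confirmed fix: the traceback suggests that Pillow's GIF plugin calls seek() on the HTTP stream that imageio opened for the URL, and that stream does not appear to be seekable. Downloading the whole response body first and decoding it from memory should sidestep that. The helper names `to_seekable` and `fetch_image_array` and the `timeout` value are hypothetical, and the set of exceptions caught while decoding is an assumption.

```python
import io


def to_seekable(raw_bytes):
    """Wrap already-downloaded bytes in an in-memory, seekable file object."""
    return io.BytesIO(raw_bytes)


def fetch_image_array(image_url, timeout=10):
    """Download image_url completely, then decode it from memory.

    Returns the decoded array, or None if the download or decode fails.
    (Hypothetical helper; the timeout value is an arbitrary assumption.)
    """
    # Imported here so to_seekable stays usable without these packages.
    import imageio
    import requests

    try:
        response = requests.get(image_url, timeout=timeout)
    except requests.RequestException as exc:
        print(f"request failed for {image_url}: {exc}")
        return None
    if not response.ok:
        print(f"skipping {image_url}: HTTP {response.status_code}")
        return None
    try:
        # A BytesIO supports seek(), which the GIF reader needs in order
        # to move between frames.
        return imageio.imread(to_seekable(response.content))
    except (OSError, ValueError) as exc:
        # Assumption: decode failures surface as one of these types.
        print(f"could not decode {image_url}: {exc}")
        return None
```

Inside `get_images`, `k = fetch_image_array(image_url)` followed by `if k is None: return` would stand in for the `requests.get` and `imageio.imread(image_url)` pair, which also avoids downloading each image twice. Separately, `range(187, len(imgURL)+1)` looks like it would index one past the last row, so `range(187, len(imgURL))` may be what is intended.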