I am trying to download all of the images referenced in a dataset.

The dataset is at https://www.kaggle.com/crowdflower/twitter-user-gender-classification (see the download there). It is a CSV file with 20,000 rows and 26 columns.

I ran this script:

    import requests
    import pandas as pd
    import os
    import imageio
    from pandas import DataFrame
    df=pd.read_csv('E:/gender-classifier-DFE-791531.csv',encoding='latin1')
    print(df.shape)
    imgURL=df['profileimage']
    uniID=df['_unit_id']
    gender=df['gender']
    dict={'images':[0],'gender':''}
    global jk
    jk=DataFrame(dict)
    def get_images(image_url,ID,gender,i):
        print(i)
        response=requests.get(image_url,stream=True)
        if not response.ok:
            print(response)
            return
        k=imageio.imread(image_url)
        k=k.flatten()
        dict1={'ID':ID,'images':[k],'gender':gender}
        df=pd.DataFrame(dict1)
        global jk
        jk=pd.concat([jk,df],axis=0)
        jk.set_index('ID')


    for i in  range(187,len(imgURL)+1):
        get_images(imgURL[i],uniID[i],gender[i],i)
    jk.to_csv('C:\\Users\\prabhu\\Desktop\\jk.csv',sep=',')

But after running through 150 records (a fraction of the 20k in the dataset), I ran into a problem:

    Traceback (most recent call last):
      File "E:/image_extraction.py", line 29, in <module>
        get_images(imgURL[i],uniID[i],gender[i],i)
      File "E:/image_extraction.py", line 20, in get_images
        k=imageio.imread(image_url)
      File "C:\Users\prabhu\AppData\Local\Programs\Python\Python37\lib\site-packages\imageio\core\functions.py", line 221, in imread
        reader = read(uri, format, "i", **kwargs)
      File "C:\Users\prabhu\AppData\Local\Programs\Python\Python37\lib\site-packages\imageio\core\functions.py", line 143, in get_reader
        return format.get_reader(request)
      File "C:\Users\prabhu\AppData\Local\Programs\Python\Python37\lib\site-packages\imageio\core\format.py", line 174, in get_reader
        return self.Reader(self, request)
      File "C:\Users\prabhu\AppData\Local\Programs\Python\Python37\lib\site-packages\imageio\core\format.py", line 224, in __init__
        self._open(**self.request.kwargs.copy())
      File "C:\Users\prabhu\AppData\Local\Programs\Python\Python37\lib\site-packages\imageio\plugins\pillowmulti.py", line 57, in _open
        return PillowFormat.Reader._open(self)
      File "C:\Users\prabhu\AppData\Local\Programs\Python\Python37\lib\site-packages\imageio\plugins\pillow.py", line 132, in _open
        if hasattr(self._im, "n_frames"):
      File "C:\Users\prabhu\AppData\Local\Programs\Python\Python37\lib\site-packages\PIL\GifImagePlugin.py", line 96, in n_frames
        self.seek(self.tell() + 1)
      File "C:\Users\prabhu\AppData\Local\Programs\Python\Python37\lib\site-packages\PIL\GifImagePlugin.py", line 128, in seek
        self._seek(f)
      File "C:\Users\prabhu\AppData\Local\Programs\Python\Python37\lib\site-packages\PIL\GifImagePlugin.py", line 158, in _seek
        self.fp.seek(self.__offset)
      File "C:\Users\prabhu\AppData\Local\Programs\Python\Python37\lib\site-packages\imageio\core\request.py", line 513, in seek
        ori_seek(i, mode)
    io.UnsupportedOperation: seek

I need help resolving this problem.
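
One direction that may be worth trying, not verified against the full dataset: the traceback ends in PIL's GIF plugin calling `seek` on the stream that `imageio` opened from the URL, which suggests the failure happens when a profile image is a GIF and the HTTP stream cannot be rewound. A minimal sketch, assuming that is the cause, downloads the bytes first and decodes them from an in-memory buffer, which does support `seek`. The names `as_seekable_buffer` and `fetch_image_array` are hypothetical helpers introduced here, and the set of exceptions caught is a guess at what a bad image might raise.

```python
import io


def as_seekable_buffer(data):
    """Wrap already-downloaded bytes in an in-memory file object that supports seek()."""
    return io.BytesIO(data)


def fetch_image_array(image_url, timeout=10):
    """Download one image and decode it from memory; return None on failure.

    Hypothetical helper: assumes the `requests` and `imageio` packages
    from the script above are installed.
    """
    # Imported here so the buffer helper above can be used on its own.
    import imageio
    import requests

    response = requests.get(image_url, timeout=timeout)
    if not response.ok:
        return None

    # Decode from the buffered bytes instead of passing the URL to imageio,
    # so the image is fetched only once and the reader can seek freely.
    buffer = as_seekable_buffer(response.content)
    try:
        return imageio.imread(buffer)
    except (OSError, ValueError):
        # Assumption: unreadable or unrecognised images raise one of these.
        return None
```

In the script above, this would stand in for the separate `requests.get` and `imageio.imread(image_url)` calls inside `get_images`, with rows skipped when the result is `None`.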
