
I get the following error when I run my Python script, which uses pandas, on a dataset of about 300,000 records. What is going wrong?

Traceback (most recent call last):
  File "extractyooochoose2.py", line 32
    totalitems=[len(x) for x in clicksdat.groupby('Sid')['itemid'].unique()]
  File "", line 13, in unique
  File "/home/ubuntu/anaconda2/lib/python2.7/site-packages/pandas/core/groupby.py", line 620, in wrapper
    raise ValueError

The code and the data are shown below.

import pandas as pd
import datetime as dt
clickspath='/tmp/gensim/yoochoose/yoochoose-clicks.dat'
buyspath='/tmp/gensim/yoochoose/yoochoose-buys.dat'
clicksdat=pd.read_csv(clickspath,header=None,dtype={'itemid': pd.np.str_,'Sid':pd.np.str_,'Timestamp':pd.np.str_,'itemcategory':pd.np.str_})
clicksdat.columns=['Sid','Timestamp','itemid','itemcategory']
buysdat=pd.read_csv(buyspath,header=None)
buysdat.columns=['Sid','Timestamp','itemid','price','qty']
segment={}
for i in range(24):
    if i<7:
        segment[i]='EM'
    elif i<10:
        segment[i]='M'
    elif i<13:
        segment[i]='A'
    elif i<18:
        segment[i]='E'
    elif i<23:
        segment[i]='N'
    elif i<25:
        segment[i]='MN'
#*******************************************
buyersession=buysdat.Sid.unique()
clickersession=clicksdat.Sid.unique()
maxtemp=[(dt.datetime.strptime(x,"%Y-%m-%dT%H:%M:%S.%fZ"))  for x in  clicksdat.groupby('Sid')['Timestamp'].max()]
mintemp=[dt.datetime.strptime(x,"%Y-%m-%dT%H:%M:%S.%fZ")  for x in  clicksdat.groupby('Sid')['Timestamp'].min()]
duration=[int((a-b).total_seconds()) for a,b  in zip(maxtemp,mintemp)]
day=[x.day for x in maxtemp]
month=[x.month for x in maxtemp]
noofnavigations=[clicksdat.groupby('Sid').count().Timestamp][0]
totalitems=[len(x) for x in clicksdat.groupby('Sid')['itemid'].unique()]
totalcats=[len(x) for x in clicksdat.groupby('Sid')['itemcategory'].unique()]
timesegment= [segment[x.hour]for x in maxtemp]
segmentchange=[1 if (segment[x.hour]!=segment[y.hour]) else 0 for x,y in zip(maxtemp,mintemp)]
purchased=[x in buyersession for x in noofnavigations.index.values ]
percentile_list = pd.DataFrame({
    'purchased': purchased,
    'duration': duration,
    'day': day,
    'month': month,
    'noofnavigations': noofnavigations,
    'totalitems': totalitems,
    'totalcats': totalcats,
    'timesegment': timesegment,
    'segmentchange': segmentchange,
})
percentile_list.to_csv('/tmp/gensim/yoochoose/yoochoose-clicks1001.csv')

Sample data is shown below.

sessioid,timestamp,itemid,category  
1,2014-04-07T10:51:09.277Z,214536502,0  
1,2014-04-07T10:54:09.868Z,214536500,0  
1,2014-04-07T10:54:46.998Z,214536506,0  
1,2014-04-07T10:57:00.306Z,214577561,0  
2,2014-04-07T13:56:37.614Z,214662742,0  
2,2014-04-07T13:57:19.373Z,214662742,0  
2,2014-04-07T13:58:37.446Z,214825110,0  
2,2014-04-07T13:59:50.710Z,214757390,0  