I wrote a program like this:
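import csv
from collections import defaultdict

reader = csv.reader(open("lrgdata.csv"))
headers = next(reader)  # skip the header row
Amt_Wtotal = 0
Amt_Dtotal = 0
dataW = []
dataD = []
counts_W = defaultdict(int)
counts_D = defaultdict(int)
for row in reader:
    # row[28] is transaction_master_transaction_type ('W' or 'D')
    if(row[28]=='W'):
        counts_W[row[5]] += 1        # count per account_info_id_account_info
        Amt_Wtotal += float(row[6])  # running total of transaction_master_amount
        dataW.append(Amt_Wtotal)
    else:
        counts_D[row[5]] += 1
        Amt_Dtotal += float(row[6])
        dataD.append(Amt_Dtotal)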
When I run this code on a 412 KB file I get no error, but when I run it on a 1.8 MB file I get this error:
    if(row[28]=='W'):
IndexError: list index out of range
My file looks like this.
Header:
personal_info_id_city,personal_info_sex,transaction_master_id_transaction_master,card_holder_info_id_terminal_info,transaction_master_id_terminal_info,account_info_id_account_info,transaction_master_amount,personal_info_dob_m,card_holder_info_card_issue_dt,personal_info_dob_h,transaction_master_transaction_from,personal_info_dob_d,transaction_master_transacted_on,account_info_balance_amt,personal_info_id_user_type,personal_info_dob_y,card_holder_info_card_issue_dt_y,transaction_master_transacted_on_y,transaction_master_transacted_on_d,card_holder_info_card_issue_dt_d,transaction_master_transacted_on_m,card_holder_info_card_issue_dt_h,transaction_master_transacted_on_h,card_holder_info_card_issue_dt_m,transaction_master_id_customer_info,personal_info_dob,card_holder_info_id_brch,card_holder_info_id_card_holder_info,transaction_master_transaction_type,_id,personal_info_id_customer_info
Values:
2,M,17748,60,60,21768,1460.0,7,2011-04-02 00:00:00,0,B,5,2011-07-22 03:03:00,52.0,1,1992,2011,2011,22,2,7,0,3,4,21768,1992-07-05 00:00:00,26,21768,W,50f38a469cf9c253d600000c,21768
1,M,18002,3,3,1746,3480.0,2,2011-04-07 00:00:00,0,B,5,2011-07-25 01:03:00,123.0,1,1985,2011,2011,25,7,7,0,1,4,1746,1985-02-05 00:00:00,3,1746,D,50f38a469cf9c253d600000d,1746
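Both sample rows above have 31 fields, so row[28] exists for them. My guess (an assumption, not something I have confirmed) is that the larger file contains at least one row with fewer than 29 fields, for example a blank or truncated line. A minimal sketch for locating such rows follows; find_short_rows and the tiny in-memory sample are made up for illustration and are not part of my real program or data:

```python
import csv
import io

def find_short_rows(lines, min_fields=29):
    """Return (line_number, field_count) for each data row with too few fields."""
    reader = csv.reader(lines)
    next(reader)  # skip the header row
    short = []
    for row in reader:
        if len(row) < min_fields:
            # reader.line_num is the number of source lines read so far
            short.append((reader.line_num, len(row)))
    return short

# Made-up 3-column sample with one blank line and one truncated line.
sample = io.StringIO("a,b,c\n1,2,W\n\n4,5\n7,8,D\n")
short_rows = find_short_rows(sample, min_fields=3)
print(short_rows)
```

For the real file this would presumably be called as find_short_rows(open("lrgdata.csv")), keeping the default min_fields=29 because the loop reads row[28].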
Can you also tell me how to find the correlation between two data sets, each of which is a list?
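To make the second question concrete, here is a minimal sketch of what I mean, assuming "correlation" means the Pearson coefficient and that the two lists have the same length (dataW and dataD in my program may not, so they would need to be aligned or trimmed first). The pearson helper and the example numbers are hypothetical, not taken from my file:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    if len(xs) != len(ys):
        raise ValueError("both lists must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two pairs of values")
    mean_x = sum(xs) / float(n)
    mean_y = sum(ys) / float(n)
    # Sums of products of deviations; the 1/n factors cancel in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a list is constant")
    return cov / math.sqrt(var_x * var_y)

# Made-up values: the second list is exactly twice the first.
r = pearson([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
print(r)
```

A value near 1 or -1 would indicate a strong linear relationship, and a value near 0 a weak one.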