"DF","00000000@11111.COM","FLTINT1000130394756","26JUL2010","B2C","6799.2"
"Rail","00000.POO@GMAIL.COM","NR251764697478","24JUN2011","B2C","2025"
"DF","0000650000@YAHOO.COM","NF2513521438550","01JAN2013","B2C","6792"
"Bus","00009.GAURAV@GMAIL.COM","NU27012932319739","26JAN2013","B2C","800"
"Rail","0000.ANU@GMAIL.COM","NR251764697526","24JUN2011","B2C","595"
"Rail","0000MANNU@GMAIL.COM","NR251277005737","29OCT2011","B2C","957"
"Rail","0000PRANNOY0000@GMAIL.COM","NR251297862893","21NOV2011","B2C","212"
"DF","0000PRANNOY0000@YAHOO.CO.IN","NF251327485543","26JUN2011","B2C","17080"
"Rail","0000RAHUL@GMAIL.COM","NR2512012069809","25OCT2012","B2C","5731"
"DF","0000SS0@GMAIL.COM","NF251355775967","10MAY2011","B2C","2000"
"DF","0001HARISH@GMAIL.COM","NF251352240086","22DEC2010","B2C","4006"
"DF","0001HARISH@GMAIL.COM","NF251742087846","12DEC2010","B2C","1000"
"DF","0001HARISH@GMAIL.COM","NF252022031180","09DEC2010","B2C","3439"
"Rail","000AYUSH@GMAIL.COM","NR2151120122283","25JAN2013","B2C","136"
"Rail","000AYUSH@GMAIL.COM","NR2151213260036","28NOV2012","B2C","41"
"Rail","000AYUSH@GMAIL.COM","NR2151313264432","29NOV2012","B2C","96"
"Rail","000AYUSH@GMAIL.COM","NR2151413266728","29NOV2012","B2C","96"
"Rail","000AYUSH@GMAIL.COM","NR2512912359037","08DEC2012","B2C","96"
"Rail","000AYUSH@GMAIL.COM","NR2517612385569","12DEC2012","B2C","96"
以上是样本数据。数据根据电子邮件地址排序,文件非常大,大约 1.5Gb
我想在另一个 csv 文件中输出类似这样的内容
"DF","00000000@11111.COM","FLTINT1000130394756","26JUL2010","B2C","6799.2",1,0 days
"Rail","00000.POO@GMAIL.COM","NR251764697478","24JUN2011","B2C","2025",1,0 days
"DF","0000650000@YAHOO.COM","NF2513521438550","01JAN2013","B2C","6792",1,0 days
"Bus","00009.GAURAV@GMAIL.COM","NU27012932319739","26JAN2013","B2C","800",1,0 days
"Rail","0000.ANU@GMAIL.COM","NR251764697526","24JUN2011","B2C","595",1,0 days
"Rail","0000MANNU@GMAIL.COM","NR251277005737","29OCT2011","B2C","957",1,0 days
"Rail","0000PRANNOY0000@GMAIL.COM","NR251297862893","21NOV2011","B2C","212",1,0 days
"DF","0000PRANNOY0000@YAHOO.CO.IN","NF251327485543","26JUN2011","B2C","17080",1,0 days
"Rail","0000RAHUL@GMAIL.COM","NR2512012069809","25OCT2012","B2C","5731",1,0 days
"DF","0000SS0@GMAIL.COM","NF251355775967","10MAY2011","B2C","2000",1,0 days
"DF","0001HARISH@GMAIL.COM","NF251352240086","09DEC2010","B2C","4006",1,0 days
"DF","0001HARISH@GMAIL.COM","NF251742087846","12DEC2010","B2C","1000",2,3 days
"DF","0001HARISH@GMAIL.COM","NF252022031180","22DEC2010","B2C","3439",3,10 days
"Rail","000AYUSH@GMAIL.COM","NR2151213260036","28NOV2012","B2C","41",1,0 days
"Rail","000AYUSH@GMAIL.COM","NR2151313264432","29NOV2012","B2C","96",2,1 days
"Rail","000AYUSH@GMAIL.COM","NR2151413266728","29NOV2012","B2C","96",3,0 days
"Rail","000AYUSH@GMAIL.COM","NR2512912359037","08DEC2012","B2C","96",4,9 days
"Rail","000AYUSH@GMAIL.COM","NR2512912359037","08DEC2012","B2C","96",5,0 days
"Rail","000AYUSH@GMAIL.COM","NR2517612385569","12DEC2012","B2C","96",6,4 days
"Rail","000AYUSH@GMAIL.COM","NR2517612385569","12DEC2012","B2C","96",7,0 days
"Rail","000AYUSH@GMAIL.COM","NR2151120122283","25JAN2013","B2C","136",8,44 days
"Rail","000AYUSH@GMAIL.COM","NR2151120122283","25JAN2013","B2C","136",9,0 days
即,如果条目第一次出现,我需要附加 1 如果它出现第二次,我需要附加 2,同样我的意思是我需要计算文件中电子邮件地址的出现次数,如果电子邮件存在两次或更多,我想要差异在日期和记住日期之间没有排序,所以我们还必须根据特定的电子邮件地址对它们进行排序,我正在使用 numpy 或 pandas 库或任何其他可以处理这种类型的海量数据而不会放弃的库在 python 中寻找解决方案绑定内存异常我有双核处理器,centos 6.3,内存为 4GB