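I have a list of lists, L, that I am iterating over to filter out duplicates. I know this is not the best way to do it, but it is what was specifically requested. The non-empty data is not duplicated, but I still get duplicated empty columns that I cannot fix. Any help?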

for x in range(len(L), 0, -1):
    x -= 1  # len()
    for y in range(len(L[0]), 0, -1):
        y -= 1
        if y != 0 and y != 1:  # Skipping columns 0 and 1
            check = L[x][y]
            for x0 in range(len(L), 0, -1):
                x0 -= 1
                for y0 in range(len(L[0]), 0, -1):
                    y0 -= 1
                    if y0 == y:
                        checkagainst = L[x0][y0]
                        if check == checkagainst:
                            if x != x0:  # Same row, so don't count it
                                #print "Identical Indices:","X0:",x0,",","Y0:", y0,"|" ,"X:",x,",","Y:",y
                                #print L[x][y], "," , L[x0][y0]
                                WriteMe = True  # Decides which file (duplicates or not) the row goes to
                                if check == "":  # Didn't work
                                    WriteMe = False
            print x, ",", y
    if WriteMe == True:
        dwriter.writerow(L[x])
        WriteMe = False  # Reset for the next iteration
    else:
        writer.writerow(L[x])
    L.pop(x)
    print

Sample input:

ID, Sex, E-mail

1, M, lol@jk.com

2, F, 

3, F,

4, F, jack@jay.com

Expected output (no-duplicates file):

ID, Sex, E-mail

1, M, lol@jk.com

2, F,

4, F, jack@jay.com

(IDs 2 and 3 are interchangeable in this case, since they are duplicate rows.)

Expected output (duplicates file):

ID, Sex, E-mail

3, F, 

1 Answer

You can use collections.OrderedDict, keyed on the e-mail column:

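>>> from collections import OrderedDict
>>> dic = OrderedDict()
>>> with open('abc') as f:
...     #next(f)       # skip the header if present
...     for line in f:
...         if not line.strip():
...             continue       # ignore blank lines
...         # split on ',' alone so a row ending in a bare comma still yields three fields
...         data = [field.strip() for field in line.split(',')]
...         idx, sex, mail = data if len(data) == 3 else data + ['']
...         dic.setdefault(mail, []).append([idx, sex])
...

Non-duplicates:

>>> for k, v in dic.items():
...     print(", ".join((v[0][0], v[0][1], k)))
...
1, M, lol@jk.com
2, F, 
4, F, jack@jay.com

Duplicates:

>>> for k, v in dic.items():
...     if len(v) > 1:
...         for v1 in v[1:]:
...             print(", ".join((v1[0], v1[1], k)))
...
3, F, 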
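
For reference, the same grouping idea can be wrapped in a small function that returns both lists at once. This is only a sketch, written for Python 3 and assuming the rows come from csv.reader. The names split_duplicates and key_index are made up for illustration and are not part of the question's code, and the sample data is fed in through io.StringIO instead of a real file.

```python
import csv
import io
from collections import OrderedDict


def split_duplicates(rows, key_index=2):
    """Return (unique_rows, duplicate_rows), compared on column key_index.

    The first row seen for each key value is kept as unique; every later
    row with the same value (including an empty one) is a duplicate.
    """
    groups = OrderedDict()  # key value -> list of rows, in first-seen order
    for row in rows:
        fields = [field.strip() for field in row]
        if not any(fields):
            continue  # skip blank lines
        # Pad short rows so a missing trailing column counts as empty.
        fields += [''] * (key_index + 1 - len(fields))
        groups.setdefault(fields[key_index], []).append(fields)
    unique = [group[0] for group in groups.values()]
    duplicates = [row for group in groups.values() for row in group[1:]]
    return unique, duplicates


# Sample input from the question, held in memory instead of a file.
sample = io.StringIO(
    "ID, Sex, E-mail\n"
    "1, M, lol@jk.com\n"
    "2, F, \n"
    "3, F,\n"
    "4, F, jack@jay.com\n"
)
reader = csv.reader(sample)
header = next(reader)  # keep the header row out of the comparison
unique, duplicates = split_duplicates(reader)
print(unique)      # rows with IDs 1, 2 and 4
print(duplicates)  # the row with ID 3
```

If writer and dwriter in the question are csv.writer objects, the two lists could then be written out with writer.writerows(unique) and dwriter.writerows(duplicates), which avoids popping items from L while looping over it.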
answered 2013-07-11T23:13:53.317