
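I want to deduplicate the following list, but also keep a list of the duplicates to display on the next screen. The data is pulled from a CSV file, so it would be nice to show the user what was added and what wasn't ("Dupes", etc.).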

[
  ['first_name', 'last_name', 'email'],
  ['Danny', 'Lastnme', 'name@email.com'],
  ['Sally', 'Surname', 'name@email.com'],
  ['Sally', 'Surname', 'name@email.com'],  # <-- Dupe
  ['Sally', 'Surname', 'name@email.com'],  # <-- Dupe
  ['Chris', 'Lastnam', 'name@email.com'],
  ['Larry', 'Seconds', 'name@email.com'],
  ['Barry', 'Barrins', 'name@email.com'],
  ['Glenn', 'Melting', 'name@email.com'],
  ['Glenn', 'Melting', 'name@email.com'],  # <-- Dupe
]

The end result should be two lists: one cleanly deduplicated, and one containing the duplicates.

Unique:

[
  ['first_name', 'last_name', 'email'],
  ['Danny', 'Lastnme', 'name@email.com'],
  ['Sally', 'Surname', 'name@email.com'],
  ['Chris', 'Lastnam', 'name@email.com'],
  ['Larry', 'Seconds', 'name@email.com'],
  ['Barry', 'Barrins', 'name@email.com'],
  ['Glenn', 'Melting', 'name@email.com'],
]

Dupes:

[
  ['Sally', 'Surname', 'name@email.com'],
  ['Sally', 'Surname', 'name@email.com'],
  ['Glenn', 'Melting', 'name@email.com'],
]

3 Answers


You can copy and paste this code to get back a dictionary containing both the dupes and the uniques:

a = [
['first_name', 'last_name', 'email'],
['Danny', 'Lastnme', 'name@email.com'],
['Sally', 'Surname', 'name@email.com'],
['Sally', 'Surname', 'name@email.com'],  
['Sally', 'Surname', 'name@email.com'], 
['Chris', 'Lastnam', 'name@email.com'],
['Larry', 'Seconds', 'name@email.com'],
['Barry', 'Barrins', 'name@email.com'],
['Glenn', 'Melting', 'name@email.com'],
['Glenn', 'Melting', 'name@email.com'],
]

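result = {}

# Skip the header row and turn each row into a tuple so it is hashable
b = [tuple(x) for x in a[1:]]
# dict.fromkeys keeps one copy of each row in first-seen order
# (set(b) would also deduplicate, but it loses the original order)
all_uniques = dict.fromkeys(b)
result['unique'] = [list(x) for x in all_uniques]

# To show which ones have duplicates, use Mr E's solution:

from collections import Counter

t = Counter(b)
dupes = []

for k, v in t.items():  # .iteritems() only exists in Python 2
    if v > 1:
        # Add one copy of the row for every extra occurrence
        dupes.extend([list(k)] * (v - 1))

result['dupes'] = dupes

print(result)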
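Since the rows come from a CSV file, here is a minimal sketch of the same `Counter` idea fed directly from the standard-library `csv` module. The inline `CSV_TEXT` (wrapped in `io.StringIO` so the example runs on its own) and the function name `split_csv_rows` are hypothetical; it assumes two rows are duplicates only when every column matches, and it keeps the header at the top of the unique list, as in the question.

```python
import csv
import io
from collections import Counter

# Hypothetical CSV content standing in for the user's uploaded file
CSV_TEXT = """first_name,last_name,email
Danny,Lastnme,name@email.com
Sally,Surname,name@email.com
Sally,Surname,name@email.com
Sally,Surname,name@email.com
Chris,Lastnam,name@email.com
Larry,Seconds,name@email.com
Barry,Barrins,name@email.com
Glenn,Melting,name@email.com
Glenn,Melting,name@email.com
"""


def split_csv_rows(f):
    """Return {'unique': [...], 'dupes': [...]} for the rows of a CSV file object.

    The header row stays at the top of 'unique'; rows are compared as a whole.
    """
    reader = csv.reader(f)
    header = next(reader)
    # Counter keeps first-seen order on Python 3.7+
    counts = Counter(tuple(row) for row in reader)
    return {
        'unique': [header] + [list(row) for row in counts],
        # one entry per extra occurrence of a row
        'dupes': [list(row) for row, n in counts.items() for _ in range(n - 1)],
    }


result = split_csv_rows(io.StringIO(CSV_TEXT))
print(result['unique'])
print(result['dupes'])
```

With a real file you would pass an open file object instead, e.g. `with open('people.csv', newline='') as f: result = split_csv_rows(f)` (the file name is a placeholder).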
answered 2013-10-01T10:30:56.070

Try this; it's the simplest approach.

name_list = [
    ['first_name', 'last_name', 'email'],
    ['Danny', 'Lastnme', 'name@email.com'],
    ['Sally', 'Surname', 'name@email.com'],
    ['Sally', 'Surname', 'name@email.com'],  
    ['Sally', 'Surname', 'name@email.com'], 
    ['Chris', 'Lastnam', 'name@email.com'],
    ['Larry', 'Seconds', 'name@email.com'],
    ['Barry', 'Barrins', 'name@email.com'],
    ['Glenn', 'Melting', 'name@email.com'],
    ['Glenn', 'Melting', 'name@email.com'],
]
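# Sort (without the header) so identical rows end up next to each other
sorted_name_list = sorted(name_list[1:])
last_record = None
Unique = []
Dupes = []
for record in sorted_name_list:
    if last_record != record:
        Unique.append(record)
    else:
        Dupes.append(record)
    # Remember this row so the next one can be compared against it
    last_record = record
print(Unique)
print(Dupes)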
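The same sort-then-compare-neighbours idea can also be written with `itertools.groupby`, which groups runs of adjacent equal items. This is a sketch of that swapped-in variant, not the answer's original loop; like the loop, it returns the rows in sorted order rather than in CSV order.

```python
from itertools import groupby

name_list = [
    ['first_name', 'last_name', 'email'],
    ['Danny', 'Lastnme', 'name@email.com'],
    ['Sally', 'Surname', 'name@email.com'],
    ['Sally', 'Surname', 'name@email.com'],
    ['Sally', 'Surname', 'name@email.com'],
    ['Chris', 'Lastnam', 'name@email.com'],
    ['Larry', 'Seconds', 'name@email.com'],
    ['Barry', 'Barrins', 'name@email.com'],
    ['Glenn', 'Melting', 'name@email.com'],
    ['Glenn', 'Melting', 'name@email.com'],
]

Unique = []
Dupes = []
# groupby only merges *adjacent* equal items, so sort first (header excluded)
for record, group in groupby(sorted(name_list[1:])):
    copies = list(group)
    Unique.append(record)
    Dupes.extend(copies[1:])  # every copy after the first is a dupe

print(Unique)
print(Dupes)
```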
answered 2013-10-01T11:10:28.363

You can get the frequency of each row with a Counter:

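from collections import Counter

# data is the list from the question; [1:] skips the header row
t = Counter(tuple(x) for x in data[1:])

# One copy of every distinct row (add `if v == 1` to keep only rows that never repeat)
uniques = [list(k) for k in t]
# One entry per extra occurrence, e.g. Sally appears 3 times -> 2 dupes
dupes = [list(k) for k, v in t.items() for _ in range(v - 1)]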
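If you want exactly the two lists shown in the question (header kept in the unique list, original CSV order preserved), one pass with a `seen` set is enough, with no counting or sorting. This is a minimal sketch; `split_dupes` is a hypothetical name, and rows are converted to tuples because lists are not hashable.

```python
def split_dupes(rows):
    """Split rows into (unique, dupes), keeping the header and first-seen order."""
    header, body = rows[0], rows[1:]
    seen = set()
    unique = [header]
    dupes = []
    for row in body:
        key = tuple(row)  # lists can't go in a set, tuples can
        if key in seen:
            dupes.append(row)  # already kept once, so this copy is a dupe
        else:
            seen.add(key)
            unique.append(row)
    return unique, dupes


rows = [
    ['first_name', 'last_name', 'email'],
    ['Danny', 'Lastnme', 'name@email.com'],
    ['Sally', 'Surname', 'name@email.com'],
    ['Sally', 'Surname', 'name@email.com'],
    ['Sally', 'Surname', 'name@email.com'],
    ['Chris', 'Lastnam', 'name@email.com'],
    ['Larry', 'Seconds', 'name@email.com'],
    ['Barry', 'Barrins', 'name@email.com'],
    ['Glenn', 'Melting', 'name@email.com'],
    ['Glenn', 'Melting', 'name@email.com'],
]

unique, dupes = split_dupes(rows)
print(unique)
print(dupes)
```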
answered 2013-10-01T10:23:59.973