I'm trying to remove duplicate entries from data that looks like this:
name phone email website
Diane Grant Albrecht M.S.
Lannister G. Cersei M.A.T., CEP 111-222-3333 cersei@got.com www.got.com
Argle D. Bargle Ed.M.
Sam D. Man Ed.M. 000-000-1111 dman123@gmail.com www.daManWithThePlan.com
Sam D. Man Ed.M.
Sam D. Man Ed.M. 111-222-333 dman123@gmail.com www.daManWithThePlan.com
D G Bamf M.S.
Amy Tramy Lamy Ph.D.
so that the result looks like this:
name phone email website
Diane Grant Albrecht M.S.
Lannister G. Cersei M.A.T., CEP 111-222-3333 cersei@got.com www.got.com
Argle D. Bargle Ed.M.
Sam D. Man Ed.M. 000-000-1111, 111-222-333 dman123@gmail.com www.daManWithThePlan.com
D G Bamf M.S.
Amy Tramy Lamy Ph.D.
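The transformation above groups rows by name and collects the distinct, non-empty values of each remaining column. Here is a minimal Python 3 sketch of that merge step. It assumes each record has already been split into `[name, phone, email, website]` (the header order) and that blank fields are empty strings. `merge_rows` is a hypothetical helper name, not from the original code. Values are kept in first-seen order so the phone numbers come out as "000-000-1111, 111-222-333".

```python
def merge_rows(rows):
    """Merge rows that share a name, collecting the distinct non-empty
    values of every other column in first-seen order (dicts keep
    insertion order in Python 3.7+)."""
    merged = {}  # name -> one list of values per column after the name
    for name, *fields in rows:
        columns = merged.setdefault(name, [[] for _ in fields])
        for column, value in zip(columns, fields):
            if value and value not in column:  # skip blanks and duplicates
                column.append(value)
    # Join multiple values in a column with ", " as in the desired output.
    return [[name] + [", ".join(col) for col in columns]
            for name, columns in merged.items()]


rows = [
    ["Sam D. Man Ed.M.", "000-000-1111", "dman123@gmail.com", "www.daManWithThePlan.com"],
    ["Sam D. Man Ed.M.", "", "", ""],
    ["Sam D. Man Ed.M.", "111-222-333", "dman123@gmail.com", "www.daManWithThePlan.com"],
    ["D G Bamf M.S.", "", "", ""],
]
for merged_row in merge_rows(rows):
    print(merged_row)
```

Lists are used instead of sets here only to keep the original order of the values; with sets, the order of "000-000-1111" and "111-222-333" in the output would not be guaranteed.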
Here is my code:
from collections import defaultdict
import csv
import re
input = open('ieca_first_col_fake_text.txt', 'rU')
# default to empty set for phone, email, website, area, degrees
extracted_data = defaultdict(lambda: [set(), set(), set()])
for row in input:
    for index, value in enumerate(row):
        name = row[0]
        data = extracted_data[name].add(row)
for row in data: print row
I get this error:
AttributeError: 'list' object has no attribute 'add'
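The error comes from the `defaultdict` factory: `extracted_data[name]` returns the list `[set(), set(), set()]` that the lambda builds, and lists have `append()`, not `add()`. The following short Python 3 snippet illustrates this and two related pitfalls in the loop above. First, you have to index into the list to reach one of the sets. Second, `set.add()` returns `None`, so assigning its result to `data` gives nothing useful. Third, iterating over a plain file object yields whole lines as strings, so `row[0]` is the first character of the line, not the name. The sample line is made up and assumed to be tab-separated.

```python
from collections import defaultdict

# Same factory as in the question: each new name gets a list of three sets.
extracted_data = defaultdict(lambda: [set(), set(), set()])

entry = extracted_data["Sam D. Man Ed.M."]
print(type(entry))  # <class 'list'>: lists have append(), not add()

# Index into the list first to reach one of the sets, then call add().
entry[0].add("000-000-1111")       # phone
entry[1].add("dman123@gmail.com")  # email

# set.add() mutates the set in place and returns None.
result = entry[2].add("www.daManWithThePlan.com")
print(result)  # None

# A line read from a file is a str, so line[0] is only its first character.
line = "Sam D. Man Ed.M.\t000-000-1111\tdman123@gmail.com\twww.daManWithThePlan.com\n"
print(line[0])                           # S
print(line.rstrip("\n").split("\t")[0])  # Sam D. Man Ed.M.
```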
Update:
from collections import defaultdict
import csv
import re
input = open('ieca_first_col_fake_text.txt', 'rU')
input_r = csv.reader(input, delimiter = '\t')
# default to empty set for phone, email, website, area, degrees
extracted_data = defaultdict(lambda: [set(), set(), set()])
data = []
# Index on the name and then for that name add the rest of the information.
for row in input_r:
    data_set = extracted_data[row[0]]
    for index, value in enumerate(row[1:]):
        data_set[index].add(value)
print data_set
Output:
[set(['']), set(['']), set([''])]
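That single line is most likely the sets for the last name in the file only, assuming `print data_set` runs once after the loop (the post's indentation was lost, but one printed line suggests so). The last sample row, "Amy Tramy Lamy Ph.D.", has no phone, email or website, and each blank column adds an empty string `''` to its set. The header row is also processed as if it were a person named "name". Below is a minimal Python 3 sketch that keeps the defaultdict-of-sets approach but skips the header, skips blank fields, and prints every merged record after the loop. It assumes the file is tab-separated (as the `csv.reader` call implies) and has at most three columns after the name; `io.StringIO` stands in for the real file.

```python
import csv
import io
from collections import defaultdict

# Stand-in for open('ieca_first_col_fake_text.txt', newline=''); the data is
# assumed to be tab-separated with at most three columns after the name.
sample = io.StringIO(
    "name\tphone\temail\twebsite\n"
    "Sam D. Man Ed.M.\t000-000-1111\tdman123@gmail.com\twww.daManWithThePlan.com\n"
    "Sam D. Man Ed.M.\t\t\t\n"
    "Sam D. Man Ed.M.\t111-222-333\tdman123@gmail.com\twww.daManWithThePlan.com\n"
    "Amy Tramy Lamy Ph.D.\t\t\t\n"
)

reader = csv.reader(sample, delimiter="\t")
header = next(reader)  # keep the header row out of the merged data

# One set per column after the name: phone, email, website.
extracted_data = defaultdict(lambda: [set(), set(), set()])
for row in reader:
    data_set = extracted_data[row[0]]
    for index, value in enumerate(row[1:]):
        if value:  # skip blank fields so '' never ends up in a set
            data_set[index].add(value)

# Print every merged record after the loop, not just the last data_set.
# Sets are unordered, so the values are sorted to get a stable result.
print("\t".join(header))
for name, columns in extracted_data.items():
    print("\t".join([name] + [", ".join(sorted(col)) for col in columns]))
```

Sorting is one way to get a deterministic order from a set; if the order in which numbers first appear matters, a list with a membership check (or a dict used as an ordered set) would preserve it instead.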