I have a text file laid out like this:
First col, Second col, Third col, Fourth col,...
with rows such as:
Johnny, Rodgers, ID1, 18th July,...
Johnny, Rodgers, ID1, 18th July,...
Pat, Bryant, ID2, 29th April,...
Pat, Bryant, ID2, 9th May,...
Jim, Williams, ID3, 10th March,...
Jim, Williams, ID3, 17th March,...
Jim, Williams, ID3, 21st March,...
etc
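I want to check whether column 3 contains repeated values and, where it does, whether column 4 is also the same in the rows that share that column 3 value. If both column 3 and column 4 match, both rows (the entire rows) should be deleted; if column 4 differs, the rows should be kept. Afterwards I want to print/store the result.
That is:
* If rows 1 and 2 have the same value in column 3 and also the same value in column 4, delete both rows.
* If rows 3 and 4 have the same value in column 3 but different values in column 4, print the rows and add 1 to the count.
* If rows 5, 6 and 7 have the same value in column 3 but different values in column 4, print the rows and add 1 to the count.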
After running this, the result would look like:
Pat, Bryant, ID2, 29th April,...
Pat, Bryant, ID2, 9th May,...
Jim, Williams, ID3, 10th March,...
Jim, Williams, ID3, 17th March,...
Jim, Williams, ID3, 21st March,...
counter = 2 #Number of different ID present
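My idea was to build two lists and store the rows in them, but I have not managed to key on column 3 while comparing the other columns at the same time. With my current logic I also need to loop and pop, and I am not doing that well.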
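val = []          # first row seen for each ID (column 3)
duplicated = []   # later rows whose ID has already been seen
seen_ids = set()  # IDs encountered so far

with open('file.txt', 'rt') as myf:
    for line in myf:
        if not line.strip():
            continue  # skip blank lines
        # Split on commas and trim the spaces around each field
        col = [field.strip() for field in line.split(',')]
        if col[2] not in seen_ids:
            seen_ids.add(col[2])
            val.append(col)         # store the parsed row (how best to copy and parse it?)
        else:
            duplicated.append(col)  # same question

# Comparisons: drop rows that repeat both column 3 and column 4.
# Loop over a copy, because removing items from a list while iterating over it skips elements.
for x in list(val):
    same = [d for d in duplicated if d[2:4] == x[2:4]]
    if same:
        val.remove(x)
        for d in same:
            duplicated.remove(d)

counter = len(val)  # Counter of total cases not erased
val.extend(duplicated)

# I would like to print the whole set of rows ordered by the 3rd col
for element in sorted(val, key=lambda row: row[2]):
    print(', '.join(element))
print("counter of cases:", counter)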
Any help and suggestions for improving my code would be very welcome.
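One possible way to express the rules described above is to count how often each (column 3, column 4) pair occurs and keep only the rows whose pair is unique. This is a minimal sketch, not a definitive solution. The function name `filter_rows` and the in-memory `sample` list are hypothetical, the elided columns after the fourth are left out, and it assumes that an ID appearing on a single row is kept and counted, which the description above does not specify.

```python
from collections import Counter


def filter_rows(lines):
    """Return (kept_rows, counter) for an iterable of comma-separated lines.

    Assumption: every row whose (column 3, column 4) pair occurs more than
    once is dropped; all other rows are kept.
    """
    rows = []
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        # Split on commas and trim the spaces around each field
        rows.append([field.strip() for field in line.split(',')])

    # Count how many times each (ID, date) pair appears
    pair_counts = Counter((row[2], row[3]) for row in rows)

    # Keep only rows whose (ID, date) pair is unique
    kept = [row for row in rows if pair_counts[(row[2], row[3])] == 1]

    # sorted()/list.sort() are stable, so rows with the same ID keep file order
    kept.sort(key=lambda row: row[2])

    # Number of distinct IDs that still have at least one row
    counter = len({row[2] for row in kept})
    return kept, counter


# Hypothetical sample mirroring the rows in the question (first four columns only)
sample = [
    "Johnny, Rodgers, ID1, 18th July",
    "Johnny, Rodgers, ID1, 18th July",
    "Pat, Bryant, ID2, 29th April",
    "Pat, Bryant, ID2, 9th May",
    "Jim, Williams, ID3, 10th March",
    "Jim, Williams, ID3, 17th March",
    "Jim, Williams, ID3, 21st March",
]

kept, counter = filter_rows(sample)
for row in kept:
    print(", ".join(row))
print("counter of cases:", counter)
```

To read from the file instead, an open file object can be passed directly, for example `with open('file.txt') as myf: kept, counter = filter_rows(myf)`. Note that sorting on column 3 compares strings, so `ID10` would sort before `ID2`; a numeric key would be needed if the IDs go past one digit.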