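I need to read a .txt file (using Python) that has 2 columns and many rows with repeated names. I want to copy just one of the columns, without repeating any name, and write it to a new .txt file. I tried: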

g = open(file,'r')
linesg = g.readlines()
h = open(file,'w+')
linesh = h.readlines()
for line in range(len(linesg)):
     if linesg[line] in linesh:
        line += 1
     else:
        h.write(linesg[line].split('\t')[1])

But I still get repeated names in the .txt file. Can anyone help me? (Yes, I'm new to Python programming.) Thanks a lot!

2 Answers

names = {}
with open(file, 'r') as g:
    for line in g:
        # The name is in the second tab-separated column; strip the trailing
        # newline so it is not written out twice below.
        name = line.rstrip('\n').split('\t')[1]
        names[name] = 1  # dictionary keys are unique, so repeated names collapse

# names.keys() returns all the distinct names here.
# Change the output path here if needed, or the original file will be overwritten.
with open(file, 'w+') as h:
    for name in names.keys():
        h.write("%s\n" % name)
answered 2013-05-13T21:17:58.037
sep = '\t'
with open('in_file.txt') as f:
    lines = f.readlines()
lines_out = []
for line in lines:
    line = line.strip()
    parts = line.split(sep)
    line_out = "%s\n" % (parts[0],)  # if only the first column is copied
    if line_out not in lines_out:  # keep only the first occurrence
        lines_out.append(line_out)

with open('out_file.txt', 'w') as h:
    h.writelines(lines_out)

Change it to parts[1] to copy the second column.
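
For large files, the `not in lines_out` check on a list gets slower as the list grows. A minimal sketch of the same idea that tracks already-seen values in a set is shown here; the function name `unique_column` and its parameters are hypothetical, chosen only for illustration, and it assumes a tab-separated file where lines lacking the requested column can simply be skipped.

```python
def unique_column(in_path, out_path, col=1, sep='\t'):
    """Copy one column of a delimited file to out_path, skipping repeated values."""
    seen = set()  # values already written; set lookups stay fast as it grows
    with open(in_path) as src, open(out_path, 'w') as dst:
        for line in src:
            parts = line.rstrip('\n').split(sep)
            if col >= len(parts):
                continue  # assumption: skip lines that lack the requested column
            value = parts[col]
            if value not in seen:
                seen.add(value)
                dst.write(value + '\n')  # first occurrences, in file order
```

Calling `unique_column('in_file.txt', 'out_file.txt')` would copy the second column; pass `col=0` for the first.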

answered 2013-05-13T21:42:49.110