python - Copy a column without duplicates

Question

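I need to read a .txt file (using Python) that has 2 columns and many rows with repeated names. I want to copy just one of the columns, without repeating any name, and write it to a new .txt file. I tried: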

g = open(file,'r')
linesg = g.readlines()
h = open(file,'w+')
linesh = h.readlines()
for line in range(len(linesg)):
     if linesg[line] in linesh:
        line += 1
     else:
        h.write(linesg[line].split('\t')[1])

But I still get duplicate names in the .txt file. Can anyone help me? (Yes, I'm new to Python programming.) Thanks a lot!

score 0 · Accepted Answer

g = open(file, 'r')
names = {}
for line in g:
    # The name is in the second tab-separated column; drop the trailing
    # newline so it is not stored (and written back) as part of the name.
    name = line.rstrip('\n').split('\t')[1]
    names[name] = 1  # dictionary keys are unique, so duplicates collapse
g.close()

# names.keys() holds every distinct name.
# Change the output path here if needed; otherwise the original file is overwritten.
h = open(file, 'w+')
for name in names.keys():
    h.write("%s\n" % name)
h.close()

score 0 · Accepted Answer

sep = '\t'
lines = open('in_file.txt').readlines()
lines_out = []
for line in lines:
    line = line.strip()
    parts = line.split(sep)
    line_out = "%s\n" %(parts[0],) # if only the first column is copied
    if line_out not in lines_out:
        lines_out.append(line_out)

h = open('out_file.txt','w')
h.writelines(lines_out)
h.close()

Change it to parts[1] to copy the second column.
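
For what it's worth, both answers come down to the same idea: remember the names already seen and write each one only once. Below is a minimal sketch of that idea, not a definitive implementation. The function name copy_unique_column, its parameters, and the sample rows are made up for illustration. It assumes Python 3.7+ (where a dict keeps insertion order) and a tab-separated input, and it writes to a separate output file so the input is never overwritten.

```python
import os
import tempfile


def copy_unique_column(in_path, out_path, column=1, sep="\t"):
    """Write each distinct value of one column to out_path, in first-seen order.

    Hypothetical helper for illustration; column is a 0-based index.
    """
    seen = {}  # dict keys are unique and keep insertion order (Python 3.7+)
    with open(in_path, encoding="utf-8") as src:
        for line in src:
            parts = line.rstrip("\n").split(sep)
            if len(parts) > column:  # skip blank or short lines
                seen.setdefault(parts[column], None)
    with open(out_path, "w", encoding="utf-8") as dst:
        dst.writelines(name + "\n" for name in seen)
    return list(seen)


# Small demonstration with made-up data in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    in_path = os.path.join(tmp, "in_file.txt")
    out_path = os.path.join(tmp, "out_file.txt")
    with open(in_path, "w", encoding="utf-8") as f:
        f.write("1\tanna\n2\tbob\n3\tanna\n4\tcarl\n5\tbob\n")
    unique_names = copy_unique_column(in_path, out_path, column=1)
    with open(out_path, encoding="utf-8") as f:
        written = f.read()

print(unique_names)
```

Looking a name up in a dict takes roughly constant time, whereas "if x not in some_list" scans the whole list on every row, so this variant should hold up better on large files. Pass column=0 to copy the first column instead.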
