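I'm writing a Python script that has to use grep, and I'm having trouble getting the grep step to work. I start with "Infile", then cut and sort that file to create "Infile.ids", which contains the unique IDs from "Infile". Next, I need to take the IDs in "Infile.ids" one line at a time, run each against "Infile", and extract all lines containing that ID into a new, separate file. The problem is that when I run it through grep, it seems to process all the lines at once and gives me a bunch of files identical to the original "Infile" instead of separate, unique files.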

Here are an example "Infile" and the output files I'm trying to produce:

Infile              Infile.ids    Infile.Hello     Infile.World      Infile.Adios
Hello 1 3 5 7       Hello         Hello 1 3 5 7    World 2 4 6 8     Adios 1 2 3 4
World 2 4 6 8       World         Hello a b c d    World e f g h     Adios i j k l
Adios 1 2 3 4       Adios
Hello a b c d
World e f g h
Adios i j k l

Here is my code so far:

#!/usr/bin/python

import sys
import os

Infile = sys.argv[1]

os.system("cut -d \" \" -f1 %s | sort -u > %s.ids" % (Infile, Infile))
Infile2 = "%s.ids" % Infile

handle = open("%s.ids" % Infile, "r")
line = handle.readline()

for line in handle:
    os.system("grep \"%s\" %s > %s.%s" % (line, Infile, Infile, line))
    line = handle.readline()

handle.close()

1 Answer

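When you iterate over handle, every line still ends with a newline character. The lines in Infile do not have a newline right after the ID (the "1 3 5 7" part comes first), so the pattern you pass to grep is not the one you intended. grep treats a newline inside a pattern as a separator between patterns, so the trailing newline most likely adds an empty pattern that matches every line, which would explain why each output file looks like the original Infile.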
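To see the trailing newline concretely, here is a minimal sketch. It uses io.StringIO as a stand-in for Infile.ids (an assumption made only so the example runs without any files on disk); the IDs are the ones from the question.

```python
import io

# Stand-in for the contents of Infile.ids (illustrative data from the question).
ids_file = io.StringIO("Adios\nHello\nWorld\n")

raw = [line for line in ids_file]        # iteration keeps the line terminator
clean = [line.strip() for line in raw]   # strip() removes surrounding whitespace

print(raw)    # ['Adios\n', 'Hello\n', 'World\n']
print(clean)  # ['Adios', 'Hello', 'World']
```

Only the stripped values are safe to interpolate into a grep command or a file name.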

Try doing

for line in handle.readlines():
    line = line.strip()
    os.system("grep \"%s\" %s > %s.%s" % (line, Infile, Infile, line))

Also remove both of the line = handle.readline() statements. A for loop already iterates over the lines itself: the call before the loop discards the first ID, and the one inside the loop interferes with the iteration. If you prefer explicit readline() calls, a while loop would be more appropriate, though I would not recommend it here.
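If you want to avoid shell quoting problems altogether, one possible variation is sketched below. It is not what the question's script does: it swaps cut and sort for plain Python and calls grep through subprocess.call with an argument list, so the shell never sees the ID. The function name split_by_id is hypothetical, and the sketch assumes that IDs are separated from the rest of the line by a space and contain no regex metacharacters.

```python
import subprocess


def split_by_id(infile):
    """Write one file per unique first-column ID, named <infile>.<id>.

    Hypothetical helper; assumes space-separated columns and IDs
    without regex metacharacters.
    """
    # Collect the unique IDs from the first column (replaces cut | sort -u).
    with open(infile) as handle:
        ids = sorted({line.split()[0] for line in handle if line.strip()})

    for id_ in ids:
        with open("%s.%s" % (infile, id_), "w") as out:
            # "--" ends option parsing; "^" anchors the ID at the start of
            # the line, and the trailing space keeps "Hello" from also
            # matching a longer ID such as "Hello2".
            subprocess.call(["grep", "--", "^%s " % id_, infile], stdout=out)
    return ids
```

Calling split_by_id(sys.argv[1]) from the original script would then produce Infile.Hello, Infile.World and Infile.Adios for the sample input. Note that a line consisting of the ID alone, with nothing after it, would not match the anchored pattern.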

Cheers

Answered 2013-04-08T10:49:06.730