python - How do I use strings from one text file to search another, and create new text files containing a column from the other?

Question

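I need to use the strings in one text file to search a second text file. Each time a string matches a line in the second file, I want to check that line for a given word and, if it is present, write a specific column of that line to a third text file. This should be repeated for every string in the first text file.

Example

Text file 1: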

10.2.1.1
10.2.1.2
10.2.1.3

Text file 2:

IP=10.2.1.4 word=apple thing=car name=joe
IP=10.2.1.3 word=apple thing=car name=joe
IP=10.2.1.1 word=apple thing=car name=joe
IP=10.2.1.2 word=apple thing=car name=joe
IP=10.2.1.1 word=apple thing=car name=joe
IP=10.2.1.3 word=apple thing=car name=joe

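The result should be three separate text files (each named after a string from text file 1), each containing the third column of the matching lines:

Result: 10.2.1.3.txt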

thing=car
thing=car

and so on.

So far my code looks like this:

with open(file_1) as list_file:
    for string in (line.strip() for line in list_file):
        if string in file_2:
            if "word" in file_2:            
                column2 = line.split()[2]
                x = open(line+".txt", "a")
                with x as new_file:
                    new_file.write(column2)

My question: is this code the best way to do it? I feel like I'm missing an important shortcut.

Final code, from Olafur Osvaldsson:

for line_1 in (l.rstrip() for l in open(file_1)):  # strip the newline so it is not part of the file name
    with open(line_1+'.txt', 'a') as my_file:
        for line_2 in open(file_2):
            line_2_split = line_2.split(' ')
            if "word" in line_2:
                if "word 2" in line_2:
                    my_file.write(line_2_split[2] + '\n')

score 1 · Accepted Answer

# define files
file1 = "file1.txt"
file2 = "file2.txt"

ip_patterns = set() # assumes that all patterns fit in memory

# filling ip_patterns
with open(file1) as fp:
    for line in fp: 
        ip_patterns.add(line.strip()) # adding pattern to the set


word_to_match = "apple" # pattern for the "word" field
wanted_fields = ['name', 'thing'] # fields to write

with open(file2) as fp:
    for line in fp:
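        # split only at the first '=' so that values may themselves contain '='
        values = dict(field.split('=', 1) for field in line.split())
        # .get() avoids a KeyError on blank lines or lines without these fields
        if values.get('IP') in ip_patterns and values.get('word') == word_to_match:
            with open(values['IP'] + '.txt', 'a') as out:
                for k in wanted_fields:
                    out.write("%s=%s\n" % (k, values[k])) # writing to file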
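
To make the parsing step in the code above concrete, here is a minimal sketch of how one line of file 2 becomes a dictionary. The helper name parse_fields is hypothetical and not part of the answer; the sketch assumes that fields are separated by whitespace and that every field contains an "=".

```python
def parse_fields(line):
    """Split a 'key=value key=value' line into a dict (hypothetical helper)."""
    # split() with no argument splits on any whitespace and drops the newline;
    # split('=', 1) splits only at the first '=', so values may contain '='.
    return dict(field.split('=', 1) for field in line.split())


sample = "IP=10.2.1.3 word=apple thing=car name=joe\n"
values = parse_fields(sample)
print(values['IP'], values['thing'])
```

Looking fields up by name rather than by position means the code keeps working if the column order in file 2 changes.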

score 1 · Accepted Answer

I believe the following code does what you want:

file_1='file1.txt'
file_2='file2.txt'

my_string = 'word'

for line_1 in [l.rstrip() for l in open(file_1)]:
    with open(line_1+'.txt', 'a') as my_file:
        for line_2 in open(file_2):
            line_2_split = line_2.split(' ')
            if line_1 == line_2_split[0][3:]:
                if my_string in line_2:
                    my_file.write(line_2_split[2] + '\n')

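If you intend to use the last field on a line of file_2, make sure to strip the trailing newline with rstrip(), as is done for the first file. I left the newline in place for file_2.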
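
The newline issue can be seen in a short sketch that uses one sample line from the question. The variable names here are illustrative only.

```python
line_2 = "IP=10.2.1.3 word=apple thing=car name=joe\n"

# Splitting on a single space leaves the newline attached to the last field.
raw_fields = line_2.split(' ')
last_raw = raw_fields[-1]

# Stripping the line first gives a clean last field.
clean_fields = line_2.rstrip().split(' ')
last_clean = clean_fields[-1]

print(repr(last_raw), repr(last_clean))
```

The third column, which the code above writes out, is unaffected because it is not the last field on the line.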

score 1 · Accepted Answer

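Here is an example with the input files in file1.txt and file2.txt. I cache the contents of file 1, together with the associated output file handles, in the dictionary "files", and close the handles after the main loop finishes.

In the main loop, I read each line of file2.txt, strip it, and tokenize it on spaces with the split method. I then take the IP address from the first token and check whether it is in "files". If it is, I write the third column to the corresponding output file.

The last loop closes the output file handles.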

with open('file1.txt') as file1:
    files = {ip:open(ip + '.txt', 'w') for ip in [line.strip() for line in file1]}

with open('file2.txt') as file2:
    for line in file2:
        tokens = line.strip().split(' ')
        ip = tokens[0][3:]
        if ip in files:
            # '\n' is enough: text mode translates it to the platform's line ending
            files[ip].write(tokens[2] + '\n')

for f in files.values():
    f.close()
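
A possible variant of the same idea is sketched below. It is not the answerer's code: it swaps the manual closing loop for contextlib.ExitStack, so the output handles are closed even if an exception occurs, and it adds the check on the "word" field that the question asks for. The function name split_by_ip and its parameters (including out_dir and the default word "apple") are assumptions made for illustration.

```python
import os
from contextlib import ExitStack


def split_by_ip(list_path, data_path, out_dir, word="apple"):
    """Write the third column of matching lines to <out_dir>/<ip>.txt."""
    with ExitStack() as stack:
        with open(list_path) as list_file:
            ips = [line.strip() for line in list_file if line.strip()]
        # One output handle per IP; ExitStack closes them all on exit.
        files = {
            ip: stack.enter_context(open(os.path.join(out_dir, ip + ".txt"), "w"))
            for ip in ips
        }
        with open(data_path) as data_file:
            for line in data_file:
                tokens = line.split()
                if len(tokens) < 3:
                    continue  # skip blank or short lines
                ip = tokens[0][3:]  # drop the leading "IP="
                if ip in files and tokens[1] == "word=" + word:
                    files[ip].write(tokens[2] + "\n")
```

It could be called as split_by_ip('file1.txt', 'file2.txt', '.'). Because the output files are opened in "w" mode up front, an IP from file 1 with no matching line still gets an empty output file, and file 2 is read only once regardless of how many IPs file 1 contains.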
