python - Python: using values from file A to search for lines in another file

Question

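Newbie question.

I have two files. File A: a file containing a list of items (apples, pears, oranges). File B: a file containing all the fruit in the world (1,000,000 lines).

In Unix, I would grep apple from file B and return all the results.

In Unix, I would: 1. grep apple from file B >> fruitfound.txt 2. grep pears from file B >> fruitfound.txt 3. grep oranges from file B >> fruitfound.txt

I want a Python script that uses the values from file A to search file B and then writes out the results. Note: file B will contain green apple, red apple, and yellow apple, and I want all three results written to fruitfound.txt.

Kindest regards,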

康提

score 1 · Accepted Answer


grep -f $patterns $filename does exactly that. There is no need for a Python script.

Answered 2012-12-25T19:44:10.060

score 0 · Accepted Answer

To find lines that contain any of the given keywords in Python, you could use a regular expression:

import re

def fgrep(words, lines):
    # note: allow a partial match e.g., 'b c' matches 'ab cd'
    # Python 3: the built-in filter() is lazy, like Python 2's itertools.ifilter
    pattern = re.compile("|".join(map(re.escape, words)))
    return filter(pattern.search, lines)
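As a rough illustration of how the function behaves on the fruit data from the question, here is a minimal sketch. It restates `fgrep` for Python 3 (using the built-in `filter`) so that the snippet runs on its own; the sample lines are made up for the example and are not from the asker's files.

```python
import re

def fgrep(words, lines):
    # Escape each keyword so it is matched literally, then join with "|"
    # so that a line matches if it contains any one of the keywords.
    pattern = re.compile("|".join(map(re.escape, words)))
    return filter(pattern.search, lines)

# Hypothetical sample data standing in for file A (keywords) and file B (lines).
keywords = ["apple", "pear"]
lines = ["green apple\n", "banana\n", "red apple\n", "pear\n"]

matches = list(fgrep(keywords, lines))
for line in matches:
    print(line, end="")
```

Because the match is a plain substring search, "apple" also selects "green apple" and "red apple", which is what the question asks for.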

To turn it into a command-line script:

import sys

def main():
    with open(sys.argv[1]) as kwfile: # read keywords from given file
        # one keyword per line
        keywords = [line.strip() for line in kwfile if line.strip()]

    if not keywords:
        sys.exit("no keywords are given")

    if len(sys.argv) > 2: # read lines to match from given file
        with open(sys.argv[2]) as file:
            sys.stdout.writelines(fgrep(keywords, file))
    else: # read lines from stdin
        sys.stdout.writelines(fgrep(keywords, sys.stdin))

main()

Example:

$ python fgrep.py a b > fruitfound.txt
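If you would rather have Python write the output file itself instead of relying on shell redirection, something along these lines might work. This is only a sketch: the function name `grep_to_file` and its parameters are made up for illustration, and it opens the output in append mode to mirror the `>>` used in the question.

```python
import re

def grep_to_file(keyword_path, data_path, out_path):
    """Append every line of data_path that contains a keyword to out_path.

    Returns the number of lines written. (Hypothetical helper, not part
    of the original answer.)
    """
    with open(keyword_path) as kwfile:
        # One keyword per line; blank lines are ignored.
        keywords = [line.strip() for line in kwfile if line.strip()]
    if not keywords:
        raise ValueError("no keywords are given")

    search = re.compile("|".join(map(re.escape, keywords))).search
    count = 0
    # Stream the large file line by line so it never has to fit in memory.
    with open(data_path) as data, open(out_path, "a") as out:
        for line in data:
            if search(line):
                out.write(line)
                count += 1
    return count
```

It could then be called as, for example, `grep_to_file("a", "b", "fruitfound.txt")`.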

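There are more efficient algorithms, e.g. the Aho-Corasick algorithm, but filtering millions of lines takes less than a second on my machine, which may be good enough (grep is several times faster). Surprisingly, acora, which is based on the Aho-Corasick algorithm, was slower for the data I tried.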
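For readers curious about what the Aho-Corasick approach looks like, below is a small pure-Python sketch of the idea. It is not acora's API and is not tuned for speed; the names `build_automaton` and `contains_any` are made up for this example. It builds a trie of the keywords, adds failure links with a breadth-first pass, and then scans each line once.

```python
from collections import deque

def build_automaton(words):
    # goto[n] maps a character to the next node; fail[n] is the failure link;
    # out[n] is True if some keyword ends at node n (directly or via fail links).
    goto, fail, out = [{}], [0], [False]
    for word in words:
        node = 0
        for ch in word:
            nxt = goto[node].get(ch)
            if nxt is None:
                nxt = len(goto)
                goto[node][ch] = nxt
                goto.append({})
                fail.append(0)
                out.append(False)
            node = nxt
        out[node] = True

    # Breadth-first pass: depth-1 nodes keep fail = 0 (the root).
    queue = deque(goto[0].values())
    while queue:
        node = queue.popleft()
        for ch, child in goto[node].items():
            f = fail[node]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(ch, 0)
            # Inherit matches from the longest proper suffix state.
            out[child] = out[child] or out[fail[child]]
            queue.append(child)
    return goto, fail, out

def contains_any(automaton, line):
    # Scan the line once, following failure links on mismatches.
    goto, fail, out = automaton
    node = 0
    for ch in line:
        while node and ch not in goto[node]:
            node = fail[node]
        node = goto[node].get(ch, 0)
        if out[node]:
            return True
    return False

automaton = build_automaton(["apple", "pear"])
lines = ["green apple\n", "banana\n", "appear\n"]
found = [line for line in lines if contains_any(automaton, line)]
```

The scan visits each character of a line once regardless of how many keywords there are, which is where the theoretical advantage comes from; as the answer notes, in practice a compiled regular expression may still be faster for modest keyword lists.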
