python - Subprocess is very slow when calling external egrep and less

Question

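I am trying to build a Python script that lets me build up the patterns for egrep -v dynamically and pipe the output to less (or more).
The reason I want to use external egrep+less is that the files I am working with are very large text files (500MB+). Reading them into a list first and processing everything natively in Python is very slow.

However, when I use os.system or subprocess.call, everything is very slow at the moment I want to quit the less output and return to the Python code.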

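My code should work like this:
1. ./myless.py messages_500MB.txt
2. The output of less -FRX for messages_500MB.txt is shown (the complete file).
3. When I press 'q' to quit less -FRX, the Python code should take over and prompt the user for text to exclude. The user enters it and I add it to a list.
4. My Python code builds egrep -v 'exclude1' and pipes the output to less.
5. The user repeats step 3 and enters another thing to exclude.
6. Now my Python code calls egrep -v 'exclude1|exclude2' messages_500MB.txt | less -FRX
7. And the process continues.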

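However, this does not work as expected.
* On my Mac, when the user presses q to quit less -FRX, it takes a few seconds for the raw_input prompt to appear.
* On a Linux machine, I get a flood of "egrep: write output: broken pipe" messages.
* If (Linux only) I press CTRL+C while in less -FRX, quitting less -FRX for some reason becomes much faster (as expected). On the Mac, my Python program breaks.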

Here is a sample of my code:

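import subprocess
import sys

file = sys.argv[1]  # path of the large text file to page through

excluded = list()
myInput = ''
while myInput != 'q':
    # Join all exclusions into a single alternation for egrep -v
    grepText = '|'.join(excluded)
    if grepText == '':
        # Nothing excluded yet: the empty pattern matches every line
        command = 'egrep "" ' + file + ' | less -FRX'
    else:
        command = 'egrep -v "' + grepText + '" ' + file + ' | less -FRX'

    # The shell runs the whole pipeline; this is the call that is slow to return
    subprocess.call(command, shell=True)
    myInput = raw_input('Enter text to exclude, q to exit, # to see what is excluded: ')
    excluded.append(myInput)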

Any help would be much appreciated.

score 2 · Accepted Answer

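Actually, I figured out what the problem is.

With egrep 'xyz' file | less, when I quit less, subprocess still keeps running egrep, and on large files (500MB+) this takes a while. Apparently, subprocess treats the two programs separately and keeps running the first one (egrep) even after the second one (less) has exited. To solve my problem properly, I used something like this: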

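import subprocess

command = 'egrep -v "something" <filename>'
cmd2 = ('less', '-FRX')
# Start egrep on its own, with its output going into a pipe
egrep = subprocess.Popen(command, shell=True, stdout=subprocess.PIPE)
# Run less on that pipe; this returns as soon as the user quits less
subprocess.check_call(cmd2, stdin=egrep.stdout)
# Stop egrep right away instead of letting it scan the rest of the file
egrep.terminate()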

By feeding the first process's output into the second process's stdin, I can now terminate egrep immediately when less exits, and my Python script now works :)
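The same idea can be folded into the question's exclude loop. The following is only a minimal sketch under a few assumptions that are not in the original post: it targets Python 3 (input instead of raw_input), it calls grep -E rather than egrep, it passes the command as an argument list instead of using shell=True, and the helper names build_grep_args, page_filtered and main are made up for illustration.

```python
import subprocess


def build_grep_args(excluded, path):
    """Return an argv list for grep that hides lines matching any excluded pattern.

    Hypothetical helper: 'grep -E' stands in for the egrep used in the question.
    """
    if not excluded:
        # An empty pattern matches every line, so the whole file is shown.
        return ['grep', '-E', '-e', '', '--', path]
    # The exclusions are joined into one alternation, as in the question.
    return ['grep', '-E', '-v', '-e', '|'.join(excluded), '--', path]


def page_filtered(path, excluded, pager=('less', '-FRX')):
    """Pipe grep into the pager and stop grep as soon as the pager exits."""
    grep = subprocess.Popen(build_grep_args(excluded, path),
                            stdout=subprocess.PIPE)
    try:
        # Returns when the user quits the pager.
        subprocess.call(list(pager), stdin=grep.stdout)
    finally:
        grep.stdout.close()        # drop our copy of the pipe's read end
        if grep.poll() is None:    # grep may still be scanning a large file
            grep.terminate()
        grep.wait()                # reap the child so no zombie is left behind


def main(path):
    """Interactive loop: page, ask for another exclusion, repeat until 'q'."""
    excluded = []
    while True:
        page_filtered(path, excluded)
        text = input('Enter text to exclude, q to exit: ')
        if text == 'q':
            break
        if text:
            excluded.append(text)
```

Calling main(sys.argv[1]) from a script would reproduce the ./myless.py messages_500MB.txt workflow. Because no shell is involved, the process that terminate() signals is grep itself, and patterns containing quotes or spaces need no extra escaping.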

Cheers,
Milos
