
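I am trying to run a grep command from my Python module using the subprocess library. Because the input is a doc file, I use the third-party catdoc tool to extract its content as plain text, and I want to store that content in a file. I don't know where I am going wrong, but the program never generates the plain-text file, so the grep step returns nothing. I have checked the error log, but it is empty. Any help is appreciated.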

def search_file(name, keyword):
    #Extract and save the text from doc file
    catdoc_cmd = ['catdoc', '-w' , name, '>', 'testing.txt']
    catdoc_process = subprocess.Popen(catdoc_cmd, stdout=subprocess.PIPE,stderr=subprocess.PIPE, shell=True)
    output = catdoc_process.communicate()[0]
    grep_cmd = []
    #Search the keyword through the text file
    grep_cmd.extend(['grep', '%s' %keyword , 'testing.txt'])
    print grep_cmd
    p = subprocess.Popen(grep_cmd,stdout=subprocess.PIPE,stderr=subprocess.PIPE, shell=True)
    stdoutdata = p.communicate()[0]
    print stdoutdata

2 Answers


On UNIX, when shell=True is combined with an argument list, only the first item of the list is treated as the command string to execute; all subsequent items are passed as arguments to the shell itself (/bin/sh -c), not to the command. Your call therefore runs catdoc with no arguments, and the > never reaches the shell as a redirect, so testing.txt is not written.

Therefore, you should actually use

catdoc_cmd = ['catdoc -w "%s" > testing.txt' % name]
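The difference between the list form and the single-string form can be seen without catdoc. The following is a minimal sketch, assuming Python 3 and a POSIX /bin/sh; the helper name shell_sees is hypothetical and only exists for this illustration.

```python
import subprocess

def shell_sees(args):
    """Run args with shell=True and return whatever the shell printed.

    Hypothetical helper for illustration only.
    """
    result = subprocess.run(
        args, shell=True, stdout=subprocess.PIPE, universal_newlines=True
    )
    return result.stdout

# List form: POSIX Python runs /bin/sh -c 'echo' hello.
# Only 'echo' is the command; 'hello' becomes the shell's $0 and is not echoed.
list_form = shell_sees(['echo', 'hello'])

# String form: the shell parses the whole line, so arguments (and '>') take effect.
string_form = shell_sees('echo hello')

print(repr(list_form))
print(repr(string_form))
```

The same thing happens to '-w', the file name, '>' and 'testing.txt' in the question: they are handed to the shell as positional parameters instead of being part of the catdoc command line.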

A better solution, though, would probably be to just read the text out of the subprocess' stdout, and process it using re or Python string operations:

catdoc_cmd = ['catdoc', '-w' , name]
catdoc_process = subprocess.Popen(catdoc_cmd, stdout=subprocess.PIPE,stderr=subprocess.PIPE)
for line in catdoc_process.stdout:
    if keyword in line:
        print line.strip()
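For reference, the same idea can be sketched for Python 3, where print is a function and the pipe yields bytes unless text mode is requested. The function name matching_lines is an assumption made for this example, not something from the question.

```python
import subprocess

def matching_lines(cmd, keyword):
    """Run cmd (an argument list, no shell) and return the output lines
    that contain keyword. Hypothetical helper for illustration."""
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode the output to str instead of bytes
    )
    # Plain substring match, like the loop above; use re for real patterns.
    return [line.strip() for line in result.stdout.splitlines() if keyword in line]
```

With catdoc installed, it would be called as matching_lines(['catdoc', '-w', name], keyword). Because no shell is involved, file names containing spaces or quotes need no escaping.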
answered 2012-10-07T01:06:57.320

I think you're trying to pass the > to the shell, but that's not going to work the way you've done it. If you want to spawn a process and capture its output in a file, you should arrange for its standard output to be redirected yourself. Fortunately, that's easy: open the destination file for writing and pass the file object to Popen as the stdout keyword argument. Passing PIPE instead attaches the output to a pipe, which you then read with communicate().
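A minimal sketch of that approach, assuming Python 3, might look like the following. The helper names run_to_file and grep_file are hypothetical, and the keyword search is done in Python rather than by spawning grep.

```python
import subprocess

def run_to_file(cmd, out_path):
    """Run cmd without a shell, sending its standard output to out_path.

    Returns the command's exit status. Hypothetical helper for illustration.
    """
    with open(out_path, 'w') as out_file:
        # The child process writes straight into the file; no '>' is needed.
        return subprocess.call(cmd, stdout=out_file)

def grep_file(path, keyword):
    """Return the lines of the file at path that contain keyword."""
    with open(path) as text_file:
        return [line.rstrip('\n') for line in text_file if keyword in line]
```

For the question's case this would be run_to_file(['catdoc', '-w', name], 'testing.txt') followed by grep_file('testing.txt', keyword). Checking the returned exit status is a cheap way to notice when catdoc itself failed.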

answered 2012-10-07T01:07:39.447