python - Python: How do I capture output to a text file? (Only 25 of 530 lines are captured now)

Question

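I've done a lot of lurking on SO, and a lot of searching and reading, but I must also admit to being a relative noob at programming in general. I'm trying to learn as I go, so I've been playing with Python's NLTK. In the script below I can get everything to work, except that it only writes what would be the first screen of a multi-screen output, or at least that is how I am thinking about it.

Here's the script: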

#! /usr/bin/env python

import nltk

# First we have to open and read the file:

thefile = open('all_no_id.txt')
raw = thefile.read()

# Second we have to process it with nltk functions to do what we want

tokens = nltk.wordpunct_tokenize(raw)
text = nltk.Text(tokens)

# Now we can actually do stuff with it:

concord = text.concordance("cultural")

# Now to save this to a file

fileconcord = open('ccord-cultural.txt', 'w')
fileconcord.writelines(concord)
fileconcord.close()

Here is the beginning of the output file:

Building index...
Displaying 25 of 530 matches:
y .   The Baobab Tree : Stories of Cultural Continuity The continuity evident 
 regardless of ethnicity , and the cultural legacy of Africa as well . This Af

What am I missing here to get all 530 matches written to the file?

score 5 · Accepted Answer

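According to the manual, text.concordance(self, word, width=79, lines=25) takes other parameters as well.

I see no way to extract the size of the index. However, the concordance printing code has this part: lines = min(lines, len(offsets)), so you can simply pass sys.maxint as the last argument: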

concord = text.concordance("cultural", 75, sys.maxint)
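
A note for Python 3 readers: sys.maxint exists only in Python 2, and sys.maxsize is the usual substitute. The sketch below does not call NLTK; it only mirrors the quoted line lines = min(lines, len(offsets)) to show why an oversized limit gets clamped to the real number of matches. The helper name lines_to_show and the list of 530 offsets are made up for illustration.

```python
import sys


def lines_to_show(requested, offsets):
    # Same clamping as the quoted line: never show more lines than matches.
    return min(requested, len(offsets))


# Hypothetical stand-in for the 530 match positions from the question.
offsets = list(range(530))

default_shown = lines_to_show(25, offsets)        # the default limit of 25
all_shown = lines_to_show(sys.maxsize, offsets)   # an "unlimited" request

print(default_shown, all_shown)
```

Any number at least as large as the match count behaves the same way, so the exact value of the oversized limit does not matter.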

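Added:

Looking at your original code now, I can't see how it could have worked before. text.concordance does not return anything; it prints everything to stdout using print. So the simple option is to redirect stdout to your file, like this: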

import sys

....

# Open the file
fileconcord = open('ccord-cultural.txt', 'w')
# Save old stdout stream
tmpout = sys.stdout
# Redirect all "print" calls to that file
sys.stdout = fileconcord
# Init the method
text.concordance("cultural", 200, sys.maxint)
# Close file
fileconcord.close()
# Reset stdout in case you need something else to print
sys.stdout = tmpout
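
On Python 3.4 and later, the same redirection can be written with contextlib.redirect_stdout, which restores sys.stdout automatically even if the call raises. This is a minimal sketch, not the answer's original code: print_report is a hypothetical stand-in for any function that only prints (as text.concordance does), and the output path is a temporary file chosen for the demo.

```python
import contextlib
import os
import tempfile


def print_report(n):
    """Hypothetical stand-in for a function that prints instead of returning."""
    print("Displaying %d of %d matches:" % (n, n))
    for i in range(n):
        print("match %d" % i)


# Demo output location; in the question this would be 'ccord-cultural.txt'.
out_path = os.path.join(tempfile.mkdtemp(), "ccord-demo.txt")

# Everything printed inside the block goes to the file, not the console.
with open(out_path, "w") as fh, contextlib.redirect_stdout(fh):
    print_report(530)

# stdout is back to normal here; read the file to check what was captured.
with open(out_path) as fh:
    captured = fh.read().splitlines()

print(len(captured))
```

Compared with saving and restoring sys.stdout by hand, the context manager leaves no window in which sys.stdout points at a closed file.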

Another option is to use the respective classes directly and omit the Text wrapper. Just copy bits from here and combine them with bits from here.
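
To give a rough idea of what "omitting the wrapper" involves, here is a plain-Python sketch that builds a word-to-positions index and returns concordance lines as a list instead of printing them. It is not NLTK's implementation; build_index, concordance_lines, and the sample sentence are all hypothetical names and data made up for illustration.

```python
from collections import defaultdict


def build_index(tokens):
    """Map each lowercased token to the list of positions where it occurs."""
    index = defaultdict(list)
    for pos, tok in enumerate(tokens):
        index[tok.lower()].append(pos)
    return index


def concordance_lines(tokens, index, word, context=3):
    """Return one string per match, with `context` tokens on each side."""
    lines = []
    for pos in index.get(word.lower(), []):
        left = " ".join(tokens[max(0, pos - context):pos])
        right = " ".join(tokens[pos + 1:pos + 1 + context])
        lines.append("%s [%s] %s" % (left, tokens[pos], right))
    return lines


tokens = "the cultural legacy of Africa and the Cultural continuity".split()
index = build_index(tokens)
result = concordance_lines(tokens, index, "cultural")

for line in result:
    print(line)
```

Because the function returns a list of strings, every match can be written to a file directly, with no default cap of 25 lines and no need to redirect stdout.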

score 2 · Accepted Answer

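Update:

I found this thread, "write text.concordance output to a file", from the NLTK user group. It dates from 2010 and states:

The documentation for the Text class says: "is intended to support initial exploration of texts (via the interactive console). ... If you wish to write a program which makes use of these analyses, then you should bypass the Text class, and use the appropriate analysis function or class directly instead."

If nothing has changed in the package since then, this may be the source of your problem.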

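- - Previously - -

I don't see a problem with using writelines() to write to the file:

file.writelines(sequence)

Write a sequence of strings to the file. The sequence can be any iterable object producing strings, typically a list of strings. There is no return value. (The name is intended to match readlines(); writelines() does not add line separators.)

Note the remark that writelines() does not add line separators. Did you check the output file in a different editor? Maybe the data is there but is not rendered correctly because the line-ending separators are missing?

Are you sure this part is generating the data you want to output?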
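
A quick way to see the behavior described in the documentation is to write two strings to an in-memory buffer with and without explicit newlines. This sketch uses io.StringIO in place of a real file; the sample strings are made up.

```python
import io

rows = ["first", "second"]

# writelines() writes the strings back to back, with nothing in between.
buf = io.StringIO()
buf.writelines(rows)
joined = buf.getvalue()

# Adding the separator yourself gives one item per line.
buf2 = io.StringIO()
buf2.writelines(row + "\n" for row in rows)
separated = buf2.getvalue()

print(repr(joined))
print(repr(separated))
```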

 concord = text.concordance("cultural")

I'm not familiar with nltk, so I'm only asking as part of eliminating possible sources of the problem.
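
One way to check this without NLTK is to look at what a print-only function hands back. In the sketch below, show is a hypothetical stand-in for such a function: its result is None, and passing None to writelines() raises a TypeError in Python 3 rather than writing anything.

```python
import io


def show(word):
    """Hypothetical stand-in: prints its output and returns nothing."""
    print("match for", word)


result = show("cultural")
returned_nothing = result is None

# writelines() needs an iterable of strings, so None is rejected.
try:
    io.StringIO().writelines(result)
    error = None
except TypeError as exc:
    error = exc

print(returned_nothing, type(error).__name__)
```

So if the value assigned to concord is None, whatever ended up in the output file did not come from the writelines() call.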
