python - Removing repeated words from text generated by a Python script

Question

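I wrote a Python script that takes the text from an input file and randomly rearranges the words, following the cut-up technique (http://en.wikipedia.org/wiki/Cut-up_technique), for a creative writing project.

Here is the script as it currently stands. Note: I run it as a server-side include.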

#!/usr/bin/python
from random import shuffle 

src = open("input.txt", "r")
srcText = src.read()
src.close()

srcList = srcText.split()
shuffle(srcList)
cutUpText = " ".join(srcList)
print("Content-type: text/html\n\n" + cutUpText)

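This mostly does what I want, but one improvement I would like to make is to identify repeated words in the output and remove them. To clarify, I only want to catch duplicates that occur in sequence, for example "the the" or "I I I". I do not want to go as far as making "the", say, appear only once in the entire output.

Can someone point me in the right direction to start solving this? (My background is not in programming at all, so I basically put this script together by reading a lot of the Python manual and browsing this site. Please be gentle with me.)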

score 5 · Accepted Answer

You can write a generator that yields the words with consecutive duplicates skipped:

def nodups(s):
    last = None
    for w in s:
        if w == last:
            continue
        yield w
        last = w

Then you can use it in your program:

cutUpText = " ".join(nodups(srcList))
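For a quick check, here is a self-contained sketch that runs the generator on a made-up word list and compares it with `itertools.groupby` from the standard library, which groups runs of consecutive equal items. The sample list and the `nodups_groupby` name are illustrative only and not part of the original script.

```python
from itertools import groupby


def nodups(s):
    """Yield the items of s, skipping any item equal to the one before it."""
    last = None
    for w in s:
        if w == last:
            continue
        yield w
        last = w


def nodups_groupby(s):
    """Same idea via groupby, which groups runs of consecutive equal items."""
    return [key for key, _run in groupby(s)]


# Hypothetical sample data standing in for the shuffled srcList.
words = ["the", "the", "cat", "I", "I", "I", "saw", "the"]
print(" ".join(nodups(words)))          # the cat I saw the
print(" ".join(nodups_groupby(words)))  # the cat I saw the
```

Both versions keep the final "the", because it is not adjacent to the earlier ones.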

score 1 · Accepted Answer

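Adding the lines

spaces = ['\n' if i % 10 == 9 else ' ' for i in range(len(srcList))]
cutUpText = "".join(word + sep for word, sep in zip(srcList, spaces))

helps give the text some rudimentary formatting on screen: every tenth word is followed by a newline instead of a space.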
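The same line-break-every-ten-words idea can also be sketched by slicing the list into chunks and joining each chunk separately. The function name `wrap_words` and the `per_line` parameter are made up for illustration; unlike the lines above, this version leaves no trailing space or newline after the last word.

```python
def wrap_words(words, per_line=10):
    """Join words with spaces, starting a new line after every per_line words."""
    lines = [
        " ".join(words[i:i + per_line])
        for i in range(0, len(words), per_line)
    ]
    return "\n".join(lines)


# Hypothetical sample: 12 numbered words, printed three per line.
sample = ["w%d" % n for n in range(12)]
print(wrap_words(sample, per_line=3))
```

In the script this would be used as `cutUpText = wrap_words(srcList)`.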

score 0 · Accepted Answer

Add this to your existing program:

srcList = list(set(srcText.split()))
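Be aware that `set` removes every repeated word from the whole text, not only adjacent repeats, and returns the words in arbitrary order. A small sketch with made-up sample data shows how that differs from collapsing only consecutive repeats, which is what the question asks for:

```python
# Hypothetical sample data; "the" appears three times, twice in a row.
words = ["the", "the", "cat", "saw", "the", "dog"]

# set() keeps one copy of each distinct word, in arbitrary order.
unique_words = set(words)
print(sorted(unique_words))  # ['cat', 'dog', 'saw', 'the']

# Keep a word only if it differs from the word directly before it.
collapsed = [w for i, w in enumerate(words) if i == 0 or w != words[i - 1]]
print(collapsed)  # ['the', 'cat', 'saw', 'the', 'dog']
```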

answered 2012-12-20T10:20:00.123
