
I have text in a file, like this:

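One reason the Fed is likely to wait until early 2014 to begin easing back on stimulus efforts is that policy makers there simply will not know if the labor market is gaining or losing strength before then. Not until December will the monthly jobs survey be free of the shutdown static, and that report does not come out until early January.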

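The September jobs report was disappointing, with the economy adding 148,000 new jobs instead of the expected 185,000, but stocks rose on anticipation that Fed stimulus efforts would continue well into 2014.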

In another file, I have a list of replacements:

January:February September:November monthly:weekly
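A minimal sketch for turning such a list into a dictionary, assuming each pair is written as `word:replacement` and pairs are separated by whitespace (spaces or newlines). The function name `parse_replacements` is hypothetical:

```python
def parse_replacements(raw):
    """Parse 'old:new' pairs separated by any whitespace into a dict."""
    rep = {}
    # split() with no argument splits on runs of spaces AND newlines,
    # so this works for pairs on one line or one pair per line.
    for pair in raw.split():
        if ':' not in pair:
            continue                    # ignore anything that is not a pair
        old, new = pair.split(':', 1)   # split only on the first colon
        rep[old] = new
    return rep

rep = parse_replacements("January:February September:November monthly:weekly")
# {'January': 'February', 'September': 'November', 'monthly': 'weekly'}
```

This simple format cannot express phrases that contain spaces; those would need a different separator.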

How can I replace every word in the text according to the replacement list?

I tried this:
with open('t_.txt') as f3:
    with open ('egb.out') as w3:

        for line in f3:
            for line1 in w3:

                word,string = line1.split(':')
                print line.replace(word,string),

but it only works for the first line.
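The likely cause: `w3` is a file object, and iterating a file object consumes it, so after the first line of `f3` the inner `for line1 in w3` loop has nothing left to read. In addition, each `line1` still ends in `'\n'`, and the `print` runs once per replacement instead of once per accumulated result. A sketch of a fixed loop in Python 3, assuming one `word:replacement` pair per line in the replacement file (as the `split(':')` in the question implies); `replace_lines` is a hypothetical name:

```python
def replace_lines(text_lines, rep_lines):
    # Read the replacement pairs ONCE into a list; a file object can only be
    # iterated a single time, which is why the original inner loop only ran
    # for the first line of the text.
    pairs = []
    for rep_line in rep_lines:
        rep_line = rep_line.strip()          # drop the trailing newline
        if ':' in rep_line:
            word, replacement = rep_line.split(':', 1)
            pairs.append((word, replacement))

    out = []
    for line in text_lines:
        for word, replacement in pairs:
            line = line.replace(word, replacement)   # accumulate every replacement
        out.append(line)
    return out

# Usage with the file names from the question:
# with open('t_.txt') as f3, open('egb.out') as w3:
#     print(''.join(replace_lines(f3, w3)), end='')
```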


2 Answers


After reading both files into strings, these lines should work:

# text contains the first file
# replacements contains the list of replacement
for w in replacements.split(' '):
    if ':' in w:
        word,replacement = w.split(':')
        text = text.replace(word,replacement)
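For completeness, a sketch of the surrounding file handling, assuming the two paths are passed in. The function name `replace_from_files` is hypothetical; the file names in the usage comment are the ones from the question:

```python
def replace_from_files(text_path, rep_path):
    # Read both files completely into strings first.
    with open(text_path) as f:
        text = f.read()
    with open(rep_path) as f:
        replacements = f.read()
    # Same loop as above, except that split() with no argument also copes
    # with newlines between pairs, not only single spaces.
    for w in replacements.split():
        if ':' in w:
            word, replacement = w.split(':', 1)
            text = text.replace(word, replacement)
    return text

# print(replace_from_files('t_.txt', 'egb.out'))
```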
answered 2013-10-23T07:46:15.677

Use a dictionary and a string like this one (or read them from a file, or whatever):

rep = {'January':'February', 'September':'November', 'monthly':'weekly'}

s = """One reason the Fed is likely to wait until early 2014 to begin easing back on stimulus efforts is that policy makers there simply will not know if the labor market is gaining or losing strength before then. Not until December will the monthly jobs survey be free of the shutdown static, and that report does not come out until early January.

The September jobs report was disappointing, with the economy adding 148,000 new jobs instead of the expected 185,000, but stocks rose on anticipation that Fed stimulus efforts would continue well into 2014."""

Then you can use this one-liner:

from functools import reduce  # reduce is no longer a builtin in Python 3

result = reduce(lambda x, y: x.replace(*y), rep.items(), s)
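One caveat with the one-liner: it applies the replacements one after another, so a later rule can rewrite the output of an earlier one, whereas a single regex pass replaces each original match exactly once. A small illustration with hypothetical rules (it relies on dicts keeping insertion order, which holds from Python 3.7):

```python
from functools import reduce
import re

rules = {'a': 'b', 'b': 'c'}   # hypothetical rules: one rule's output is another rule's key
s = 'ab'

# Sequential str.replace: 'ab' -> 'bb' (a->b) -> 'cc' (b->c)
sequential = reduce(lambda acc, kv: acc.replace(*kv), rules.items(), s)

# Single regex pass: each original match is replaced exactly once
pattern = re.compile('|'.join(map(re.escape, rules)))
single_pass = pattern.sub(lambda m: rules[m.group(0)], s)

print(sequential, single_pass)  # cc bc
```

For the replacement list in the question the two approaches give the same result, because no replacement produces another key.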

Or use a regular expression (more efficient, in my opinion):

import re

rep = dict((re.escape(k), v) for k, v in rep.items())  # escape regex metacharacters in the keys
pattern = re.compile("|".join(rep.keys()))  # one alternation pattern built from all keys
result = pattern.sub(lambda m: rep[re.escape(m.group(0))], s)  # look up each match by its escaped form
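A possible refinement, not part of the original answer: wrap the alternation in `\b` word boundaries so that, say, `monthly` is not replaced inside `bimonthly`, and sort the keys longest first so that a key that is a prefix of a longer one (for example a multi-word phrase) does not shadow it, since regex alternation takes the first alternative that matches rather than the longest. This assumes the keys begin and end with word characters; `replace_words` is a hypothetical name:

```python
import re

def replace_words(s, rep):
    # Longest keys first: alternation picks the first alternative that matches.
    keys = sorted(rep, key=len, reverse=True)
    # \b(?:...)\b only matches whole words; re.escape guards metacharacters.
    pattern = re.compile(r'\b(?:' + '|'.join(map(re.escape, keys)) + r')\b')
    # m.group(0) is the matched key itself, so it can index rep directly.
    return pattern.sub(lambda m: rep[m.group(0)], s)

rep = {'January': 'February', 'September': 'November', 'monthly': 'weekly'}
print(replace_words('the monthly and bimonthly reports, early January', rep))
# the weekly and bimonthly reports, early February
```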

But really, if you are doing this kind of work, you should take a look at nltk (Natural Language Toolkit).

answered 2013-10-23T07:52:53.153