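I want to do a large number of find-and-replace operations using Python.

tot11.txt contains one long string (with 600,000 items), and I want to replace the items in it using the file 1.txt.

So, for example, tot11.txt contains: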

'alba', 'raim',

and 1.txt looks like this:

'alba':'barba', 'raim':'uva'.

As a result I would get 'barba', 'uva', and so on...

When I run the script, I get the following error:

Traceback (most recent call last):
  File "sort2.py", line 12, in <module>
    txt = replace_all(my_text, dic)
  File "sort2.py", line 4, in replace_all
    for i, j in dic.iteritems():
AttributeError: 'str' object has no attribute 'iteritems'

The script works fine if I don't use a text file for the replacements and instead write the replacement items directly in the script.

import sys

def replace_all(text, dic):
    for i, j in dic.iteritems():
        text = text.replace(i, j)
    return text

my_text= open('tot11.txt', 'r').read()

reps = open('1.txt', 'r').read()

txt = replace_all(my_text, reps)

f = open('results.txt', 'w')
sys.stdout = f
print txt

4 Answers

open('1.txt', 'r').read() returns a string, not a dictionary.

>>> print file.read.__doc__
read([size]) -> read at most size bytes, returned as a string.

If 1.txt contains:

'alba':'barba', 'raim':'uva'

then you can use ast.literal_eval to get a dictionary:

>>> from ast import literal_eval
>>> with open("1.txt") as f:
...     dic = literal_eval('{' + f.read() + '}')
...     print dic
...
{'alba': 'barba', 'raim': 'uva'}

Instead of str.replace you should use a regex, because str.replace('alba','barba') will also replace the match inside longer words such as 'albaa' and 'balba':

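import re

def replace_all(text, dic):
    for i, j in dic.iteritems():
        # Match the key only as a whole quoted item ('alba', not 'albaa');
        # re.escape keeps regex metacharacters in the key literal
        text = re.sub(r"'{}'".format(re.escape(i)), "'{}'".format(j), text)
    return text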
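
With 600,000 items, calling re.sub once per dictionary entry rescans the whole text for every key. A possible alternative, sketched here under the assumption that every item is wrapped in single quotes and contains no embedded quote, is to match each quoted item once and look it up in the dictionary. The function name replace_all_single_pass is made up for this illustration, and the code is written for Python 3.

```python
import re

# Matches one single-quoted item and captures its contents.
# Assumption: items never contain a single quote themselves.
QUOTED_ITEM = re.compile(r"'([^']*)'")

def replace_all_single_pass(text, dic):
    """Replace each quoted item found in dic; leave all other items untouched."""
    def lookup(match):
        item = match.group(1)
        # dict.get falls back to the original item when there is no replacement
        return "'" + dic.get(item, item) + "'"
    return QUOTED_ITEM.sub(lookup, text)

text = "'alba', 'raim', 'albaa', 'balba'"
dic = {'alba': 'barba', 'raim': 'uva'}
print(replace_all_single_pass(text, dic))
# prints: 'barba', 'uva', 'albaa', 'balba'
```

Because the lookup is a dictionary access, the text is scanned only once regardless of how many replacement pairs there are.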
Answered 2013-06-03T12:53:58.607

The second argument to replace_all is a string, because it comes from reps = open('1.txt', 'r').read(). Calling iteritems() on a string object therefore fails, because string objects have no such method.
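
A small check along these lines makes the difference visible. It is only a sketch: the string literal stands in for the contents of 1.txt, the code is written for Python 3 (where the dictionary method is items rather than iteritems), and converting with ast.literal_eval is just one possible way to get a dictionary.

```python
from ast import literal_eval

# Stand-in for open('1.txt', 'r').read(); the real file is not read here
reps = "'alba':'barba', 'raim':'uva'"

# A str has no dictionary iteration method, which is why the call fails
print(hasattr(reps, 'items'))   # False

# Wrapping the text in braces turns it into a dict literal that can be parsed
dic = literal_eval('{' + reps + '}')
print(hasattr(dic, 'items'))    # True
```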

Answered 2013-06-03T12:54:53.650

First, you should read the replacements from a file:

lookup = {}  # an empty dictionary
with open('replacements.txt') as f:
    for line in f:
        if ':' in line:
            bits = line.strip().split(':')
            lookup[bits[0].strip()] = bits[1].strip()

Next, read the file whose contents you want to replace:

with open('somefile.txt') as infile, open('results.txt', 'w') as out:
    for line in infile:
        words = line.split()  # splits on whitespace
        # For each word, write its replacement if it has one, otherwise
        # write the word itself; rejoin with spaces and end the line so
        # the words do not run together in the output file
        out.write(' '.join(lookup.get(word, word) for word in words) + '\n')
Answered 2013-06-03T13:03:17.773

You don't need to use literal_eval. Here is your file:

% cat 1.txt 
foo:bar
abc:def

Here is the code to read it into a dictionary. As Ashwini Chaudhary said, you get that error because read() returns a string. Strings have no method named iteritems.

>>> dic = {}
>>> with open('1.txt') as f:
...     for line in f:
...         trimmed_line = line.strip()
...         if trimmed_line:
...             (key, value) = trimmed_line.split(':')
...             dic[key] = value
...
>>> dic
{'foo': 'bar', 'abc': 'def'}

This of course assumes that your file has only one : per line.
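
If a value might itself contain a colon, one way to relax that assumption is to split on the first : only. The sketch below is written for Python 3; the helper name parse_replacements and the sample lines are invented for illustration, and silently skipping lines without a separator is a choice made here, not something the original code does.

```python
def parse_replacements(lines):
    """Build a dict from 'key:value' lines, splitting on the first ':' only."""
    dic = {}
    for line in lines:
        trimmed_line = line.strip()
        if not trimmed_line or ':' not in trimmed_line:
            continue  # skip blank lines and lines without a separator
        # maxsplit=1 keeps any further ':' characters inside the value
        key, value = trimmed_line.split(':', 1)
        dic[key] = value
    return dic

# Hypothetical sample lines standing in for the contents of 1.txt
sample = ["foo:bar\n", "\n", "url:http://example.com\n", "abc:def\n"]
print(parse_replacements(sample))
```

The same function accepts an open file object, since iterating over a file yields its lines.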

Answered 2013-06-03T12:58:47.057