python - Replace words in a string with words from a list using Python

Question

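I am creating a word cloud program in Python and I'm stuck on a word-replacement function. I'm trying to replace a set of numbers in an HTML file (so I'm working with strings) with the words from an ordered list. So 000 should be replaced with the first word in the list, 001 with the second word, and so on.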

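In the code below I have it picking the correct word to replace w with, but I can't get it to actually do the replacement in the string. Any help is appreciated. Thanks!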

def replace_all():  
  text = '000 001 002 003 '
  word = ['foo', 'bar', 'that', 'these']
  for a in word:    
    y = -1
    for w in text:     
      y = y + 1
      x = "00"+str(y)
      w = {x:a}      
      for i, j in w.iteritems():
        text = text.replace(i, j)
  print text

score 4 · Accepted Answer

This is actually a very simple list comprehension:

>>> text = '000 001 002 003 '
>>> words = ['foo', 'bar', 'that', 'these']
>>> [words[int(item)] for item in text.split()]
['foo', 'bar', 'that', 'these']

Edit: If you need to keep other values as they are, the following will do:

def get(seq, item):
    try:
        return seq[int(item)]
    except (ValueError, IndexError):
        # Not a number, or no word at that index: keep the original token
        return item

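Then simply use something like [get(words, item) for item in text.split()]. Naturally, if the string may contain other numbers that could be replaced by accident, get() may need more thorough checks. (End of edit)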
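A self-contained sketch of the helper in use, assuming the input mixes number tokens with ordinary words. The name get_word is hypothetical; it is a variant of get() that catches IndexError as well as ValueError, so numbers past the end of the list are also kept unchanged.

```python
def get_word(seq, item):
    """Return seq[int(item)] if item is a usable index, else item unchanged."""
    try:
        return seq[int(item)]
    except (ValueError, IndexError):
        # ValueError: item is not a number; IndexError: number past the end
        return item


words = ['foo', 'bar', 'that', 'these']
text = '000 hello 002 999'
result = [get_word(words, item) for item in text.split()]
print(result)  # ['foo', 'hello', 'that', '999']
```

Note that a token such as -1 would still be accepted and index from the end of the list, which is one of the cases where get() may need more testing.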

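What we do is split the text into the individual numbers, convert each one to an integer, and use it to index the list you provided to look up the word.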
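If a string is wanted rather than a list, the looked-up words can be joined back together. This sketch assumes the numbers are separated only by whitespace; split() discards the exact spacing (including the trailing space), so the output is rebuilt with single spaces.

```python
text = '000 001 002 003 '
words = ['foo', 'bar', 'that', 'these']

# split() yields the number tokens, int('002') == 2, and join() rebuilds a string
result = ' '.join(words[int(item)] for item in text.split())
print(result)  # foo bar that these
```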

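As for why your code doesn't work, the main problem is that you are looping over a string, which gives you characters, not words. Either way, that is not a good approach for this task.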
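A quick illustration of the difference using the question's string. Iterating over it produces 16 single characters, which is why the counter in the question's inner loop runs 16 times per word instead of once per number.

```python
text = '000 001 002 003 '

# Iterating over a string yields one character at a time...
chars = [c for c in text]
print(len(chars))  # 16
print(chars[:4])   # ['0', '0', '0', ' ']

# ...while split() yields the whitespace-separated tokens
tokens = text.split()
print(tokens)      # ['000', '001', '002', '003']
```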

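It's also worth mentioning that when you loop over values and need an index alongside them, you should use the built-in enumerate() function rather than a counter variable.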

For example, instead of:

y = -1
for w in text:
    y = y + 1
    ...

use:

for y, w in enumerate(text):
    ...

This is more readable and more Pythonic.
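Applied to the word list rather than the string, enumerate() hands over exactly the index each word needs. The start argument shown at the end is optional and only illustrates numbering from 1.

```python
words = ['foo', 'bar', 'that', 'these']

# enumerate() yields (index, value) pairs, so no manual counter is needed
pairs = []
for y, w in enumerate(words):
    pairs.append((y, w))
print(pairs)  # [(0, 'foo'), (1, 'bar'), (2, 'that'), (3, 'these')]

# start= shifts the first index if numbering should begin at 1
print(list(enumerate(words, start=1))[0])  # (1, 'foo')
```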

Another point about your existing code:

w = {x:a}      
for i, j in w.iteritems():
    text = text.replace(i, j)

If you think about it, this can be simplified to:

text = text.replace(x, a)

You are setting w to a one-item dictionary and then looping over it, even though you know it will only ever contain that one item.
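A small check that both forms give the same result. The sample values are made up for illustration, and items() is used in place of iteritems() so the snippet also runs on Python 3.

```python
x, a = '000', 'foo'

# Form from the question: build a one-item dict, then loop over it
text1 = '000 001'
w = {x: a}
for i, j in w.items():
    text1 = text1.replace(i, j)

# Simplified form: replace directly
text2 = '000 001'
text2 = text2.replace(x, a)

print(text1, text2 == text1)  # foo 001 True
```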

A solution closer to your approach would look something like this:

# Map zero-padded keys to words: '000' -> 'foo', '001' -> 'bar', ...
words_dict = {"{0:03d}".format(index): value for index, value in enumerate(words)}
for key, value in words_dict.items():
    text = text.replace(key, value)

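We create a dictionary mapping zero-padded number strings (built with str.format()) to the values, then replace each one. Note that since you are using 2.x, you'll want dict.iteritems(), and if you are on a version earlier than 2.7, use the built-in dict() on a generator of tuples, because dict comprehensions don't exist there.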
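A runnable sketch of that approach with the question's data, showing that the dict comprehension and the pre-2.7 dict() form build the same mapping. It uses items(), which works on both 2.x and 3.x.

```python
text = '000 001 002 003 '
words = ['foo', 'bar', 'that', 'these']

# Dict comprehension (Python 2.7+ / 3.x): '000' -> 'foo', '001' -> 'bar', ...
words_dict = {"{0:03d}".format(index): value for index, value in enumerate(words)}

# Pre-2.7 equivalent: dict() over a generator of (key, value) tuples
words_dict_old = dict(("{0:03d}".format(index), value)
                      for index, value in enumerate(words))
assert words_dict == words_dict_old

for key, value in words_dict.items():
    text = text.replace(key, value)
print(repr(text))  # 'foo bar that these '
```

Because str.replace() matches substrings, this assumes the replacement words themselves contain no three-digit numbers; otherwise a later key could match inside an already inserted word.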

score 0 · Accepted Answer

When processing text, regular expressions are an obvious tool to consider.

import re

text = ('<p><span class="newStyle0" '
        'style="left: 291px; '
        'top: 258px">000</span></p> <p>'
        '<span class="newStyle1" '
        'style="left: 85px; '
        'top: 200px">001</span></p> <p>'
        '<span class="newStyle2" '
        'style="left: 580px; '
        'top: 400px; width: 167px; '
        'height: 97px">002</span></p> <p>'
        '<span class="newStyle3" '
        'style="left: 375px; top: 165px">'
        '003</span></p>')

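# Stand-in word list: 'XXX-0000-YYY', 'XXX-0001-YYY', ...
words = ['XXX-%04d-YYY' % a for a in range(1000)]

# A run of digits preceded by '>' and followed by '</span>'
regx = re.compile(r'(?<=>)\d+(?=</span>)')

def gv(m, words=words):
    # Use the matched number as an index into the word list
    return words[int(m.group())]

print(regx.sub(gv, text))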
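The same re.sub() callback idea also works on the plain string from the question. This sketch assumes the placeholders are standalone runs of exactly three digits (matched with \b word boundaries); the bounds check that leaves unknown numbers untouched is an addition for illustration, and lookup is a hypothetical name.

```python
import re

text = '000 001 002 003 '
words = ['foo', 'bar', 'that', 'these']

# Three digits standing on their own, e.g. '002' but not inside '10025'
placeholder = re.compile(r'\b\d{3}\b')

def lookup(match):
    index = int(match.group())
    # Leave the number unchanged if there is no word for it
    return words[index] if index < len(words) else match.group()

result = placeholder.sub(lookup, text)
print(repr(result))  # 'foo bar that these '
```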
