python - Removing offending characters from strings in a list

Question

Example data to parse (a list of unicode strings):

[u'\n', u'1\xa0', u'Some text here.', u'\n', u'1\xa0', u'Some more text here.', 
u'\n', u'1\xa0', u'Some more text here.']

I want to remove the \xa0 from these strings.

EDIT: My current approach does not work:

def remove_from_list(l, x):
  return [li.replace(x, '') for li in l]

remove_from_list(list, u'\xa0')

I still get exactly the same output.

score 5 · Accepted Answer

The problem is different in each version of your code. Let's start with this one:

newli = re.sub(x, '', li)
l[li].replace(newli)

First, newli is already the line you want (that is what re.sub does for you), so you don't need replace here at all. Just assign newli.

Second, l[li] isn't going to work, because li is the value of the line, not its index.

In this version, the problem is a bit more subtle:

li = re.sub(x, '', li)

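re.sub returns a new string, and you are assigning that string to li. But that doesn't affect anything in the list. It just says "li no longer refers to the current line in the list; it now refers to this new string".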
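A minimal sketch of that difference, in case it helps to see it run. The names lines, line and unchanged are made up for illustration and are not from the question:

```python
import re

lines = [u'1\xa0', u'Some text here.']

# Rebinding the loop variable only changes what the name `line` refers to;
# the list still holds the original strings.
for line in lines:
    line = re.sub(u'\xa0', u'', line)
unchanged = list(lines)  # copy of the list as it is after the first loop

# Assigning through an index replaces the element stored in the list.
for index, line in enumerate(lines):
    lines[index] = re.sub(u'\xa0', u'', line)
```

After the first loop the list is unchanged; only the second loop actually removes the \xa0.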

The only way to replace the elements of a list is to get their indices, so that you can use the [] operator. To get those, use enumerate.

So:

import re

def remove_from_list(l, x):
  for index, li in enumerate(l):
    l[index] = re.sub(x, '', li)
  return l

But really, you probably do want to use str.replace. It's just that you want to use it instead of re.sub, not in addition to it:

def remove_from_list(l, x):
  for index, li in enumerate(l):
    l[index] = li.replace(x, '')
  return l

Then you don't have to worry about what happens if x contains a character that is special in regular expressions.
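To illustrate the concern, here is a small sketch with a made-up input string. The character '.' is used only because it is a well-known regex metacharacter; the question's \xa0 is not special and would behave the same either way:

```python
import re

text = u'1.5\xa0kg'

# As a regular expression, '.' matches any character except a newline,
# so every character of this string is removed.
via_regex = re.sub(u'.', u'', text)

# str.replace treats '.' as a literal character.
via_replace = text.replace(u'.', u'')

# If a regex is really needed, re.escape makes the pattern literal.
via_escaped = re.sub(re.escape(u'.'), u'', text)
```

Here via_regex ends up empty, while the other two only lose the dot.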

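Also, in Python you almost never want to modify an object in place and also return it. Either modify it and return None, or return a new copy of the object. So, either: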

def remove_from_list(l, x):
  for index, li in enumerate(l):
    newli = li.replace(x, '')
    l[index] = newli

… or:

def remove_from_list(l, x):
  new_list = []
  for li in l:
    newli = li.replace(x, '')
    new_list.append(newli)
  return new_list

You can simplify the latter to a list comprehension, as in unutbu's answer:

def remove_from_list(l, x):
  new_list = [li.replace(x, '') for li in l]
  return new_list

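The fact that the second one is easier to write (no need for enumerate, there is a handy shortcut for it, and so on) is no coincidence. It is usually the one you want, so Python makes it easy.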
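The built-ins follow the same convention, which may make it easier to remember. A small sketch using list.sort and sorted (the variable names are only for illustration):

```python
numbers = [3, 1, 2]

# sorted() leaves its argument alone and returns a new list.
ordered_copy = sorted(numbers)
snapshot = list(numbers)  # the original order is still intact here

# list.sort() modifies the list in place and returns None.
result = numbers.sort()
```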

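I don't know how to make this any clearer, but here is one last try:

If you choose the version that returns a fixed new copy of the list instead of the one that modifies the list in place, your original list will not be modified in any way. If you want to use the fixed new copy, you have to use the function's return value. For example: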

>>> def remove_from_list(l, x):
...     new_list = [li.replace(x, '') for li in l]
...     return new_list
>>> a = [u'\n', u'1\xa0']
>>> b = remove_from_list(a, u'\xa0')
>>> a
[u'\n', u'1\xa0']
>>> b
[u'\n', u'1']
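For contrast, a sketch of how the in-place version behaves from the caller's side. The name remove_in_place is hypothetical; the body is the in-place variant shown earlier:

```python
def remove_in_place(l, x):
    # Modifies the caller's list and implicitly returns None.
    for index, li in enumerate(l):
        l[index] = li.replace(x, u'')

a = [u'\n', u'1\xa0']
result = remove_in_place(a, u'\xa0')
```

Here a itself has been cleaned, and result is None, so there is nothing useful to assign.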

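The problem you are having with your actual code, where everything turns into a list of 1-character and 0-character strings, is that you don't actually have a list of strings. You have a single string that is the repr of a list of strings. So for li in l means "for each character li in the string l", not "for each string li in the list l".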
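If that is the situation, one possible fix is to parse the string back into a real list before cleaning it. The sketch below assumes the data arrived as the text of a list literal; raw is a made-up stand-in for it, and ast.literal_eval is a suggestion rather than something from the original question:

```python
import ast

# Assumption: the data arrived as the text of a list, not as a list.
raw = "[u'\\n', u'1\\xa0', u'Some text here.']"

# Iterating over a string yields one character at a time.
first_three = [ch for ch in raw][:3]

# ast.literal_eval safely parses the literal back into a real list.
items = ast.literal_eval(raw)
cleaned = [item.replace(u'\xa0', u'') for item in items]
```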

score 3 · Answer

Another option, if you are only interested in ASCII characters (you mention "characters" in general, but this also happens to work for the posted example):

[text.encode('ascii', 'ignore') for text in your_list]
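A hedged note for Python 3 readers: there, str.encode returns bytes rather than text, so a decode step is probably wanted. The sketch below also shows that this approach drops every non-ASCII character, not only \xa0. The sample string containing \xe9 is made up for illustration:

```python
your_list = [u'\n', u'1\xa0', u'Some text here.']

# On Python 3, str.encode returns bytes, so decode to get text back.
ascii_only = [text.encode('ascii', 'ignore').decode('ascii')
              for text in your_list]

# Caveat: every non-ASCII character is dropped, not just u'\xa0'.
also_stripped = u'caf\xe9\xa0'.encode('ascii', 'ignore').decode('ascii')
```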

score 1 · Answer

You can use a list comprehension and str.replace:

>>> items
[u'\n',
 u'1\xa0',
 u'Some text here.',
 u'\n',
 u'1\xa0',
 u'Some more text here.',
 u'\n',
 u'1\xa0',
 u'Some more text here.']
>>> [item.replace(u'\xa0', u'') for item in items]
[u'\n',
 u'1',
 u'Some text here.',
 u'\n',
 u'1',
 u'Some more text here.',
 u'\n',
 u'1',
 u'Some more text here.']
