python - Strip punctuation from a list of strings

Question

I have a list of words like this:

['Hey', 'yo', 'Hey?', 'Yeah.', 'john:']

I want to strip , . " ' ? ! * and everything else like that from the beginning or end of each word.

for element in array:
    # perform the stripping here
    pass

Any ideas?

score 4 · Accepted Answer

Depends on what you mean by 'everything else.'

[elt.strip(',."\'?!*:') for elt in array]

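is pretty easy and gets the job done, assuming you have a reasonably small, known set of characters to remove. Note that str.strip only removes those characters from the start and end of each string, which is what the question asks for.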

In [1]: ar = ['Hey', 'yo', 'Hey?', 'Yeah.', 'john:']

In [2]: [elt.strip(',."\'?!*:') for elt in ar]
Out[2]: ['Hey', 'yo', 'Hey', 'Yeah', 'john']
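A small illustration of how strip treats its argument, with two made-up inputs added for the demo: the argument is a set of characters, any combination of them is removed from both ends, and the interior of the word is left alone.

```python
# str.strip(chars) treats chars as a set and removes any run of those
# characters from both ends; characters inside the word are untouched.
words = ['Hey', 'yo', 'Hey?', 'Yeah.', 'john:', '"Wow!?"', "don't"]
cleaned = [w.strip(',."\'?!*:') for w in words]
print(cleaned)
# ['Hey', 'yo', 'Hey', 'Yeah', 'john', 'Wow', "don't"]
```

If you only want one side, str.lstrip and str.rstrip take the same argument.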

Or, as suggested, use the predefined string.punctuation constant instead of spelling out the characters yourself:

import string
[elt.strip(string.punctuation) for elt in ar]
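A self-contained version of the string.punctuation variant, with a few made-up inputs added. One caveat worth knowing: string.punctuation only contains ASCII punctuation, so a non-ASCII character such as the inverted exclamation mark in '¡Hola!' is not stripped.

```python
import string

# string.punctuation holds the ASCII punctuation characters only.
ar = ['Hey', 'yo', 'Hey?', 'Yeah.', 'john:', '(Oh)', "don't", '¡Hola!']
stripped = [elt.strip(string.punctuation) for elt in ar]
print(stripped)
# ['Hey', 'yo', 'Hey', 'Yeah', 'john', 'Oh', "don't", '¡Hola']
```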

Otherwise, if you want to remove everything that isn't (roughly) alphanumeric, anywhere in the string, you could do:

import re
[re.sub(r'\W+', '', elt) for elt in array]

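which removes every non-word character, i.e. anything other than [A-Za-z0-9_] (in Python 3, \w also matches non-ASCII letters and digits unless you pass flags=re.ASCII). Unlike strip, this also removes characters from the middle of a word, so "don't" becomes "dont".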
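If you want the regex's notion of "non-word" but applied only at the ends, which is closer to what the question asks, one option is to anchor the pattern with ^ and $. This is a sketch, not part of the original answer; strip_edges and EDGE_PUNCT are hypothetical names. The last print shows, for comparison, the unanchored pattern removing the apostrophe from "don't".

```python
import re

# Hypothetical helper: remove runs of non-word characters only at the
# start (^\W+) or the end (\W+$) of each string; interior characters stay.
EDGE_PUNCT = re.compile(r'^\W+|\W+$')

def strip_edges(words):
    return [EDGE_PUNCT.sub('', w) for w in words]

array = ['Hey', 'yo', 'Hey?', 'Yeah.', 'john:', "don't!"]
print(strip_edges(array))
# ['Hey', 'yo', 'Hey', 'Yeah', 'john', "don't"]

# The unanchored pattern also strips punctuation inside words.
print([re.sub(r'\W+', '', elt) for elt in array])
# ['Hey', 'yo', 'Hey', 'Yeah', 'john', 'dont']
```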
