python - How do I separate punctuation from a string in Python?

Question

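I'm parsing data from a file. At some point in my code, I split each line of the file on the space character with str.split(" "). I need a way to separate out any punctuation that may appear in the resulting strings.

By punctuation, I mean any character returned by:

import string
print(string.punctuation)

Thanks!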

score 3 · Accepted Answer

I would use a regular expression for this:

>>> import re
>>> re.split(r'(\W)', 'This is a sentence. This is another sentence.')
['This',
 ' ',
 'is',
 ' ',
 'a',
 ' ',
 'sentence',
 '.',
 '',
 ' ',
 'This',
 ' ',
 'is',
 ' ',
 'another',
 ' ',
 'sentence',
 '.',
 '']

You can iterate over the resulting list, change the words, and then ''.join() it back into a sentence with the same punctuation in the same places.
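
As a minimal sketch of that round trip, the snippet below splits a sentence, applies a function to the word pieces only, and joins everything back together. The `transform_words` helper and the choice of upper-casing are illustrative assumptions, not part of the answer above.

```python
import re

def transform_words(sentence, func):
    """Apply func to each word, leaving separators untouched (hypothetical helper)."""
    pieces = re.split(r'(\W)', sentence)
    # With a single capture group, the text between separators sits at even
    # indexes and the captured separators sit at odd indexes.
    pieces[::2] = [func(word) for word in pieces[::2]]
    return ''.join(pieces)

print(transform_words('This is a sentence. This is another sentence.', str.upper))
```

Note that `\W` is not the same set as `string.punctuation`: `\W` also matches whitespace, and `_` counts as a word character even though it appears in `string.punctuation`.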

score 0 · Answer

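Wouldn't it be easier to stick with the original? What is your end goal in putting the punctuation back? If you are just going to rebuild the whole line, why not keep it in the first place?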

import re
import string

# Make a char set in regex syntax; re.escape keeps ], \, ^ and - literal.
pattern = '[' + re.escape(string.punctuation) + ']+'

def parse_token(token):
    pass  # Do whatever you need to do with your "clean" token here.

for line in file:  # `file` is assumed to be an already-open file object
    tokens = line.split(' ')
    for token in tokens:
        # re.sub needs a replacement string; '' removes the punctuation.
        parsed = parse_token(re.sub(pattern, '', token))
        # Now do whatever else you might need to do with token and parsed.
    # Remember, you still have access to the `line` string and `tokens` list!
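
To make the idea concrete, here is a small self-contained sketch that strips punctuation from each token while keeping the original token next to it. The `clean_tokens` name and the sample line are assumptions made for illustration; `re.escape` is used so that characters such as `]` and `\` in `string.punctuation` are treated literally inside the character class.

```python
import re
import string

# One or more consecutive punctuation characters.
PUNCT_RUN = re.compile('[' + re.escape(string.punctuation) + ']+')

def clean_tokens(line):
    """Return (original, cleaned) pairs for each space-separated token (hypothetical helper)."""
    return [(token, PUNCT_RUN.sub('', token)) for token in line.split(' ')]

for original, cleaned in clean_tokens('Hello, world! (testing)'):
    print(repr(original), '->', repr(cleaned))
```

Because the original token is kept in each pair, nothing has to be reconstructed later. Keep in mind that this removes punctuation inside a token as well, so `don't` becomes `dont`.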
