python - How to split a line at a non-printing ASCII character in Python

Question

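How do I split a line in Python at a non-printing ASCII character (for example, the long dash, hex 0x97, octal 227)? I don't need the character itself. The information after it will be saved in a variable.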

score 5 · Accepted Answer

You can use re.split.

>>> import re
>>> re.split(r'\W+', 'Words, words, words.')
['Words', 'words', 'words', '']

Adjust the pattern so that it matches only the separator characters and leaves the characters you want to keep.
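As a rough illustration of adjusting the pattern, the sketch below splits on any run of characters outside the printable ASCII range. It is written for Python 3, and both the helper name `split_on_non_printable` and the definition of "printable" as 0x20 to 0x7E are assumptions made for this example, not part of the original answer.

```python
import re

# Assumption: "printable" means the ASCII range 0x20-0x7E (space to tilde).
NON_PRINTABLE = re.compile(r'[^\x20-\x7e]+')

def split_on_non_printable(line):
    """Split line wherever one or more non-printable characters occur."""
    return NON_PRINTABLE.split(line)

parts = split_on_non_printable('key\x97value')
print(parts)  # ['key', 'value']
```

Note that a trailing newline also counts as non-printable under this definition, so a line read from a file would produce an empty string as its last element unless it is stripped first.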

See also: Stripping non-printable characters from a string in Python

Example (with a long dash):

>>> # \xe2\x80\x93 represents a long dash (or long minus)
>>> s = 'hello – world'
>>> s
'hello \xe2\x80\x93 world'
>>> import re
>>> re.split("\xe2\x80\x93", s)
['hello ', ' world']

Or, the same with unicode:

>>> # \u2013 represents a long dash, long minus or so called en-dash
>>> s = u'hello – world'
>>> s
u'hello \u2013 world'
>>> import re
>>> re.split(u"\u2013", s)
[u'hello ', u' world']
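The transcripts above are Python 2. A possible Python 3 version is sketched below, assuming the 0x97 byte comes from Windows-1252 (cp1252) encoded data, where it stands for the em dash U+2014. The sample bytes and variable names are made up for illustration; if the data uses a different encoding, the decoded character will differ.

```python
# Assumption: the raw data is cp1252-encoded, so byte 0x97 is an em dash.
raw = b'hello \x97 world'

# Option 1: split the bytes directly, without decoding.
byte_parts = raw.split(b'\x97')
print(byte_parts)  # [b'hello ', b' world']

# Option 2: decode first, then split on the resulting character.
text = raw.decode('cp1252')        # 0x97 becomes '\u2014'
text_parts = text.split('\u2014')
print(text_parts)  # ['hello ', ' world']
```

Splitting before decoding avoids having to know which character the byte maps to, but the pieces remain bytes and still need decoding later.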

score 2 · Accepted Answer

_, _, your_result = your_input_string.partition('\x97')

Or

your_result = your_input_string.partition('\x97')[2]

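If your_input_string does not contain a '\x97', your_result will be empty. If your_input_string contains several '\x97' characters, your_result will contain everything after the first '\x97', including any further '\x97' characters.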
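The three cases described above can be checked with a short Python 3 sketch. The helper name `text_after` and the sample strings are hypothetical and only serve the demonstration.

```python
SEP = '\x97'  # the separator from the question

def text_after(line, sep=SEP):
    """Return everything after the first sep, or '' if sep is absent."""
    return line.partition(sep)[2]

print(repr(text_after('name\x97value')))      # 'value'
print(repr(text_after('no separator here')))  # ''
print(repr(text_after('a\x97b\x97c')))        # 'b\x97c'
```

If an absent separator has to be told apart from a separator with nothing after it, the middle element of the tuple returned by partition can be inspected: it is empty only when the separator was not found.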

score 1 · Accepted Answer

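Just use the split method of the string/unicode type. It does not really care what you split on, as long as the separator is a constant. If you want to split on a regular expression, use re.split.

To get the separator string, either escape it as others have shown, '\x97',

or

use chr(0x97) for byte strings (0-255) or unichr(0x97) for unicode.

So an example would be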

'will not be split'.split(chr(0x97))

'will be split here:\x97 and this is the second string'.split(chr(0x97))
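For reference, here is what those two calls return, written as a Python 3 sketch. In Python 3 there is no unichr; chr covers all code points, and a bytes separator can be built with bytes([0x97]). The variable names and the bytes sample are assumptions added for this example.

```python
SEP = chr(0x97)            # same as '\x97' in Python 3
SEP_BYTES = bytes([0x97])  # same as b'\x97'

no_split = 'will not be split'.split(SEP)
two_parts = 'will be split here:\x97 and this is the second string'.split(SEP)
byte_parts = b'left\x97right'.split(SEP_BYTES)

print(no_split)    # ['will not be split']
print(two_parts)   # ['will be split here:', ' and this is the second string']
print(byte_parts)  # [b'left', b'right']
```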
