python - Decode a string with hidden control characters

Question

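I have a file containing lines with hidden control characters. A sample line looks like this:

go!^Mbap^[<80>

The ^M, ^[ and <80> are hidden characters. When I print the line, I can't see them. However, if I use the repr() function, I can see that these characters are represented as \x1b0.

How do I change these characters into Unicode characters of my choice?

I tried using the string module's translate() function and regular expressions, but I can't seem to convert these hidden characters.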

score 0 · Accepted Answer

Here is an example of how to use str.translate and (below) unicode.translate:

In [48]: import string

In [49]: text = 'go!\x1b0'

In [50]: text.translate(string.maketrans('\x1b\xa0','??'))
Out[50]: 'go!?0'

The above command translates all '\x1b' and '\xa0' to question marks.
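The session above is Python 2, and string.maketrans is not available in current Python 3 releases. A rough Python 3 equivalent for byte strings, assuming the data is read in binary mode, might look like the sketch below. The names raw, table and cleaned are illustrative only.

```python
# Python 3 sketch: byte-level translation (variable names are illustrative).
raw = b'go!\x1b0'

# bytes.maketrans builds a 256-entry table that maps each byte in the
# first argument to the byte at the same position in the second.
table = bytes.maketrans(b'\x1b\xa0', b'??')

# Every ESC (0x1b) or 0xa0 byte becomes b'?'; other bytes pass through.
cleaned = raw.translate(table)
print(cleaned)
```

Both arguments to bytes.maketrans must have the same length, so this form only supports one-byte-for-one-byte substitutions.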

Or, if you want to translate a unicode string:

In [55]: text = 'go!\x1b0'

In [56]: unitext = text.decode('latin-1')

In [57]: unitext
Out[57]: u'go!\x1b0'

In [58]: unitext.translate({ord(u'\x1b'):ord(u'?')})
Out[58]: u'go!?0'

If you have more than one character to translate, it may be more convenient to define the table this way:

In [59]: table = dict(zip(map(ord, u'\x1b\xa0'), map(ord, u'??')))

In [60]: unitext.translate(table)
Out[60]: u'go!?0'
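To replace the hidden characters with Unicode characters of your choice on Python 3, a dict-based table passed to str.translate is one option. The sketch below assumes the sample line from the question was decoded as latin-1 (so the byte 0x80 becomes U+0080); the replacement symbols are an arbitrary choice for illustration, not something taken from the question.

```python
# Python 3 sketch: map hidden characters to visible Unicode symbols.
# The sample line and the replacement choices are assumptions.
line = 'go!\rbap\x1b\x80'   # ^M is '\r', ^[ is ESC, <80> is U+0080

# str.translate accepts a dict keyed by code point; values may be
# code points, strings, or None (None deletes the character).
table = {
    ord('\r'): '\u240d',    # SYMBOL FOR CARRIAGE RETURN
    ord('\x1b'): '\u241b',  # SYMBOL FOR ESCAPE
    ord('\x80'): '\ufffd',  # REPLACEMENT CHARACTER
}

visible = line.translate(table)
print(visible)
```

Because the values are strings, a single hidden character can also be replaced by several characters, which the one-to-one maketrans form cannot do.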
