python - Remove all characters from a string whose ordinal is out of range

Question

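What is a good way to remove all characters that are out of range (ordinal not in range(128)) from a string in Python?

I'm using hashlib.sha256 in Python 2.7, and I get this exception:

UnicodeEncodeError: 'ascii' codec can't encode character u'\u200e' in position 13: ordinal not in range(128)

I assume this means some funky character found its way into the string I'm trying to hash.

Thanks!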

score 6 · Accepted Answer

new_safe_str = some_string.encode('ascii','ignore')

I think this will work.

Or you could use a list comprehension:

"".join([ch for ch in orig_string if ord(ch) < 128])

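[Edit] However, as others have said, in general it is probably better to figure out how to handle Unicode properly... unless you really do need the string encoded as ASCII for some reason.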
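As a rough sketch of the two approaches above, here they are side by side. The helper name strip_non_ascii and the sample string are made up for illustration. Note that ASCII covers code points 0 through 127, so the test is ord(ch) < 128.

```python
def strip_non_ascii(text):
    """Return text without any character whose code point is 128 or higher."""
    # strip_non_ascii is a hypothetical helper name, not a library function.
    return "".join(ch for ch in text if ord(ch) < 128)


sample = u"abc\u200edef"  # contains U+200E LEFT-TO-RIGHT MARK

# Approach 1: filter by code point; the result is still a text string.
cleaned = strip_non_ascii(sample)

# Approach 2: encode with errors="ignore"; the result is a byte string
# (bytes in Python 3, str in Python 2).
encoded = sample.encode("ascii", "ignore")

print(cleaned)
print(encoded)
```

Both drop the offending character silently, so the hash you compute afterwards is the hash of a different string than the one you started with.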

score 4 · Accepted Answer

Rather than removing those characters, it would be better to use an encoding that hashlib won't choke on, such as utf-8:

>>> data = u'\u200e'
>>> hashlib.sha256(data.encode('utf-8')).hexdigest()
'e76d0bc0e98b2ad56c38eebda51da277a591043c9bc3f5c5e42cd167abc7393e'
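To illustrate why encoding may be preferable to stripping, here is a minimal sketch. The two helper names and the sample strings are assumptions made up for this example. Stripping makes two different inputs hash to the same value, while UTF-8 keeps them distinct.

```python
import hashlib


def sha256_hex_utf8(text):
    """Hash the UTF-8 encoding of text (hypothetical helper)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def sha256_hex_ascii_stripped(text):
    """Hash text after silently dropping non-ASCII characters (hypothetical helper)."""
    return hashlib.sha256(text.encode("ascii", "ignore")).hexdigest()


a = u"You owe me \xa3100"  # with the pound sign
b = u"You owe me 100"      # without it

# Stripping collapses both strings to the same bytes, so the hashes collide.
print(sha256_hex_ascii_stripped(a) == sha256_hex_ascii_stripped(b))  # True

# UTF-8 keeps every character, so the hashes differ.
print(sha256_hex_utf8(a) == sha256_hex_utf8(b))  # False
```

Whether that collision matters depends on what the hash is used for, but for checksums or deduplication it usually does.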

score 2 · Accepted Answer

This is an example of where the changes in Python 3 are an improvement, or at least produce a clearer error message.

Python2

>>> import hashlib
>>> funky_string=u"You owe me £100"
>>> hashlib.sha256(funky_string)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa3' in position 11: ordinal not in range(128)
>>> hashlib.sha256(funky_string.encode("utf-8")).hexdigest()
'81ebd729153b49aea50f4f510972441b350a802fea19d67da4792b025ab6e68e'
>>>

Python3

>>> import hashlib
>>> funky_string="You owe me £100"
>>> hashlib.sha256(funky_string)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Unicode-objects must be encoded before hashing
>>> hashlib.sha256(funky_string.encode("utf-8")).hexdigest()
'81ebd729153b49aea50f4f510972441b350a802fea19d67da4792b025ab6e68e'
>>>

The real problem is that sha256 takes a sequence of bytes, which Python 2 has no clear concept of. Using .encode("utf-8") is my suggestion.
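One way to make the bytes requirement explicit is a small wrapper that encodes text before hashing and passes byte strings through untouched. This is only a sketch: the name sha256_hexdigest and the encoding parameter are assumptions for this example, not part of hashlib.

```python
import hashlib


def sha256_hexdigest(data, encoding="utf-8"):
    """Return the SHA-256 hex digest of data (hypothetical helper).

    Byte strings are hashed as-is; anything else is assumed to be text
    and is encoded first, UTF-8 by default.
    """
    if not isinstance(data, (bytes, bytearray)):
        data = data.encode(encoding)
    return hashlib.sha256(data).hexdigest()


# Text and its UTF-8 bytes produce the same digest.
print(sha256_hexdigest(u"You owe me \xa3100"))
print(sha256_hexdigest(u"You owe me \xa3100".encode("utf-8")))
```

In Python 2, bytes is an alias for str, so plain str values pass through unchanged and only unicode objects are encoded.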
