python - Can't grep the output of a Python program, possibly UTF-16

Question

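I wrote a basic Python program to parse Android's resources.arsc. It prints every string it finds in the file. The strings have a zero-valued byte between each character, which suggests to me that they are stored as UTF-16. I don't know whether that is correct, but Android strings are localizable, so I assume it is. I am using string.decode('hex') to print the strings in a human-readable form. Here is an example with the list of bytes that make up one string: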

>>> print ''.join(['00', '72', '00', '65', '00', '73', '00', '2f', '00', '64', '00', '72',    '00', '61', '00', '77', '00', '61', '00', '62', '00', '6c', '00', '65', '00', '2f', '00', '61', '00', '62', '00', '6f', '00', '75', '00', '74', '00', '2e', '00', '70', '00', '6e', '00', '67', '00', '00', '00']).decode('hex')
res/drawable/about.png

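The problem is that when I pipe this program's output to grep, I can't grep for any of the strings. How can I print them to the shell so that grep can match against the output? Thanks!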

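(EDIT) I do print the strings; in my example I just thought it best to show both the "printed" version and the returned version. Sorry for the confusion. In this example, the string that can't be grepped is "/res/drawable/about.png".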

(EDIT2) A simple demonstration:

11:33 AM ~/learning_python $ python -c "print ''.join(['00', '72', '00', '65', '00', '73', '00', '2f', '00', '64', '00', '72',    '00', '61', '00', '77', '00', '61', '00', '62', '00', '6c', '00', '65', '00', '2f', '00', '61', '00', '62', '00', '6f', '00', '75', '00', '74', '00', '2e', '00', '70', '00', '6e', '00', '67', '00', '00', '00']).decode('hex')"
res/drawable/about.png
11:33 AM ~/learning_python $ python -c "print ''.join(['00', '72', '00', '65', '00', '73', '00', '2f', '00', '64', '00', '72',    '00', '61', '00', '77', '00', '61', '00', '62', '00', '6c', '00', '65', '00', '2f', '00', '61', '00', '62', '00', '6f', '00', '75', '00', '74', '00', '2e', '00', '70', '00', '6e', '00', '67', '00', '00', '00']).decode('hex')" | grep about
11:33 AM ~/learning_python $

(EDIT3) Another demonstration, which I think proves that the data is utf-16-be:

11:33 AM ~/learning_python $ python -c "print ''.join(['00', '72', '00', '65', '00', '73', '00', '2f', '00', '64', '00', '72',    '00', '61', '00', '77', '00', '61', '00', '62', '00', '6c', '00', '65', '00', '2f', '00', '61', '00', '62', '00', '6f', '00', '75', '00', '74', '00', '2e', '00', '70', '00', '6e', '00', '67', '00', '00', '00']).decode('hex')" > testfile
11:35 AM ~/learning_python $ iconv -f utf16be -t utf8 testfile
res/drawable/about.png
11:35 AM ~/learning_python $ iconv -f utf16be -t utf8 testfile | grep about
Binary file (standard input) matches
11:35 AM ~/learning_python $ iconv -f utf16be -t utf8 testfile | grep -a about
res/drawable/about.png

score 2 · Accepted Answer

Decode the characters:

'\x00r\x00e\x00s'.decode('utf-16-be') # produces u'res'

Then you can print the decoded string:

$ python -c "print ''.join(['00', '72', '00', '65', '00', '73', '00', '2f', '00', '64', '00', '72',    '00', '61', '00', '77', '00', '61', '00', '62', '00', '6c', '00', '65', '00', '2f', '00', '61', '00', '62', '00', '6f', '00', '75', '00', '74', '00', '2e', '00', '70', '00', '6e', '00', '67', '00', '00', '00', '00']).decode('hex').decode('utf-16-be').rstrip('\0')" | grep about
res/drawable/about.png
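
For anyone on Python 3, where str.decode('hex') no longer exists, here is a rough sketch of the same idea. It also shows why the plain grep fails: in the raw output every letter is preceded by a 0x00 byte, so the bytes of "about" never sit next to each other. The helper name to_greppable is made up for illustration, and the bytes are rebuilt from the text rather than from the hex list in the question, purely to keep the example short.

```python
# Python 3 sketch. The raw bytes are reconstructed from the known text
# (an assumption for brevity); they equal the 48-byte hex list used above.
raw = 'res/drawable/about.png\0\0'.encode('utf-16-be')


def to_greppable(data):
    """Hypothetical helper: decode UTF-16-BE bytes, drop trailing NULs."""
    return data.decode('utf-16-be').rstrip('\0')


# Each ASCII letter is stored as 0x00 followed by the letter, so the
# contiguous byte sequence b'about' does not occur in the raw data.
print(b'about' in raw)    # False

clean = to_greppable(raw)
print(clean)              # res/drawable/about.png
print('about' in clean)   # True
```

Printing the decoded and stripped string should leave no NUL bytes in the output, so grep no longer needs -a and no longer reports a binary match.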

score 1 · Answer

Use the ripgrep utility instead of grep; it can search UTF-16 files.

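ripgrep supports searching files in text encodings other than UTF-8, such as UTF-16, latin-1, GBK, EUC-JP, Shift_JIS and more. (Some support for automatically detecting UTF-16 is provided. Other text encodings must be specified explicitly with the -E/--encoding flag.)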

Example syntax:

rg sometext file
