I want to extract data from the binary content embedded in a Mathcad XML file, but it seems I'm making a mistake somewhere. Can anyone help?
import gzip
import base64
bin_str_gzip = '''
H4sIAAAAAAAA/+xWz2sTURCe3WQ3Sc2arJqo9RJEbx500YNQNJhWEC0tqRSkINjs0qSk2TYN
Wm8pelSs+AdoEcSLB+3Fo14EQWh78uDF/gcV8WjWb97bTbY5aP3RgzSzfPvezryZefPe23mT
JCIFGAP6RF/FO9qoO84RYrKA+ExpZHLaKTUEh4rAAJTyEevxW6Jm1Lr+hWhRs4xZIdf975il
pohsnSii2mTDjhyvJyyhHmeX7C5RGB8bmrvUcGYUYSDJRsIMA9gjhsaduaIzVXFrmhCcZm3b
LUleVPDOtsPQ58uzF9wFOesrQAyMYjuOYWAUuAvk4WgdbRlYkmJ4JEqLCajJIDZDxNZM7pWx
Ge3YUjI2Iy1jS3FspkVomymhqfhWaZ+YWxIxF9yZSZejlII0zxDsDoc1B7AtPOGHwAfgPOB5
PBpDR6pOoVpxag2ho0gvSscfdNdWzc/LL/s3qIvOUYRaXoL0EK+jLaej+t8tjz1K8nr0X9F3
oBVCj3YXFcnF06AcDVENbZ1ud6eCn1KWtPY/z7lgkc4oR81P5qMmGePvtP7w2JOxlac3Ln5U
Iui/8RNJAd7ZryPe87/lmwnpUgnHs129F8qvx2yX/sT/v6S/8c/LwHmc94QvSL44Oef3csHu
IA3VUzZDdMe/6PsIdZdAcDpEq/rHZALISF7W79pUEacnRbJGjPg6ldAp45LtNfDV23pCl0J/
4TCqzBxQRi6o4snRIN5TaC1xxjc3NmlB1nBkTRy/Gbaz8mrwwbcdtD3Nnav33p8qrZnP7tOT
Y6vaelANxf2lCSjgRwNG83K+vdrBBFxfNNoR3QKzDpFNJ3TfBG7kLdgp2i8rUbHrz5evmXB+
oLs6z/jVeVZWsBne7INSL9PROxSSHg5J+fsHAAAA//8DAHmhfMNNDAAA
'''
encodings = ['ascii', 'utf-8', 'latin_1']
for encoding in encodings:
    print(f'\n{encoding}{"-"*50}')
    # 'ignore' silently drops every byte the codec cannot decode
    org_str = gzip.decompress(base64.b64decode(bin_str_gzip)).decode(encoding, 'ignore')
    print(org_str)
What I got:
# binary content with "gzip" tag...
ascii--------------------------------------------------
S
p2^2ddd2 2<
2<@
2CVSComboItemCVSItem<
t?CVSOleClientItem
[di
iMS Shell Dlg 2x2[%vDMS Shell Dlg 2x2[%vDjT1c$K@helloP@world,2Y2ddd22<@
2Y2<@
2
utf-8--------------------------------------------------
S
p2^2ddd2 2<
2<@
2CVSComboItemCVSItem<
t?CVSOleClientItem
[di
iMS Shell Dlg 2x2[%vDMS Shell Dlg 2x2[%vDjT1c$K@helloP@world,2Y2ddd22<@
2Y2<@
2
latin_1--------------------------------------------------
S
p2^ñ2ddd2Á 2<
2<@
2CVSComboItemCVSItem<
tÌ?ÿÿCVSOleClientItem
[di
i¸óÿÿÿMS Shell Dlg 2ÿÿðáðx2[%v³²DõÿÿÿMS Shell Dlg 2ÿÿðáðx2[%v³²DjTÉ1cЦ $ÏÑK@ÿÿÿÿÿÿÿÿhelloP@ÿÿÿÿÿÿÿÿworld,ÿþÿÿþÿÿþÿÿþÿ2¨¡Y2ddd2Á2<@
2¨¡Y2<@
2
Some guidance I found:
.xmcd is a UTF-8 encoded XML format. Binary payloads such as image data, component data and OLE data are compressed and embedded in the XML as base64-encoded ASCII.
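Going by that guide, the payload has only two layers: base64 text wrapping gzip data. A minimal sketch, assuming nothing beyond those two layers, is to stop after `gzip.decompress` and inspect the bytes directly instead of calling `.decode()`. With `'ignore'`, ascii and utf-8 silently drop every byte they cannot decode, and latin_1 turns many bytes into invisible control characters. The helper names `decode_payload`, `printable_runs` and `hexdump` are hypothetical, written only for this example.

```python
import base64
import gzip
import re

def decode_payload(b64_text):
    """base64 text -> gzip -> raw bytes. Deliberately returns bytes, not str."""
    # b64decode() (validate=False by default) skips the newlines in the XML text.
    return gzip.decompress(base64.b64decode(b64_text))

def printable_runs(data, min_len=4):
    """(offset, text) for each run of >= min_len printable ASCII bytes,
    similar to the Unix `strings` tool."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [(m.start(), m.group().decode("ascii")) for m in pattern.finditer(data)]

def hexdump(data, width=16):
    """Offset, hex bytes and ASCII column; non-printable bytes shown as '.'."""
    lines = []
    for off in range(0, len(data), width):
        chunk = data[off:off + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        text = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in chunk)
        lines.append(f"{off:08x}  {hex_part:<{width * 3}} {text}")
    return "\n".join(lines)

if __name__ == "__main__":
    # Synthetic stand-in for a real payload; with the question's data,
    # call decode_payload(bin_str_gzip) instead.
    raw = b"\x00\x01hello\xff\xffworld"
    b64_text = base64.b64encode(gzip.compress(raw)).decode("ascii")
    payload = decode_payload(b64_text)
    print(printable_runs(payload))  # [(2, 'hello'), (9, 'world')]
    print(hexdump(payload))
```

Unlike the text decodings, the hex dump keeps every byte, including NUL and other control bytes, so numeric fields stored in binary stay visible as hex instead of vanishing.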
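The Mathcad worksheet is below. I can see words like "hello" and "world" in the output, but not "55" or "66", so I think I've made a mistake somewhere. Can anyone point me in the right direction?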
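One plausible reason "55" and "66" never appear: numbers in a binary payload are usually stored as binary floating point, not as decimal text. That is an assumption about this format, but the output above is consistent with it. As little-endian IEEE-754 doubles, 55.0 packs to `00 00 00 00 00 80 4B 40` and 66.0 to `00 00 00 00 00 80 50 40`. Their last two bytes are exactly the `K@` printed just before `hello` and the `P@` printed just before `world`. A sketch for searching the decompressed bytes; `find_value` and `find_doubles` are hypothetical helpers, and the magnitude filter in `find_doubles` is an arbitrary heuristic:

```python
import math
import struct

def find_value(data, target):
    """Offsets where `target` occurs verbatim as a little-endian 8-byte double."""
    needle = struct.pack("<d", target)
    offsets = []
    pos = data.find(needle)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(needle, pos + 1)
    return offsets

def find_doubles(data, lo=1e-6, hi=1e9):
    """Every offset whose 8 bytes decode to a finite double with lo <= |x| <= hi.

    Heuristic for when the values are unknown: expect false positives and
    check them against what the worksheet shows.
    """
    hits = []
    for off in range(len(data) - 7):
        (value,) = struct.unpack_from("<d", data, off)
        if math.isfinite(value) and lo <= abs(value) <= hi:
            hits.append((off, value))
    return hits

if __name__ == "__main__":
    # Synthetic layout imitating the question's output: padding, a double,
    # then a string. Real data would come from gzip.decompress(...).
    blob = (b"\xff" * 3 + struct.pack("<d", 55.0) + b"hello"
            + struct.pack("<d", 66.0) + b"world")
    print(struct.pack("<d", 55.0))  # b'\x00\x00\x00\x00\x00\x80K@'
    print(find_value(blob, 55.0), find_value(blob, 66.0))  # [3] [16]
```

On the real data, pass the bytes returned by `gzip.decompress(base64.b64decode(bin_str_gzip))` from the question's code directly, without the `.decode(...)` call. The exact record layout around each value is undocumented here and would still have to be worked out by inspection.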