
I have this strange XML file, which apparently contains JPEG image data:

<?xml version="1.0" encoding="UTF-8"?>
<AttachmentDocument xmlns="http://echa.europa.eu/schemas/iuclid5/20070330" documentReferencePK="ECB5-d18039fe-6fb0-44d6-be9e-d6ade38be543/0" encoding="0" fileSize="5788" fileTimestamp="2007-04-17T12:38:44Z" parentDocumentPK="ECB5-fb07efbf-ee93-4cdd-865b-49efa51cbd15/0" version="2007-03-19T14:13:29Z">
    <modificationHistory>
        <modification date="2007-05-10T09:00:00Z">
            <comment>Created</comment>
            <modificationBy>European Commision/Joint Research Centre/European Chemicals Bureau</modificationBy>
        </modification>
    </modificationHistory>
    <ownershipProtection copyProtection="false" fractionalDocument="false" sealed="false"/>
    <fileName>33952-38-4-V2.jpeg</fileName>
    <fileMimetype>image/jpeg</fileMimetype>
    <rawContent>
        H4sIAAAAAAAAAO2XZ1AU65qAe5iBIQwgOCMZRkHJCIgEySBhyEEyIyDgMBKHLEFQBJEoIHBEQFQE
        JUjOSo4iOQ+Ss2QkSZhZvLXn7j11726d3draH1vn7Xp+dH1fd/XzvV+//TZxlDgNnNNQRakCIBAA
        gM4OgEgA5JQNVBRv6RrcQGLsBO+52WOQ3iJCwkgeLw+sCwaJ0lBDauipqCG9xUV5BZB29ndtvJw8
        kTgvGyes531K4jigDJCTkUHJSMmhUCgFBTklDE4No6KCMdGfp4WzMXOwszGzsiK5hLiRlwQ4WVl5
                ...
    </rawContent>
    <MD5>0d80850b0c4085500f80e1430b90c70910d4110cc0d7</MD5>
</AttachmentDocument>

(full version here), and I can't read the image from it.

My attempt:

from PIL import Image
import StringIO
import base64

# I've eliminated all newlines and tabs to produce the data string
data="H4sIAAAAAAAAAO2XZ1AU65qAe5..."
im = Image.open(StringIO.StringIO(base64.b64decode(data)))

But I get an error:

File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/dist-packages/PIL/Image.py", line 1980, in open
    raise IOError("cannot identify image file")

1 Answer


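If you check what the base64-decoded output contains, you'll notice it is a gzip file: it starts with the gzip magic bytes 0x1f 0x8b, which is why the base64 text begins with "H4sI". Decompress it and you get the JPEG you want.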
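A minimal Python 3 sketch of those steps, assuming the file is shaped like the excerpt in the question: parse the XML with the standard library's `xml.etree.ElementTree`, take the text of `<rawContent>` (which lives in the document's default namespace), base64-decode it, and gunzip the result when it starts with the gzip magic bytes. The names `extract_attachment`, `GZIP_MAGIC` and `JPEG_MAGIC` are illustrative, and the self-test at the bottom uses a made-up payload, not the real attachment.

```python
import base64
import gzip
import xml.etree.ElementTree as ET

# Namespace copied from the xmlns attribute of <AttachmentDocument>.
NS = {"iuclid": "http://echa.europa.eu/schemas/iuclid5/20070330"}

GZIP_MAGIC = b"\x1f\x8b"    # first two bytes of every gzip stream
JPEG_MAGIC = b"\xff\xd8\xff"  # SOI marker followed by the next marker's 0xFF


def extract_attachment(xml_bytes):
    """Return the file bytes stored in <rawContent>.

    Assumes the element holds base64 text that decodes either to a gzip
    stream (as in the question's file) or directly to the file bytes.
    """
    root = ET.fromstring(xml_bytes)
    raw = root.find("iuclid:rawContent", NS).text
    # Strip the newlines and indentation the XML pretty-printer added.
    decoded = base64.b64decode("".join(raw.split()))
    # Base64 text starting with "H4sI" decodes to the gzip magic bytes.
    if decoded[:2] == GZIP_MAGIC:
        decoded = gzip.decompress(decoded)
    return decoded


if __name__ == "__main__":
    # Self-test with a fake, gzip-compressed "JPEG" payload.
    fake_jpeg = JPEG_MAGIC + b"\xe0 not a real image"
    payload = base64.b64encode(gzip.compress(fake_jpeg)).decode("ascii")
    sample = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<AttachmentDocument xmlns="http://echa.europa.eu/schemas/iuclid5/20070330">'
        "<rawContent>\n        " + payload + "\n    </rawContent>"
        "</AttachmentDocument>"
    ).encode("utf-8")
    print(extract_attachment(sample)[:3] == JPEG_MAGIC)  # True
```

With the real file, passing its bytes (e.g. from `open("attachment.xml", "rb").read()`, a hypothetical filename) should give data starting with `\xff\xd8\xff` that Pillow can open via `Image.open(io.BytesIO(...))`. `gzip.decompress` only exists in Python 3; on Python 2.7, as in the question, `zlib.decompress(data, 16 + zlib.MAX_WBITS)` or `gzip.GzipFile(fileobj=StringIO.StringIO(data)).read()` does the same job.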

The comment stored in the image:

CREATOR: gd-jpeg v1.0 (using IJG JPEG v62), default quality
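A comment like this is stored in a JPEG COM segment (marker 0xFFFE). As a minimal sketch, assuming a well-formed JPEG and not handling every variant, the hypothetical helper `jpeg_comments` below walks the marker segments of the decompressed bytes up to the start of the scan and collects the payload of each COM segment.

```python
import struct


def jpeg_comments(data):
    """Return the text of every COM (0xFFFE) segment before the image scan."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    comments = []
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte before a marker; skip it
            pos += 1
            continue
        if marker in (0xD9, 0xDA):  # EOI or SOS: no more header segments
            break
        if marker == 0x01 or 0xD0 <= marker <= 0xD8:  # markers without a length field
            pos += 2
            continue
        # Big-endian segment length; it counts its own 2 bytes plus the payload.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if marker == 0xFE:
            comments.append(data[pos + 4:pos + 2 + length].decode("latin-1"))
        pos += 2 + length
    return comments
```

Calling `jpeg_comments` on the gunzipped attachment bytes should list the `CREATOR: gd-jpeg ...` text; Latin-1 is used for decoding because COM payloads are arbitrary bytes and Latin-1 never fails on them.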
answered 2012-12-19T10:12:38.973