
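After upgrading Ant from 1.6 to 1.8.3, the version-info resource of the Windows .dll files built with Ant is corrupted.

Previously, this value was saved correctly to the version-info resource:

product.copyright=\u00a9 Copyright 20xx-20xx yyyyyyyyyy \u2122

(so the (c) and TM symbols were displayed correctly).

After upgrading Ant, the default encoding changed to UTF-8 as expected, but the copyright string now looks like this: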

Â© Copyright 20xx-20xx yyyyyy â„¢

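This is not a console problem: I checked with a hex editor and with the file properties dialog, and both show the wrong string.

Looking at a hexdump of the file, I see the following (obviously incorrect) mapping: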

\u00a9 -> 0x00c2 0x00a9
\u2122 -> 0x00e2 0x201e 0x00a2

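The problem here is that Ant takes the UTF-8 bytes (not the Unicode string), encodes each byte as a 16-bit character, and writes the result into the version info.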
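The mapping in the hexdump can be reproduced outside Ant. The Python sketch below is only an illustration and not Ant's actual code path: it assumes the UTF-8 bytes were decoded with Windows-1252 (suggested by 0x201e, which is where that code page maps byte 0x84) and then stored as 16-bit units. The helper name `misdecode` is hypothetical.

```python
def misdecode(text, wrong_encoding="cp1252"):
    """Encode text as UTF-8, then decode those bytes with the wrong 8-bit code page.

    Returns the code points of the resulting (garbled) string.
    The cp1252 default is an assumption inferred from the hexdump.
    """
    garbled = text.encode("utf-8").decode(wrong_encoding)
    return [ord(ch) for ch in garbled]


for ch in ("\u00a9", "\u2122"):
    units = " ".join(f"0x{cp:04x}" for cp in misdecode(ch))
    print(f"\\u{ord(ch):04x} -> {units}")
```

Under that assumption the two printed lines match the mapping listed above, which supports the reading that the bytes are being widened one by one rather than decoded as UTF-8.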

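Although this looks like a bug in Ant, I would like to ask whether anyone has found a workaround for this or a similar problem.

Here are some snippets from the scripts. The project properties file: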

...
product.copyright=(c) Copyright 2005-2012 Clarabridge
....

The file included from build.xml:

<versioninfo id="current-version" if="is-windows"
    fileversion="${product.version}"
    productversion="${product.version}"
    compatibilityversion="1"
    legalcopyright="${product.copyright}"
    companyname="${product.company}"
    filedescription="${ant.project.name}"
    productname="${ant.project.name}"
/>
...
<cc objdir="${target.dir}/${target.platform}/obj"
    outfile="${target.dir}/${target.platform}/${ant.project.name}"
    subsystem="other"
    failonerror="true"
    incremental="false"
    outtype="shared"
    runtime="dynamic"
>
    <versioninfo refid="current-version" />
    <compiler refid="compiler-shared-${target.platform}" />
    <compiler refid="rc-compiler" />
    <linker extends="linker-${target.platform}">
        <libset dir="${target.dir}/${target.platform}/lib" libs="${lib.list}" />
    </linker>

    <fileset dir="${src.dir}" casesensitive="false">
        <include name="*.cpp"/>
    </fileset>
</cc>

1 Answer


Your bug is that something is misinterpreting the UTF-8 code units as 8-bit characters!

BTW, Java doesn’t use 16-bit characters; that would be UCS-2. Java uses UTF-16, which is just as much a variable-width encoding as UTF-8 is. Distressing how many Java programmers screw this up!

UTF-8 has 8-bit code units where UTF-16 has 16-bit code units; neither one supports an “8-bit character” or a “16-bit character”. If you catch yourself writing code that thinks they do, you’ve just written buggy code.
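To make the code-unit point concrete, here is a small Python sketch that counts how many code units each encoding needs for a single character. The helper name `code_units` is hypothetical, and U+1F600 is an arbitrary example of a character outside the Basic Multilingual Plane that does not appear in the question.

```python
def code_units(text):
    """Return (UTF-8 code units, UTF-16 code units) needed to encode text."""
    utf8_units = len(text.encode("utf-8"))            # 8-bit units
    utf16_units = len(text.encode("utf-16-be")) // 2  # 16-bit units, no BOM
    return utf8_units, utf16_units


samples = (("U+00A9", "\u00a9"), ("U+2122", "\u2122"), ("U+1F600", "\U0001F600"))
for label, ch in samples:
    print(label, code_units(ch))
```

The two symbols from the question take 2 and 3 UTF-8 code units but only one UTF-16 code unit each, while U+1F600 takes two UTF-16 code units (a surrogate pair), so neither encoding is one unit per character.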

Your output is the result of erroneously displaying UTF-8 as though it were in Latin1 (or, judging by the 0x201e in your hexdump, its Windows variant, Windows-1252), which does use 8-bit characters. You, however, do not.
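One way to check this diagnosis is to reverse the mistake: take the garbled text, turn it back into bytes with the 8-bit code page, and decode those bytes as UTF-8. The Python sketch below does that for the garbled code points from the question's hexdump. It is a diagnostic only, not a fix for the Ant build, and it assumes the wrong code page was Windows-1252; the helper name `repair` is hypothetical.

```python
def repair(garbled, wrong_encoding="cp1252"):
    """Undo a 'UTF-8 read as an 8-bit code page' mistake.

    Re-encodes the garbled text with the (assumed) wrong code page to get the
    original bytes back, then decodes those bytes as UTF-8.
    """
    return garbled.encode(wrong_encoding).decode("utf-8")


# Code points taken from the hexdump: 00c2 00a9 ... 00e2 201e 00a2
garbled = "\u00c2\u00a9 Copyright 20xx-20xx yyyyyy \u00e2\u201e\u00a2"
print(repair(garbled))
```

If the round trip yields the intended copyright and trademark symbols, the data was valid UTF-8 all along and only the interpretation step was wrong.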

answered 2012-05-10T12:37:59.180