unicode - Unable to make Ant write correct version info with the Unicode (c) character

Question

After upgrading Ant from version 1.6 to 1.8.3, the version-info resource of a Windows .dll built with Ant is corrupted.

Previously, this value was saved correctly to the version-info resource:

product.copyright=\u00a9 Copyright 20xx-20xx yyyyyyyyyy \u2122

So the (c) and TM symbols were displayed correctly.
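For reference, `java.util.Properties` turns `\u00a9`-style escapes into real characters while loading, and as far as I know Ant's `<property file="...">` reads property files through that class. If so, the value handed to `<versioninfo>` should already be correct Unicode, which points at a later step. A minimal sketch that checks the escape handling; `PropertiesEscapeDemo` and `loadValue` are made-up names:

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Sketch: shows that java.util.Properties decodes backslash-u escapes during load.
public class PropertiesEscapeDemo {

    // Loads properties-file text and returns the value stored under `key`.
    static String loadValue(String propertiesText, String key) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            // StringReader does not really fail, but load() declares IOException.
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        // The doubled backslashes make the Java string contain a literal
        // backslash-u escape, exactly as it appears in the .properties file.
        String line = "product.copyright=\\u00a9 Copyright 20xx-20xx yyyyyyyyyy \\u2122";
        String value = loadValue(line, "product.copyright");
        // Print each code point: the escapes should each have become one character.
        value.codePoints().forEach(cp -> System.out.printf("U+%04X ", cp));
        System.out.println();
    }
}
```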

After the upgrade, the default encoding changed to UTF-8, as expected, but the copyright string now looks like this:

Â© Copyright 20xx-20xx yyyyyy â„¢

This is not a console problem: I checked with a hex editor and with the file Properties dialog, and both show the incorrect string.

Looking at the file's hexdump, I see the following (clearly incorrect) mapping:

\u00a9 -> 0x00c2 0x00a9
\u2122 -> 0x00e2 0x201e 0x00a2
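This mapping is exactly what you get if the UTF-8 bytes of each character are decoded with Windows-1252 (the ANSI code page on Western-European Windows) and every resulting character is then stored as one 16-bit unit. A minimal Java sketch that reproduces it, assuming that is where the corruption happens; the class and method names are hypothetical:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Sketch: reproduces the hexdump by misreading UTF-8 bytes as Windows-1252.
public class MojibakeDemo {

    // Encodes text as UTF-8, then (wrongly) decodes those bytes as Windows-1252.
    static String utf8ReadAsCp1252(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, Charset.forName("windows-1252"));
    }

    // Formats each 16-bit char the way the hexdump in the question does.
    static String hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("0x%04x", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("U+00A9 -> " + hex(utf8ReadAsCp1252("\u00a9")));
        System.out.println("U+2122 -> " + hex(utf8ReadAsCp1252("\u2122")));
    }
}
```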

The problem here is that Ant encodes the UTF-8 bytes (rather than the Unicode string) as 16-bit characters and writes those into the version info.

Although this looks like a bug in Ant, I'd like to ask whether anyone has found a workaround for this or a similar problem.

Here are some snippets from the script. The project properties file:

...
product.copyright=(c) Copyright 2005-2012 Clarabridge
....

The file included in build.xml:

<versioninfo id="current-version" if="is-windows"
    fileversion="${product.version}"
    productversion="${product.version}"
    compatibilityversion="1"
    legalcopyright="${product.copyright}"
    companyname="${product.company}"
    filedescription="${ant.project.name}"
    productname="${ant.project.name}"
/>
...
<cc objdir="${target.dir}/${target.platform}/obj"
    outfile="${target.dir}/${target.platform}/${ant.project.name}"
    subsystem="other"
    failonerror="true"
    incremental="false"
    outtype="shared"
    runtime="dynamic"
>
    <versioninfo refid="current-version" />
    <compiler refid="compiler-shared-${target.platform}" />
    <compiler refid="rc-compiler" />
    <linker extends="linker-${target.platform}">
        <libset dir="${target.dir}/${target.platform}/lib" libs="${lib.list}" />
    </linker>

    <fileset dir="${src.dir}" casesensitive="false">
        <include name="*.cpp"/>
    </fileset>
</cc>

score 2 · Accepted Answer

Your bug is that something is misinterpreting the UTF-8 code units (bytes) as 8-bit characters!!!

BTW, Java doesn’t use 16-bit characters; that would be UCS-2. Java uses UTF-16, which is just as much a variable-width encoding as UTF-8 is. Distressing how many Java programmers screw this up!
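A quick Java sketch of the point: `String.length()` counts UTF-16 code units, not characters, so a character outside the Basic Multilingual Plane occupies two `char`s (a surrogate pair). The class and method names are hypothetical:

```java
// Sketch: UTF-16 in Java is variable-width; one code point may need two chars.
public class Utf16WidthDemo {

    // Number of UTF-16 code units, which is what String.length() counts.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points (characters in the Unicode sense).
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String tm = "\u2122";                                  // TRADE MARK SIGN, inside the BMP
        String emoji = new String(Character.toChars(0x1F600)); // outside the BMP
        System.out.println(codeUnits(tm) + " unit(s), " + codePoints(tm) + " code point(s)");
        System.out.println(codeUnits(emoji) + " unit(s), " + codePoints(emoji) + " code point(s)");
        System.out.println("surrogate pair? " + Character.isHighSurrogate(emoji.charAt(0)));
    }
}
```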

UTF-8 has 8-bit code units where UTF-16 has 16-bit code units; neither one supports an “8-bit character” or a “16-bit character”. If you catch yourself writing code that thinks they do, you’ve just written buggy code.
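Counting code units per encoding for the characters from the question makes the same point. This sketch uses only the standard `StandardCharsets` constants; `UTF_16LE` is chosen so that no byte-order mark inflates the count, and the class and method names are illustrative:

```java
import java.nio.charset.StandardCharsets;

// Sketch: how many code units each encoding needs for a few characters.
public class CodeUnitDemo {

    // UTF-8 code units are bytes, so the count is the encoded byte length.
    static int utf8Units(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // UTF-16 code units are 16 bits; UTF_16LE writes no BOM, so bytes / 2.
    static int utf16Units(String s) {
        return s.getBytes(StandardCharsets.UTF_16LE).length / 2;
    }

    public static void main(String[] args) {
        String[] samples = {"A", "\u00a9", "\u2122", new String(Character.toChars(0x1F600))};
        for (String s : samples) {
            System.out.printf("U+%04X: %d UTF-8 unit(s), %d UTF-16 unit(s)%n",
                    s.codePointAt(0), utf8Units(s), utf16Units(s));
        }
    }
}
```

Neither column is constant across characters, which is the sense in which both encodings are variable-width.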

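Your output is the result of erroneously decoding UTF-8 as though it were Latin-1 (strictly, Windows-1252: the 0x201e in your hexdump is how Windows-1252, not ISO-8859-1, maps byte 0x84), which does use 8-bit characters. You, however, do not.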
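Two details can be checked in Java. First, the U+201E in the question's hexdump is what Windows-1252 produces for byte 0x84, whereas strict ISO-8859-1 gives the C1 control U+0084. Second, as long as every byte had a mapping, the damage can be undone by re-encoding with the wrong charset and decoding as UTF-8. This is a diagnostic sketch, not a fix for the Ant/cpptasks pipeline; the names are illustrative:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Sketch: compares Latin-1 and Windows-1252 misreadings and reverses the damage.
public class Latin1VsCp1252Demo {

    static final Charset CP1252 = Charset.forName("windows-1252");

    // Misreads the UTF-8 bytes of `text` with a single-byte charset.
    static String misread(String text, Charset singleByte) {
        return new String(text.getBytes(StandardCharsets.UTF_8), singleByte);
    }

    // Undoes misread(): re-encode with the wrong charset, then decode as UTF-8.
    // Only works if every original byte had a mapping; Windows-1252 leaves
    // 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined.
    static String repair(String garbled, Charset singleByte) {
        return new String(garbled.getBytes(singleByte), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String tm = "\u2122";
        // Latin-1 maps byte 0x84 to the C1 control character U+0084.
        System.out.printf("ISO-8859-1:   U+%04X%n", (int) misread(tm, StandardCharsets.ISO_8859_1).charAt(1));
        // Windows-1252 maps it to U+201E, matching the question's hexdump.
        System.out.printf("windows-1252: U+%04X%n", (int) misread(tm, CP1252).charAt(1));
        String original = "\u00a9 Copyright \u2122";
        System.out.println("round trip ok: " + repair(misread(original, CP1252), CP1252).equals(original));
    }
}
```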
