I have a file that uses non-ASCII characters. When I save this file using a FileStream, the characters that end up in the file are not what I expect.

I write

stream
BT 38.3774 710 TD /F10 12.0000 Tf (België)Tj ET
endstream

what ends up in the file is

stream
BT 38.3774 710 TD /F10 12.0000 Tf (BelgiÃ«)Tj ET
endstream

The strings are UTF-8 encoded into bytes before FileStream.Write saves them to the file.

Can someone help me understand why this happens?

I have been able to reproduce the result in a shorter version of the code:

Using newFile As New FileStream("C:\Users\Sed\Documents\test.txt", FileMode.Create)
    Dim content As String = "België"
    Dim contentByte As Byte() = New UTF32Encoding().GetBytes(content)
    newFile.Write(contentByte, 0, contentByte.Length)
    contentByte = New UTF8Encoding().GetBytes(content)
    newFile.Write(contentByte, 0, contentByte.Length)
End Using

giving the result

B   e   l   g   i   ë   BelgiÃ«
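
To see what the short version actually writes, the bytes can be dumped as hex instead of being viewed in an editor. This is only a minimal sketch: the module name `ByteDump` is made up for illustration, and the string is built with `ChrW(&HEB)` so that the encoding of the source file itself cannot affect the result.

```vbnet
Imports System
Imports System.Text

Module ByteDump
    Sub Main()
        ' "België", with the ë given by code point (U+00EB) so that the
        ' encoding of this source file does not matter.
        Dim content As String = "Belgi" & ChrW(&HEB)

        ' GetBytes returns only the character data; it never adds a byte-order mark.
        Dim utf32Bytes As Byte() = New UTF32Encoding().GetBytes(content)
        Dim utf8Bytes As Byte() = New UTF8Encoding().GetBytes(content)

        ' UTF-32 (little-endian by default): four bytes per character.
        ' For these characters three of the four bytes are zero.
        Console.WriteLine(BitConverter.ToString(utf32Bytes))

        ' UTF-8: one byte per ASCII letter and two bytes for ë.
        ' Prints 42-65-6C-67-69-C3-AB
        Console.WriteLine(BitConverter.ToString(utf8Bytes))
    End Sub
End Module
```

In the UTF-32 part the ë is the single non-zero byte EB followed by three zero bytes, while in the UTF-8 part it is the pair C3 AB. A viewer that shows one character per byte would therefore display the first ë correctly and the second one as two characters.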

So I suspect that the FileStream somehow assumes the content is UTF-32 encoded, while the content of the file is being written in UTF-8...

Encoding everything in UTF-32 does not solve it. The file gets completely messed up then...

I still don't understand why this happens, but I have a workaround in mind that I need to explore.

1 Answer

I have figured it out...

The way I create my file, the encoding it uses is ANSI, i.e. Encoding.Default.

So changing

Dim newObjectByte As Byte() = New UTF8Encoding(True).GetBytes(DataObject("pdfObjectString").ToString())

to

Dim newObjectByte As Byte() = Encoding.Default.GetBytes(DataObject("pdfObjectString").ToString())

fixed my code page problem.
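
A minimal sketch of why the change helps, under a few assumptions: the module name `CodePageDemo` is made up, and ISO-8859-1 stands in for the ANSI code page (it agrees with Windows-1252 for the characters used here). On .NET Framework, `Encoding.Default` is the system ANSI code page. On .NET Core and later it is UTF-8, so there an explicit single-byte encoding would be needed to get the same effect.

```vbnet
Imports System
Imports System.Text

Module CodePageDemo
    Sub Main()
        ' "België", with the ë given by code point (U+00EB).
        Dim content As String = "Belgi" & ChrW(&HEB)

        ' Assumption: ISO-8859-1 as a stand-in for the ANSI code page of the reader.
        Dim singleByte As Encoding = Encoding.GetEncoding("iso-8859-1")

        ' Written as UTF-8 but read one byte per character: ë turns into two characters.
        Dim utf8Bytes As Byte() = New UTF8Encoding(False).GetBytes(content)
        Dim misread As String = singleByte.GetString(utf8Bytes)
        ' U+00C3 is Ã and U+00AB is «, so misread is "BelgiÃ«". Prints True.
        Console.WriteLine(misread = "Belgi" & ChrW(&HC3) & ChrW(&HAB))

        ' Written with the single-byte encoding: ë is the single byte EB.
        Dim ansiBytes As Byte() = singleByte.GetBytes(content)
        ' Prints 42-65-6C-67-69-EB
        Console.WriteLine(BitConverter.ToString(ansiBytes))
        ' Reading it back with the same encoding round-trips. Prints True.
        Console.WriteLine(singleByte.GetString(ansiBytes) = content)
    End Sub
End Module
```

The point is that the bytes have to be produced with the same encoding the reader of the file assumes. Note that a single-byte code page can only represent the characters it contains, so text outside that code page would be replaced (by default with "?") when encoded this way.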

Thanks to The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!), which got me thinking about code pages, ANSI, ASCII and all that stuff...

answered 2013-08-07T20:15:41.827