
I have strings (about 1-5 KB) of the form:

FF,A3V,X7Y,aA4,....

LZW compresses these really nicely, but its output includes Turkish characters. The compressed strings are then submitted to a MySQL database.

Sometimes MySQL can play up and not store these properly, putting question marks ('?') in place of the Turkish characters. It can do this even when the text fields are properly defined. Exporting and reimporting the table can sort this out. That is fine for my test database, but not something I am happy with once this goes live.

Consequently, I am looking for an alternative to LZW that still compresses but uses only normal letters, numbers, etc.

Does anyone know of a PUBLIC DOMAIN compression method that avoids Turkish characters (and any other non-standard characters)? Can anyone point me to some code in JavaScript (or C++ or C#, which I can convert)?


1 Answer


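To expand on what was said in the comments... storing byte strings (which is what the output of a compression algorithm usually consists of) in a VARCHAR, CHAR or TEXT column is not a valid usage.

These column types are not intended for byte strings; they are only for strings of valid characters. Not every byte string is a valid string in any given character set... and MySQL will not allow invalid characters (for some character sets, the correspondence between "characters" and "bytes" is not 1:1).

In the Good Old Days™ the two were interchangeable, but that is no longer the case (and has not been for some time).

Instead, if your column type is BINARY, VARBINARY or BLOB, the problem should go away, because those data types are intended for binary data.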
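To illustrate the point about byte strings versus character strings, here is a minimal JavaScript sketch. It assumes UTF-8 as the character set purely for demonstration (the actual table may use a different one), and `isValidUtf8` is a hypothetical helper name, not part of any library.

```javascript
// Hypothetical helper: returns true if the bytes decode cleanly as UTF-8.
function isValidUtf8(bytes) {
  try {
    // With fatal: true, decode() throws a TypeError on malformed input.
    new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    return true;
  } catch (err) {
    return false;
  }
}

// Ordinary text encodes to a valid byte sequence.
const plainText = new TextEncoder().encode("FF,A3V,X7Y,aA4");

// Arbitrary bytes, like compressed output, need not be valid text:
// 0xFF can never appear anywhere in a UTF-8 sequence.
const compressedLike = new Uint8Array([0x46, 0x46, 0xff, 0xc3, 0x28]);

// Characters and bytes are not 1:1: the Turkish letter "ğ" is one
// character but two bytes in UTF-8.
const turkishBytes = new TextEncoder().encode("ğ");

console.log(isValidUtf8(plainText));
console.log(isValidUtf8(compressedLike));
console.log(turkishBytes.length);
```

A character column has to reject or mangle input like `compressedLike`, which is one plausible way the '?' substitutions could arise.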
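As a rough sketch of what treating the compressed data as binary could look like on the JavaScript side: the code below assumes the LZW step yields an array of integer codes, each below 65536 (many JavaScript LZW implementations instead emit those codes as characters, which is where the Turkish letters would come from). `packCodes` and `unpackCodes` are hypothetical names, and how the resulting bytes are bound to a BLOB or VARBINARY parameter depends on the database driver in use.

```javascript
// Pack integer codes (each assumed to fit in 16 bits) into big-endian bytes.
function packCodes(codes) {
  const bytes = new Uint8Array(codes.length * 2);
  codes.forEach((code, i) => {
    if (!Number.isInteger(code) || code < 0 || code > 0xffff) {
      throw new RangeError("code out of 16-bit range: " + code);
    }
    bytes[2 * i] = code >> 8;       // high byte
    bytes[2 * i + 1] = code & 0xff; // low byte
  });
  return bytes;
}

// Reverse of packCodes: rebuild the integer codes from the byte array.
function unpackCodes(bytes) {
  if (bytes.length % 2 !== 0) {
    throw new RangeError("byte length must be even");
  }
  const codes = [];
  for (let i = 0; i < bytes.length; i += 2) {
    codes.push((bytes[i] << 8) | bytes[i + 1]);
  }
  return codes;
}

// Example codes; 351 and 287 would be the characters "ş" and "ğ"
// if they were emitted as text instead of bytes.
const codes = [70, 70, 44, 256, 351, 287];
const packed = packCodes(codes);
const restored = unpackCodes(packed);
console.log(packed.length, restored);
```

Because the column never interprets these bytes as characters, no character-set conversion should be applied to them on the way in or out.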

Answered 2015-06-02T22:10:21.193