java - Problem Inflating byte[] in Java?

Question

I ran into an issue which I can't figure out. Here is the definition of the problem: I have some data in a Blob column in Db2/Linux environment. Blob was written into DB2 after the byte[] was compressed using JDK compression (code that does this is running in Linux environment). I am trying to write a simple program to read some of this data decompress it (using JDK) and create a String from the decompressed byte array in Windows Environment (my development environment). Issue is that after I decompress the Blob (byte[]), length of the decompressed byte array is usually 1-3 bytes longer than expected. What I mean by expected is that the offset and length fields are also being stored in the database. So in this case, length of the decompressed byte array is usually longer than the stored length in database, just a few bytes. So if I create a String object from the decompressed byte array and create another String object using the substring(offset, length) method using the offset and length fields from the database, my second String(the one I got by using substring method) is shorter.

An example would be: database record contains a blob, offset: 0, length: 260,409 after decompressing the blob -

 compressedByte[].length  - 71,212
 decompressedByte[].length   - 260,412
 new String(decompressByte[]).length()  - 260,412
 new String(decompressByte[]).subString(0, 260,409).length() - 260409

For some other input records, the difference I am seeing is anywhere between 1-3 bytes in length.

I am sort of puzzled with this issue and wondering if anyone could suggest any tips so I can do more debugging to figure this issue out. I am wondering whether this could be somehow related to how bytes are being stored/written in Linux environment and how they are being read in Windows? Thanks for your help.

score 3 · Accepted Answer

我怀疑两个系统之间的默认编码不同。

// on the linux box   
byte [] blob = str.getBytes("UTF-8");

// in your code 
String str = new String(blob, "UTF-8");

或者至少找出 linux 机器上的默认编码是什么（普通 UTF-8）并跳过第 1 步。

关于这里可能发生的事情的一个很好的例子是关于软件的 Joel

score 2 · Accepted Answer

A String is not a general holder for bytes. You will undoubtedly have different default character encodings between your db2/Linux environment and your Windows environment which will be causing the conversion back and forth between bytes and characters to be different.

java - Problem Inflating byte[] in Java?

2 回答 2

Related

Reference