I wrote some code to serialize a HashMap<String,Double> by iterating over its entries and serializing each of them, instead of using ObjectOutputStream.writeObject(). The reason is purely efficiency: the resulting file is much smaller, and it is much faster to write and read (e.g. 23 MB in 0.6 seconds vs. 29 MB in 9.9 seconds).
This is what I did to serialize:
ObjectOutputStream oos = new ObjectOutputStream(new FileOutputStream("test.bin"));
oos.writeInt(map.size()); // write size of the map
for (Map.Entry<String, Double> entry : map.entrySet()) { // iterate entries
System.out.println("writing ("+ entry.getKey() +","+ entry.getValue() +")");
byte[] bytes = entry.getKey().getBytes();
oos.writeInt(bytes.length); // length of key string
oos.write(bytes); // key string bytes
oos.writeDouble(entry.getValue()); // value
}
oos.close();
As you can see, I get the byte array for each key String, serialize its length and then the array itself.
This is what I did to deserialize:
ObjectInputStream ois = new ObjectInputStream(new FileInputStream("test.bin"));
int size = ois.readInt(); // read size of the map
HashMap<String, Double> newMap = new HashMap<>(size);
for (int i = 0; i < size; i++) { // iterate entries
int length = ois.readInt(); // length of key string
byte[] bytes = new byte[length];
ois.read(bytes); // key string bytes
String key = new String(bytes);
double value = ois.readDouble(); // value
newMap.put(key, value);
System.out.println("read ("+ key +","+ value +")");
}
The problem is that at some point a key is not read back correctly. While debugging I could see that ois.read(bytes) read 8 bytes instead of the 16 it was supposed to, so the key String was not properly formed, and the double value was then read from the remaining 8 bytes of the key that had not been consumed yet. In the end, Exceptions everywhere.
Using the sample data below, the output will be like this at some point:
read (2010-00-056.html,12154.250518054876)
read (2010-00- ,1.4007397428546247E-76)
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at ti.Test.main(Test.java:82)
The problem can be seen in the serialized file (it should read 2010-00-008.html): two bytes are added in the middle of the String key. See MxyL's answer for further info about this. So it all boils down to: why are those two bytes added, and why does readFully work correctly?
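To make the read versus readFully difference concrete, here is a minimal sketch. It does not reproduce ObjectInputStream's internals: ChunkedStream and ShortReadDemo are hypothetical names, and the 8-byte chunk size is an assumption chosen only to mirror the symptom above. What it shows is the general contract: InputStream.read(byte[]) may return fewer bytes than requested, while DataInput.readFully keeps reading until the array is full (or throws EOFException).

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class ShortReadDemo {

    // Hypothetical stream that hands out at most 8 bytes per read() call,
    // standing in for any stream that delivers its data in chunks.
    static class ChunkedStream extends InputStream {
        private final InputStream in;

        ChunkedStream(InputStream in) {
            this.in = in;
        }

        @Override
        public int read() throws IOException {
            return in.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            return in.read(b, off, Math.min(len, 8));
        }
    }

    // Single read(byte[]) call: returns how many bytes were actually read.
    public static int plainRead(byte[] data, int wanted) throws IOException {
        InputStream in = new ChunkedStream(new ByteArrayInputStream(data));
        return in.read(new byte[wanted]);
    }

    // readFully loops internally until 'wanted' bytes have arrived.
    public static String readFullyString(byte[] data, int wanted) throws IOException {
        DataInputStream in = new DataInputStream(new ChunkedStream(new ByteArrayInputStream(data)));
        byte[] buf = new byte[wanted];
        in.readFully(buf);
        return new String(buf, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws IOException {
        byte[] key = "2010-00-008.html".getBytes(StandardCharsets.US_ASCII);
        System.out.println("read(byte[]) returned " + plainRead(key, key.length) + " bytes");
        System.out.println("readFully gave: " + readFullyString(key, key.length));
    }
}
```

If read(byte[]) is used directly, its return value has to be checked and the call repeated until the whole key has arrived; readFully does that loop for you.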
Why isn't the String properly (de)serialized? Might it be some kind of padding to a fixed block size or something like that? Is there a better way to manually serialize a String when looking for efficiency? I was expecting some kind of writeString and readString, but it seems there is no such thing in Java's object stream classes.
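For reference, the closest thing to writeString/readString that I know of is the writeUTF/readUTF pair from DataOutput/DataInput, which the object streams also implement. Below is a minimal sketch of the same entry-by-entry format on top of plain DataOutputStream/DataInputStream. MapCodec is a hypothetical name, the byte-array round trip stands in for the file, and I have not benchmarked it against the numbers above. Note that writeUTF uses a 2-byte length prefix and modified UTF-8, so each encoded key must fit in 65535 bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class MapCodec {

    // Layout: int entry count, then (UTF key, double value) per entry.
    public static byte[] write(Map<String, Double> map) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(map.size());
            for (Map.Entry<String, Double> entry : map.entrySet()) {
                out.writeUTF(entry.getKey());       // 2-byte length + modified UTF-8
                out.writeDouble(entry.getValue());  // 8 bytes
            }
        }
        return bytes.toByteArray();
    }

    public static Map<String, Double> read(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int size = in.readInt();
            Map<String, Double> map = new HashMap<>();
            for (int i = 0; i < size; i++) {
                String key = in.readUTF();      // reads exactly the bytes writeUTF wrote
                double value = in.readDouble();
                map.put(key, value);
            }
            return map;
        }
    }

    public static void main(String[] args) throws IOException {
        Map<String, Double> map = new HashMap<>();
        map.put("2010-00-056.html", 12154.250518054876);
        map.put("2010-00-008.html", 10811.15198506469);
        Map<String, Double> copy = read(write(map));
        System.out.println("round trip equal: " + map.equals(copy));
    }
}
```

For a real file, the ByteArray streams would be swapped for a FileOutputStream/FileInputStream, ideally wrapped in BufferedOutputStream/BufferedInputStream.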
I've tried using buffered streams in case something was wrong there, explicitly specifying how many bytes to write and read, and using different encodings, but no luck.
This is some sample data to reproduce the problem:
HashMap<String, Double> map = new HashMap<String, Double>();
map.put("2010-00-027.html",21732.994621513037); map.put("2010-00-020.html",3466.5169348296736); map.put("2010-00-051.html",12528.648992702407); map.put("2010-00-062.html",3354.8950010256385);
map.put("2010-00-024.html",10295.095511718278); map.put("2010-00-052.html",5381.513344679818); map.put("2010-00-007.html",16466.33813960735); map.put("2010-00-017.html",9484.969198176652);
map.put("2010-00-054.html",15423.873112634772); map.put("2010-00-022.html",8123.842752870753); map.put("2010-00-033.html",21238.496665104063); map.put("2010-00-028.html",7578.792651786424);
map.put("2010-00-048.html",3566.4118233046393); map.put("2010-00-040.html",2681.0799941861724); map.put("2010-00-049.html",14308.090890746222); map.put("2010-00-058.html",5911.342406606804);
map.put("2010-00-045.html",2284.118716145881); map.put("2010-00-031.html",2859.565771680721); map.put("2010-00-046.html",4555.187022907964); map.put("2010-00-036.html",8479.709295569426);
map.put("2010-00-061.html",846.8292195815125); map.put("2010-00-023.html",14108.644025417952); map.put("2010-00-041.html",22686.232732684934); map.put("2010-00-025.html",9513.539663409734);
map.put("2010-00-012.html",459.6427911376829); map.put("2010-00-005.html",0.0); map.put("2010-00-013.html",2646.403220496738); map.put("2010-00-065.html",5808.86423609936);
map.put("2010-00-056.html",12154.250518054876); map.put("2010-00-008.html",10811.15198506469); map.put("2010-00-042.html",9271.006516004005); map.put("2010-00-000.html",4387.4162586468965);
map.put("2010-00-059.html",4456.211623469774); map.put("2010-00-055.html",3534.7511584735325); map.put("2010-00-057.html",8745.640098512009); map.put("2010-00-032.html",4993.295735075575);
map.put("2010-00-021.html",3852.5805998017922); map.put("2010-00-043.html",4108.020033536286); map.put("2010-00-053.html",2.2446400279239946); map.put("2010-00-030.html",17853.541210836203);