To answer your question about strings: the length value stored in the archive is itself variable-length, and its size depends on the length and encoding of the string. If the string is < 255 characters, one byte is used for the length. If the string is 255 - 65534 characters, 3 bytes are used: a 1-byte 0xFF marker followed by a 2-byte word. If the string is 65535+ characters, 7 bytes are used: a 3-byte 0xFF 0xFF 0xFF marker followed by a 4-byte dword. To make it even more complicated, if the string is Unicode encoded, the length value is preceded by a 3-byte marker: a 0xFF byte followed by the word 0xFFFE. So in any combination, you will never see a 4-byte length by itself. What you showed therefore has to be three 0x00 bytes belonging to something else, followed by a 1-byte string length of 0x29.
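To make the layout concrete, here is a minimal Python sketch of how a writer would produce the length prefix under the rules above. The function name `encode_length` is made up for illustration, and little-endian byte order for the word and dword is an assumption on my part, not something stated above.

```python
import struct

def encode_length(length, is_unicode=False):
    """Build the variable-length prefix described above (hypothetical helper).

    Assumption: multi-byte values are stored little-endian.
    """
    # Unicode strings get a 0xFF byte plus the word 0xFFFE in front of the length.
    prefix = b"\xff" + struct.pack("<H", 0xFFFE) if is_unicode else b""
    if length < 0xFF:
        # Short string: the length fits in a single byte.
        return prefix + bytes([length])
    if length < 0xFFFF:
        # Medium string: 0xFF marker, then a 2-byte word.
        return prefix + b"\xff" + struct.pack("<H", length)
    # Long string: 0xFF 0xFF 0xFF marker, then a 4-byte dword.
    return prefix + b"\xff\xff\xff" + struct.pack("<I", length)

print(encode_length(0x29).hex())       # 29
print(encode_length(300).hex())        # ff2c01
print(len(encode_length(70000)))       # 7
```

Note that none of the branches ever emits a bare 4-byte length, which is why 00 00 00 29 cannot be a single length field.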
So, the correct way to read a string is as follows:
1. Assume the string data is Ansi unless told otherwise.
2. Read a byte. If its value is < 255, the string length is that value; go to step 4.
3. Read a word. If its value is 0xFFFE, the string data is Unicode; go back to step 2. Otherwise, if its value is < 65535, the string length is that value; go to step 4. Otherwise, read a dword; the string length is its value; go to step 4.
4. Read string-length 8-bit or 16-bit values, depending on whether the string is Ansi or Unicode, and then convert to the desired encoding as needed.
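The reading procedure above can be sketched in Python roughly as follows. This is only an illustration, not the archive library's actual code: the names `read_exact` and `read_string` are made up, little-endian byte order is assumed, and cp1252 stands in for whatever Ansi code page the writer really used.

```python
import io
import struct

def read_exact(stream, count):
    """Read exactly `count` bytes or fail (hypothetical helper)."""
    data = stream.read(count)
    if len(data) != count:
        raise EOFError("unexpected end of archive data")
    return data

def read_string(stream, ansi_codec="cp1252"):
    """Read one length-prefixed string. Assumes little-endian values;
    `ansi_codec` is a guess at the writer's Ansi code page."""
    is_unicode = False  # Ansi unless a Unicode marker says otherwise
    while True:
        length = read_exact(stream, 1)[0]
        if length < 0xFF:
            break  # 1-byte length
        (length,) = struct.unpack("<H", read_exact(stream, 2))
        if length == 0xFFFE:
            if is_unicode:
                raise ValueError("repeated Unicode marker")
            is_unicode = True
            continue  # the real length follows; start over at the byte
        if length < 0xFFFF:
            break  # 2-byte length
        (length,) = struct.unpack("<I", read_exact(stream, 4))
        break  # 4-byte length
    if is_unicode:
        # Length counts 16-bit units, so twice as many bytes are read.
        return read_exact(stream, length * 2).decode("utf-16-le")
    return read_exact(stream, length).decode(ansi_codec)

print(read_string(io.BytesIO(b"\x05hello")))                    # hello
print(read_string(io.BytesIO(b"\xff\xfe\xff\x02h\x00i\x00")))   # hi
```

The loop mirrors the "go back and read a byte again" step after the Unicode marker. The check for a repeated marker is my own addition, there to keep malformed input from looping.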