I'm currently writing a Photoshop PSD file parser to extract some specific metadata. Skipping over potentially large chunks of irrelevant data calls for random access. I also need to support PSB as well as PSD - PSB has the same basic file structure, but with bigger limits and with some size fields widened to 64 bits.
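To make the 32-bit versus 64-bit size fields concrete, here is a minimal sketch of reading a length field of either width. The helper name read_be_length and its width parameter are made up for illustration, and the sketch assumes the fields are stored big-endian; which fields are actually widened in PSB has to come from the spec.

```cpp
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: read a big-endian unsigned length field that is
// `width` bytes wide (4 for a 32-bit field, 8 for a widened 64-bit field).
// Returns false if the stream runs out of data before `width` bytes are read.
bool read_be_length(std::istream& in, int width, std::uint64_t& out)
{
    std::uint64_t value = 0;
    for (int i = 0; i < width; ++i) {
        const int byte = in.get();   // sets failbit/eofbit at end of data
        if (!in) return false;
        value = (value << 8) | static_cast<std::uint64_t>(byte & 0xFF);
    }
    out = value;
    return true;
}
```

Holding the result in a std::uint64_t regardless of the field width means the rest of the parser only has to deal with one length type.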
With the Photoshop file format specs publicly available, this should be a trivial job, but I've never needed to worry about large files before, at least while using ifstream. I've always got away with ignoring streampos etc. and just using size_t for file positions and offsets. I'm not getting away with that now.
I wasn't expecting that to be a big deal, and it isn't really, but it's turning out to be awkward and messy - I assume that means I'm fighting the library rather than doing what it expects.
First issue - there are three position/offset/size types...
streampos
streamoff
streamsize
This seems unnecessary to me. A position in a file is an offset from the start. The size of a file is also the offset from the start to the end. That's why we specify positions, offsets and sizes in vector with the same type - size_t.
In fact, the easiest way I've found to determine the size of a file gives me a streampos - seek to the end of the file and query the position...
myfile.seekg (0, ios::end);
streampos myfilesize = myfile.tellg ();
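For reference, a self-contained version of that size query might look like the sketch below. The function name stream_size and the choice of std::int64_t as the return type are assumptions for illustration, not anything the library prescribes.

```cpp
#include <cstdint>
#include <istream>
#include <sstream>

// Hypothetical helper: returns the stream size in bytes, or -1 on failure.
// The original read position is restored before returning.
std::int64_t stream_size(std::istream& in)
{
    const std::istream::pos_type saved = in.tellg();
    if (saved == std::istream::pos_type(-1)) return -1;

    in.seekg(0, std::ios::end);
    const std::istream::pos_type end = in.tellg();
    in.seekg(saved);
    if (end == std::istream::pos_type(-1)) return -1;

    // streampos converts to streamoff, which is a signed integer type.
    return static_cast<std::int64_t>(static_cast<std::streamoff>(end));
}
```

For a file, the stream should be opened with ios::binary so that the position reported at the end corresponds to a byte count.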
The streampos type is also an instantiation of the class template fpos (which with GCC 4.7.0 seems to be 16 bytes - I haven't checked whether I'm building 32-bit or 64-bit code, but either way that seems oversized). Why not just use an integer type?
Some annoyances...
streampos p;
p++; // Error - streampos doesn't have operator++
p = p + 1; // Warning - ambiguous (using 1ul or 1ull doesn't seem to fix it)
Those warning issues happen with comparisons and elsewhere too.
Basically, I'm getting a lot of irritating noise to work around these - lots of casts to streampos or streamoff etc.
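As an illustration of the kind of cast-heavy code this leads to, here is a minimal sketch that skips forward by a byte count read from the file. The helper name skip_bytes is hypothetical, and doing the arithmetic on streamoff is just one way to get it to compile quietly, not necessarily the intended idiom.

```cpp
#include <cstdint>
#include <istream>
#include <sstream>

// Hypothetical helper: skip `count` bytes forward from the current read position.
// The arithmetic is done on std::streamoff (a signed integer type); streampos is
// only used when talking to tellg() and seekg().
bool skip_bytes(std::istream& in, std::uint64_t count)
{
    const std::streampos here = in.tellg();
    if (here == std::streampos(-1)) return false;

    std::streamoff offset = static_cast<std::streamoff>(here);
    offset += static_cast<std::streamoff>(count);   // assumes count fits in streamoff

    in.seekg(static_cast<std::streampos>(offset));
    return !in.fail();
}
```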
I have the feeling I'm being an idiot and missing something obvious, but my Google-skills are failing me - I haven't found an example that shows what I'm doing wrong.
So - what am I doing wrong? Is there an idiomatic style to doing random access with position calculations that avoids all this noise?