I'm working on an application for which I need to be able to store EXIF metadata in a relational database. In the future I'd like to also support XMP and IPTC metadata, but at the moment my focus is on EXIF.
There are a few questions on Stack Overflow about what the table structure should look like when storing EXIF metadata. However, none of them really address my concern. The problem I have is that different EXIF tags have values in different formats, and there isn't really one column type which conveniently stores them all.
The most common type is a "rational" which is an array of two four-byte integers representing a fraction. But there are also non-fractional short and long integers, ASCII strings, byte arrays, and "undefined" (an 8-bit type which must be interpreted according to a priori knowledge of the specific tag.) I'd like to support all of these types, and I want to do so in a convenient, efficient, lossless (i.e. without converting the rationals to floats), extensible and searchable manner.
Here's what I've considered so far:
My current solution is to store everything as a string. This makes it pretty easy to store all of the different types, and is also convenient for searching and debugging. However, it's kind of clunky and inefficient because when I want to actually use the data, I have to do a bunch of string manipulation to convert the rational values into their fractional equivalents, e.g.
fraction = float(value.split('/')[0]) / float(value.split('/')[1])
. (It's not actually a messy one-liner like that in my real code, but this demonstrates the problem.)I could grab the raw EXIF bytes for each value from the file and store them in a blob column, but then I'd have to reinterpret the raw bytes every time. This could be marginally more CPU-efficient than the string solution, but it's much, much worse in every other way - on the whole, not worth it.
I could have a different table for each different EXIF datatype. Using this pattern I can maintain my foreign key relationships while storing my values in several different tables. However, this will make my most common query, which is to select all EXIF metadata for a given photo, kind of nasty. It will also become unwieldy very quickly when I add support for other metadata formats.
I'm not a database expert by any means, so there some pattern or magic union-style column type I'm missing that can make this problem go away? Or am I stuck picking my poison from among the three options above?