
I've read many of the discussions about databases vs. file systems for storing files. Most of them focus on images and media files. My questions are:

1) Do the same arguments apply to storing .doc, .pdf, .xls, and .txt files? Is there anything special about document files I should be aware of?

2) If I store them in a database as binary, will there be endianness issues if my host switches machines? For example, I insert into the database on a big-endian machine, the database gets moved to a little-endian machine, and then I try to extract the file (e.g., write it to a file, send it to my desktop, and try to open it).

Thanks for any guidance!


1 Answer


1) Yes, pretty much the same arguments apply to storing PDFs and whatnot... anything that's compressed also comes to mind.

Every non-text file format has to deal with endianness if it wants to be portable across hosts of different endianness. Most formats do it by specifying the byte order of every binary field in the file that is longer than one byte. Software that writes and reads the format then has to byte-swap if, and only if, it's running on a platform of the opposite endianness. Images are no different from other binary file formats. The choice of byte order is arbitrary, but big endian (network byte order) is popular, especially in network software, because C's ubiquitous byte-order helpers such as htonl() and ntohl() handle the conversion almost automatically.
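A minimal sketch of the "fixed byte order" approach, using Python's standard `struct` module. The file format here (a `DEMO` magic followed by a 32-bit record count) is made up for illustration; the point is that the `>` format prefix pins the field to big-endian, playing the same role `htonl()`/`ntohl()` play in C, so the bytes on disk are identical no matter which host wrote them.

```python
import struct

# Hypothetical format: 4-byte magic, then a 32-bit unsigned record count that
# the (made-up) spec says is always big-endian (network byte order).
MAGIC = b"DEMO"

def write_header(record_count: int) -> bytes:
    # ">I" = big-endian unsigned 32-bit int, regardless of the host's byte order.
    # On a little-endian host struct swaps the bytes; on a big-endian host it doesn't.
    return MAGIC + struct.pack(">I", record_count)

def read_header(data: bytes) -> int:
    if data[:4] != MAGIC:
        raise ValueError("not a DEMO file")
    # Decode with the same fixed order, so every host recovers the same value.
    (record_count,) = struct.unpack(">I", data[4:8])
    return record_count

header = write_header(258)
print(header.hex())         # 44454d4f00000102 on every host
print(read_header(header))  # 258
```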

Another way to make a binary file format endian-portable is to allow either byte order for binary fields and include a marker in the header saying which one was used. On opening the file, readers consult the marker. That way the file can be read back slightly more efficiently on the host where it was written, or on other hosts with the same endianness (the common case), while hosts of the opposite endianness have to do a little more work by swapping bytes.
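A sketch of the "byte-order marker" approach, again with Python's `struct`. The two marker values are borrowed from TIFF, which starts its header with `II` (little-endian) or `MM` (big-endian); the single 32-bit field after the marker is a hypothetical payload, not any real format's layout.

```python
import struct
import sys

# Marker values borrowed from TIFF: b"II" = little-endian, b"MM" = big-endian.
MARKERS = {b"II": "<", b"MM": ">"}

def write_value(value: int) -> bytes:
    # The writer uses its own native byte order and records which one it used.
    order = "<" if sys.byteorder == "little" else ">"
    marker = b"II" if order == "<" else b"MM"
    return marker + struct.pack(order + "I", value)

def read_value(data: bytes) -> int:
    # The reader consults the marker first, then decodes with that byte order.
    try:
        order = MARKERS[data[:2]]
    except KeyError:
        raise ValueError("unknown byte-order marker") from None
    (value,) = struct.unpack(order + "I", data[2:6])
    return value

# Files written on either kind of host decode to the same value.
print(read_value(b"II" + struct.pack("<I", 258)))  # 258
print(read_value(b"MM" + struct.pack(">I", 258)))  # 258
```

In Python `struct` does the same work either way, so the efficiency gain mentioned above only really shows up in lower-level code, e.g. C that can use the field in place when the marker matches the host's order.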

As for the database, assuming you are using a field type like a BLOB, you'll read back exactly the same byte stream you wrote, so you don't have to worry about the endianness of the database client or server.
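A quick way to convince yourself of the BLOB round trip, using SQLite through Python's built-in `sqlite3` module as a stand-in for whatever database you actually use. The table and column names are illustrative, and the byte string stands in for the contents of a .pdf or .doc file.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (name TEXT, content BLOB)")

# Stand-in for the raw bytes of a .pdf/.doc/.xls file.
original = bytes(range(256)) * 4
conn.execute("INSERT INTO documents (name, content) VALUES (?, ?)",
             ("report.pdf", original))
conn.commit()

(stored,) = conn.execute(
    "SELECT content FROM documents WHERE name = ?", ("report.pdf",)
).fetchone()

# The database doesn't interpret BLOB contents, so the bytes come back unchanged.
print(stored == original)  # True
conn.close()
```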

2) That depends on the database. The database might define its on-disk format so that it's compatible with any endianness, in one of the ways described above.
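SQLite is one database that does this: its documented file format is cross-platform, and multi-byte fields in the database header are stored big-endian. The sketch below, which assumes only the standard library, creates a small database file and decodes the page size from bytes 16-17 of the header with a fixed big-endian format, then compares it with what SQLite itself reports.

```python
import os
import sqlite3
import struct
import tempfile

# Create a small database file on disk (the table name is arbitrary).
path = os.path.join(tempfile.mkdtemp(), "demo.db")
conn = sqlite3.connect(path)
conn.execute("CREATE TABLE documents (content BLOB)")
conn.commit()
pragma_page_size = conn.execute("PRAGMA page_size").fetchone()[0]
conn.close()

# The first 100 bytes of the file are the SQLite database header.
with open(path, "rb") as f:
    header = f.read(100)

assert header[:16] == b"SQLite format 3\x00"
# Bytes 16-17 hold the page size as a big-endian 16-bit integer;
# the special value 1 stands for 65536.
(page_size,) = struct.unpack(">H", header[16:18])
if page_size == 1:
    page_size = 65536

print(page_size, pragma_page_size)  # same number, decoded the same way on any host
```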

Databases don't often aim for portability of their underlying file formats, though, reasoning (correctly) that moving the raw data files to a database host of different endianness is rare. According to this answer, for example, MySQL's MyISAM is not endian-portable.

I don't think you need to worry about this too much though. If the database server is ever switched to a host of different endianness, ensuring that the data remains readable is an important step of the process and the DBA handling the task (perhaps yourself?) won't forget to do it, because if they do forget, then nothing will work (that is, the breakage won't be limited to binary BLOBs!)

answered 2013-05-11T16:33:36.690