c++ - Is the smallest unit of data written to a file through a file stream in binary mode always 8 bits?

Question

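As stated in the title:
1) Is the smallest unit of data written to a file through a file stream in binary mode always 8 bits? If put() writes whatever character it is passed to the file, can we say that it is always 8 bits?

2) If we add an integer to a variable of type char, does the position of that char in the character set change by the amount added, on whichever platform or machine we try it, regardless of how the bits of the variable are represented in memory? And what if we go beyond the limits of the char variable, on a machine where char is represented as either signed or unsigned char? Does it always wrap around from the end to the beginning when adding, and the reverse when subtracting?

3) What I ultimately want to know is whether there is a portable way to store data in files in binary mode, and how common file formats can be read and written without problems.
Thanks.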

score 3 · Accepted Answer

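1) For one thing, the C++ standard is clear that a "byte" (or char) is not necessarily 8 bits. Machines with 9-bit or 12-bit char types are not very common, but if you want extreme portability you need to account for this somehow (for example, by specifying "our implementation expects a char to be 8 bits"). This can of course be checked at compile time or at run time, for example: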

#include <climits>
#if (CHAR_BIT != 8)
#error This implementation requires CHAR_BIT == 8.
#endif

or

// Needs <climits>, <cstdlib>, and <iostream>; place inside a function such as main().
if (CHAR_BIT != 8)
{
    std::cerr << "Sorry, can't run on this platform, CHAR_BIT is not 8\n";
    std::exit(2);
}

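2) Adding an int value to a char value converts the char to int. If you then convert the result back to a char, it should be consistent, yes. Strictly speaking, though, the result of converting an out-of-range value back to a signed char is implementation-defined before C++20, and overflow in signed arithmetic itself is undefined, which can cause odd behavior on some machines (overflow traps, for example).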
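As a rough illustration of point 2, here is a minimal sketch that sidesteps the signed case by doing the arithmetic on unsigned char, where converting the int result back is defined as reduction modulo UCHAR_MAX + 1. The helper name shift_char is made up for this example, and it assumes delta is small enough that the int addition itself cannot overflow.

```cpp
#include <cassert>
#include <climits>   // UCHAR_MAX

// Hypothetical helper (not from the answer): move a character code by `delta`,
// wrapping around within 0..UCHAR_MAX.
unsigned char shift_char(unsigned char c, int delta) {
    // Assumption for this sketch: delta is small, so c + delta cannot overflow int.
    assert(delta >= -UCHAR_MAX && delta <= UCHAR_MAX);
    // c is promoted to int and the sum is computed as int. Converting the sum
    // to unsigned char reduces it modulo UCHAR_MAX + 1, which is well defined.
    return static_cast<unsigned char>(c + delta);
}
```

With this helper, shift_char(UCHAR_MAX, 1) wraps to 0 and shift_char(0, -1) wraps to UCHAR_MAX on any conforming platform. Whether the resulting code maps to the "next" character still depends on the character set in use.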

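3) Binary formats work fine in portable scenarios as long as they are clearly defined and documented. See JPG, PNG, and to some extent BMP as examples of binary data that is "very portable". I'm not sure how well displaying a JPG would work on a DEC-10 system with its 36-bit machine word, though.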

score 2 · Accepted Answer

1) No, the smallest unit of allocation is a disk page, as defined by the filesystem parameters. With most modern file systems this is 4k, though in some next-gen file systems the content of exceptionally small files can be stored in the inode, so the content itself takes no extra space on the disk. FAT and NTFS page sizes range from 4k to 64k depending on how the disk was formatted.

1a) The "smallest read/write" unit is usually an 8-bit byte, though some oddball systems use different byte sizes (the CDC Cyber comes to mind, with a 12-bit byte). I can't think of any modern systems that use anything other than an 8-bit byte.
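To make the read/write unit concrete, here is a small sketch, assuming a writable working directory. The helper name write_and_count is invented for this example. It writes every char of a string with put() in binary mode and then counts how many chars come back. Each put() call writes exactly one char, which is CHAR_BIT bits wide (8 on mainstream platforms).

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper (not from the answer): write each char of `data` with
// put() in binary mode, then reopen the file and count the chars read back.
std::size_t write_and_count(const std::string& path, const std::string& data) {
    {
        std::ofstream out(path, std::ios::binary | std::ios::trunc);
        assert(out && "could not open file for writing");
        for (char c : data) {
            out.put(c);   // one char (one byte) per call, no translation
        }
    }   // the output stream is flushed and closed here

    std::ifstream in(path, std::ios::binary);
    std::size_t count = 0;
    char c;
    while (in.get(c)) {
        ++count;
    }
    return count;
}
```

Calling write_and_count("put_demo.bin", std::string("a\0\n\xff", 4)) should give 4: in binary mode the embedded '\0' and '\n' are stored as single bytes rather than being translated.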

2) Adding an integer to a char produces an int-sized result. The compiler implicitly promotes the char to int before the arithmetic. The result can then be downcast (by truncation, usually) to a char.

3) Yes and yes. You have to thoroughly document the file formats, including the endianness of words if you plan to run on different CPU architectures (e.g. Intel is little-endian, Motorola is big-endian, and some supercomputers use odder byte orders). These architectures read and write words and dwords differently, and you may have to account for that in your reader code.

3a) This is fairly common (though now with XML and other self-defining semistructured formats perhaps less so), and so long as the documentation is complete, there are few issues in reading or writing these files.
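One common way to deal with the endianness issue from point 3 is to never dump multi-byte integers straight from memory, but to build the byte sequence explicitly with shifts. The sketch below assumes the file format is documented as little-endian and that bytes are 8 bits wide. The names encode_u32_le, decode_u32_le, and check_round_trip are made up for this example.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helper: split a 32-bit value into four octets, least
// significant first (little-endian), independent of the host byte order.
std::array<unsigned char, 4> encode_u32_le(std::uint32_t v) {
    return {{
        static_cast<unsigned char>(v & 0xFFu),
        static_cast<unsigned char>((v >> 8) & 0xFFu),
        static_cast<unsigned char>((v >> 16) & 0xFFu),
        static_cast<unsigned char>((v >> 24) & 0xFFu),
    }};
}

// Hypothetical helper: rebuild the value from four little-endian octets.
std::uint32_t decode_u32_le(const std::array<unsigned char, 4>& b) {
    return  static_cast<std::uint32_t>(b[0])
         | (static_cast<std::uint32_t>(b[1]) << 8)
         | (static_cast<std::uint32_t>(b[2]) << 16)
         | (static_cast<std::uint32_t>(b[3]) << 24);
}

// Usage example: the byte layout is fixed by the code, not by the CPU.
void check_round_trip() {
    const std::uint32_t value = 0x11223344u;
    const std::array<unsigned char, 4> bytes = encode_u32_le(value);
    assert(bytes[0] == 0x44 && bytes[3] == 0x11);
    assert(decode_u32_le(bytes) == value);
}
```

The four bytes can then be written with put() or write() on a stream opened in binary mode, and a reader on any architecture recovers the same value.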
