json - Can I get more explanations for BSON?

Question

I am trying to understand BSON from the spec at http://bsonspec.org/#/specification, but I still have some questions.

Let's take an example from that site:

{"hello": "world"} → "\x16\x00\x00\x00\x02hello\x00\x06\x00\x00\x00world\x00\x00"

Question 1

In the above example, the double quotes around the encoded bytes are not actually part of the result, right?

Question 2

I understand that the first 4 bytes, \x16\x00\x00\x00, are the size of the whole BSON doc.

The size is stored in little-endian format. But why? Why not big-endian?

Question 3

How come the size of the example doc is \x16, i.e. 22?

Question 4

Normally, if I want to encode the doc myself, how do I calculate its size? I think my main difficulty is how to determine the size of a UTF-8 string.

Let's take another example:

{"BSON": ["awesome", 5.05, 1986]}   

→   

"\x31\x00\x00\x00\x04BSON\x00\x26\x00\x00\x00\x020\x00\x08\x00\x00 
 \x00awesome\x00\x011\x00\x33\x33\x33\x33\x33\x33\x14\x40\x102\x00\xc2\x07\x00\x00 
 \x00\x00"

Question 5

In this example, there is an array. According to the specification, an array is actually a list of {key, value} pairs, where the keys are 0, 1, etc. My question is: the 0 and 1 here are strings too, right?

score 2 · Accepted Answer

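Question 1

In the above example, the double quotes around the encoded bytes are not actually part of the result, right?

The quotes are not part of the string. They are only used to mark JSON strings.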

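Question 2

The size is stored in little-endian format. But why? Why not big-endian?

The choice of endianness is largely a matter of preference. One advantage of little endian is that the commonly used platforms are little endian, so they don't need to reverse the bytes.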
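As a small illustration of what the byte order means for the size prefix, here is a sketch using Python's standard `struct` module. The variable names are only for this example; nothing here comes from a BSON library.

```python
import struct

# The int32 size prefix from the example document.
prefix = b"\x16\x00\x00\x00"

# "<i" means little-endian signed 32-bit integer: least significant byte first.
(size_le,) = struct.unpack("<i", prefix)

# Reading the same bytes as big-endian (">i") gives a very different number,
# which is why encoder and decoder must agree on the byte order.
(size_be,) = struct.unpack(">i", prefix)

print(size_le)  # 22
print(size_be)  # 369098752

# Going the other way: packing 22 as little-endian reproduces the prefix.
packed = struct.pack("<i", 22)
```

On a little-endian machine the in-memory layout of the integer already matches `prefix`, so no byte swapping is needed when reading or writing it.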

Question 3

How come the size of the example doc is \x16, i.e. 22?

There are 22 bytes in total (including the length prefix itself).
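To see where the 22 comes from, the example bytes can be split into their pieces and counted. This is only a sketch that slices the byte string from the question by hand; the labels are informal descriptions, not names from the spec.

```python
# The encoded form of {"hello": "world"} from the question.
doc = b"\x16\x00\x00\x00\x02hello\x00\x06\x00\x00\x00world\x00\x00"

parts = [
    ("int32 document size",                 doc[0:4]),    # \x16\x00\x00\x00
    ("element type 0x02 (UTF-8 string)",    doc[4:5]),    # \x02
    ("key as zero-terminated cstring",      doc[5:11]),   # hello\x00
    ("int32 string length (5 bytes + NUL)", doc[11:15]),  # \x06\x00\x00\x00
    ("string bytes plus trailing NUL",      doc[15:21]),  # world\x00
    ("document terminator",                 doc[21:22]),  # \x00
]

for label, chunk in parts:
    print(f"{len(chunk):2d}  {label}")

# 4 + 1 + 6 + 4 + 6 + 1 = 22
total = sum(len(chunk) for _, chunk in parts)
print(total)  # 22
```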

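Question 4

Normally, if I want to encode the doc myself, how do I calculate its size? I think my main difficulty is how to determine the size of a UTF-8 string.

Write out the document first, then go back and fill in the length.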
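A minimal sketch of that approach in Python, assuming a document whose values are all strings. The function names `encode_cstring` and `encode_string_document` are made up for this example. The size of a UTF-8 string is simply the length of its encoded bytes, which can differ from the number of characters.

```python
import struct

def encode_cstring(text: str) -> bytes:
    """Zero-terminated UTF-8 bytes with no length prefix (used for keys)."""
    return text.encode("utf-8") + b"\x00"

def encode_string_document(doc: dict) -> bytes:
    """Encode a dict of str -> str (hypothetical helper, strings only)."""
    buf = bytearray(4)  # placeholder for the int32 size, filled in at the end
    for key, value in doc.items():
        data = value.encode("utf-8")  # byte length, not character count
        buf += b"\x02"                # element type: UTF-8 string
        buf += encode_cstring(key)
        buf += struct.pack("<i", len(data) + 1)  # +1 for the trailing NUL
        buf += data + b"\x00"
    buf += b"\x00"  # document terminator
    # Now the total size is known, so go back and overwrite the placeholder.
    struct.pack_into("<i", buf, 0, len(buf))
    return bytes(buf)

encoded = encode_string_document({"hello": "world"})
print(encoded)

# Non-ASCII text: one character, two UTF-8 bytes.
print(len("é"), len("é".encode("utf-8")))  # 1 2
```

The same pattern applies to embedded documents and arrays: encode the inner document completely, and its own size prefix is filled in the same way before it is appended to the outer buffer.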

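Question 5

In this example, there is an array. According to the specification, an array is actually a list of {key, value} pairs, where the keys are 0, 1, etc. My question is: the 0 and 1 here are strings too, right?

Yes. To be exact, they are zero-terminated strings without a length prefix (called cstring in the spec's list), just like the keys in an embedded document.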
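The array keys can be checked by walking the second example byte by byte. The following is a rough reader that handles only the four element types appearing in that example (0x01 double, 0x02 string, 0x04 array, 0x10 int32); `read_cstring` and `read_elements` are names invented for this sketch, and an array is returned as a list of (key, value) pairs so the keys stay visible.

```python
import struct

data = (b"\x31\x00\x00\x00\x04BSON\x00\x26\x00\x00\x00\x020\x00\x08\x00\x00"
        b"\x00awesome\x00\x011\x00\x33\x33\x33\x33\x33\x33\x14\x40\x102\x00\xc2\x07\x00\x00"
        b"\x00\x00")

def read_cstring(buf, pos):
    """Read bytes up to the next NUL; return (text, position after the NUL)."""
    end = buf.index(b"\x00", pos)
    return buf[pos:end].decode("utf-8"), end + 1

def read_elements(buf, pos):
    """Return the (key, value) pairs of the document starting at pos."""
    (size,) = struct.unpack_from("<i", buf, pos)
    end = pos + size - 1  # index of the document's terminating NUL
    pos += 4
    pairs = []
    while pos < end:
        type_byte = buf[pos]
        key, pos = read_cstring(buf, pos + 1)  # keys are always cstrings
        if type_byte == 0x01:    # 64-bit float
            (value,) = struct.unpack_from("<d", buf, pos)
            pos += 8
        elif type_byte == 0x02:  # string: int32 length, bytes, NUL
            (n,) = struct.unpack_from("<i", buf, pos)
            value = buf[pos + 4:pos + 4 + n - 1].decode("utf-8")
            pos += 4 + n
        elif type_byte == 0x04:  # array: encoded like an embedded document
            (n,) = struct.unpack_from("<i", buf, pos)
            value = read_elements(buf, pos)
            pos += n
        elif type_byte == 0x10:  # int32
            (value,) = struct.unpack_from("<i", buf, pos)
            pos += 4
        else:
            raise ValueError(f"type 0x{type_byte:02x} not handled in this sketch")
        pairs.append((key, value))
    return pairs

elements = read_elements(data, 0)
print(elements)
# [('BSON', [('0', 'awesome'), ('1', 5.05), ('2', 1986)])]
```

The keys of the array come back as the strings '0', '1' and '2', each stored in the bytes as a single ASCII digit followed by \x00.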
