
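I'm comparing JSON and BSON for serializing objects. These objects contain several arrays, each holding a large number of integers. In my test, the object I'm serializing contains about 12,000 integers in total. I'm only interested in how the sizes of the serialized results compare. I'm using JSON.NET as the serialization library. I'm using JSON because I also want to be able to work with the data in Javascript.

The JSON string is about 43 KB and the BSON result is 161 KB, so the difference is a factor of about 4. This is not what I expected, because I looked into BSON thinking it would be more efficient at storing data.

So my question is: why is BSON not more efficient, and can it be made more efficient? Or is there another way of serializing data with arrays containing a large number of integers that can be handled easily in Javascript?

Below is the code used to test the JSON/BSON serialization.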

        // Namespaces required at the top of the file
        using System.IO;
        using Newtonsoft.Json;
        using Newtonsoft.Json.Bson;

        // Read the file that contains the JSON string (ReadFile is my own helper)
        string _jsonString = ReadFile();
        object _object = JsonConvert.DeserializeObject(_jsonString);

        // File.Create truncates an existing file, so stale bytes cannot inflate the measured size
        using (FileStream _fs = File.Create("BsonFileName"))
        using (BsonWriter _bsonWriter = new BsonWriter(_fs))
        {
            JsonSerializer _jsonSerializer = new JsonSerializer();
            _jsonSerializer.Serialize(_bsonWriter, _object);
            _bsonWriter.Flush();
        }

Edit:

Here are the resulting files: https://skydrive.live.com/redir?resid=9A6F31F60861DD2C!362&authkey=!AKU-ZZp8C_0gcR0


1 Answer


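The efficiency of JSON vs. BSON depends on the size of the integers you're storing. There's an interesting point where ASCII takes fewer bytes than actually storing integer types. 64-bit integers, which is how they appear to be stored in your BSON document, take up 8 bytes each. Your numbers are all less than 10,000, which means you could store each one in ASCII in 4 bytes (one byte per character, up through 9999), plus a one-byte comma separator. In fact, most of your data looks like it's less than 1000, meaning it can be stored in 3 or fewer bytes. Of course, parsing the text back into numbers takes time and isn't free, but it saves space. Furthermore, Javascript uses 64-bit values to represent all numbers, so if you converted each integer to a more appropriate data type (for example a 32-bit integer) before writing it to BSON, your BSON file could be noticeably smaller.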
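To put rough numbers on this, here is a minimal sketch of the size arithmetic. It rests on several assumptions that are not from your file: all 12,000 integers sit in one flat array, the values are made up (cycling through 0..999), and the class and method names are hypothetical. It relies on the BSON specification's layout for arrays, where every element carries a type byte and its index as a NUL-terminated decimal string in addition to the value.

```csharp
using System;
using System.Globalization;
using System.Linq;

static class BsonSizeSketch
{
    // Bytes one integer occupies as a BSON array element:
    // 1 type byte + the index as a decimal C string (digits + NUL) + the value itself.
    static int BsonElementSize(int index, int valueWidth)
    {
        int keyLength = index.ToString(CultureInfo.InvariantCulture).Length + 1;
        return 1 + keyLength + valueWidth;
    }

    // Total size of a BSON array document: int32 length prefix + elements + trailing NUL.
    static long BsonArraySize(int count, int valueWidth)
    {
        long total = 4 + 1;
        for (int i = 0; i < count; i++)
        {
            total += BsonElementSize(i, valueWidth);
        }
        return total;
    }

    // Size of the same values as a compact JSON array: brackets + digits + commas.
    static long JsonArraySize(int[] values)
    {
        long digits = values.Sum(v => (long)v.ToString(CultureInfo.InvariantCulture).Length);
        return 2 + digits + Math.Max(0, values.Length - 1);
    }

    static void Main()
    {
        const int count = 12000;
        // Hypothetical data: values cycling through 0..999.
        int[] values = Enumerable.Range(0, count).Select(i => i % 1000).ToArray();

        Console.WriteLine("JSON text:    {0} bytes", JsonArraySize(values));
        Console.WriteLine("BSON (int64): {0} bytes", BsonArraySize(count, 8));
        Console.WriteLine("BSON (int32): {0} bytes", BsonArraySize(count, 4));
    }
}
```

Under these assumptions the sketch prints 46681 bytes for JSON, 168895 bytes for BSON with 64-bit values, and 120895 bytes for BSON with 32-bit values. That is in the same range as the 43 KB and 161 KB you measured, and it suggests the per-element type byte and index key cost about as much as the integer width itself. Your real document has several arrays nested inside objects, so the exact figures will differ.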

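According to the spec, BSON contains a lot of metadata that JSON doesn't. This metadata is mostly length prefixes, so that you can skip over data you aren't interested in. Take the following data, for example: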

["hello there, this is an necessarily long string.  It's especially long, but you don't care about it. You're just trying to get to the next element. But I keep going on and on.",
 "oh man. here's another string you still don't care about.  You really just want the third element in the array.  How long are the first two elements? JSON won't tell you",
 "data_you_care_about"]

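Now, if you're using JSON, you have to parse the entire contents of the first two strings to find out where the third one is. If you use BSON, you'll get markup more like this (but not actually this, because I'm making this markup up for the sake of example):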

[175 "hello there, this is an necessarily long string.  It's especially long, but you don't care about it. You're just trying to get to the next element. But I keep going on and on.",
 169 "oh man. here's another string you still don't care about.  You really just want the third element in the array.  How long are the first two elements? JSON won't tell you",
 19 "data_you_care_about"]

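So now you can read "175" and know to skip forward 175 bytes, then read "169" and skip forward 169 bytes, then read "19" and copy the next 19 bytes into your string. That way, you don't even have to scan the strings for delimiters.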
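The skipping idea can be sketched in a few lines. This is a simplified, made-up format in the same spirit as the markup above, not real BSON: each string is written as a 4-byte length followed by its UTF-8 bytes, whereas real BSON also stores a type byte, a key, and a trailing NUL per element. The class and method names are hypothetical.

```csharp
using System;
using System.IO;
using System.Text;

static class LengthPrefixSketch
{
    // Writes each string as a 4-byte little-endian length followed by its UTF-8 bytes.
    static byte[] Encode(string[] items)
    {
        using (var stream = new MemoryStream())
        using (var writer = new BinaryWriter(stream))
        {
            foreach (string item in items)
            {
                byte[] payload = Encoding.UTF8.GetBytes(item);
                writer.Write(payload.Length);
                writer.Write(payload);
            }
            writer.Flush();
            return stream.ToArray();
        }
    }

    // Returns the item at targetIndex, seeking past earlier payloads without reading them.
    static string ReadItem(byte[] data, int targetIndex)
    {
        using (var stream = new MemoryStream(data))
        using (var reader = new BinaryReader(stream))
        {
            for (int i = 0; i < targetIndex; i++)
            {
                int length = reader.ReadInt32();
                stream.Seek(length, SeekOrigin.Current);
            }
            int targetLength = reader.ReadInt32();
            return Encoding.UTF8.GetString(reader.ReadBytes(targetLength));
        }
    }

    static void Main()
    {
        byte[] data = Encode(new[]
        {
            "a long string you do not care about",
            "another string you do not care about",
            "data_you_care_about"
        });

        // Only the two length prefixes are read before reaching the third item.
        Console.WriteLine(ReadItem(data, 2));
    }
}
```

Running it prints data_you_care_about. The cost of this convenience is the extra 4 bytes per item, which is the same trade-off that makes your BSON file larger than the JSON one.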

Using one versus the other is very dependent on what your needs are. If you're going to be storing enormous documents that you've got all the time in the world to parse, but your disk space is limited, use JSON because it's more compact and space efficient. If you're going to be storing documents, but reducing wait time (perhaps in a server context) is more important to you than saving some disk space, use BSON.

Another thing to consider in your choice is human readability. If you need to debug a crash report that contains BSON, you'll probably need a utility to decipher it. You probably don't just know BSON, but you can just read JSON.


Answered 2012-09-26T22:49:29.987