google-app-engine - Which property type, str or int, takes less space in Google App Engine (including indexes)?

Question

I want to build a model whose properties are lists.

from google.appengine.ext import db

class Data(db.Model):
    listOfIntergers = db.ListProperty(int)  # each value in 0..255
    listOfStrings = db.ListProperty(str)    # each value encodes 0..255

In both lists I want to keep values in the range 0 to 255 (inclusive).

Which property type takes less space in the datastore, including its indexes: listOfIntergers or listOfStrings?

Note that a str value can be either a single character or a hex-encoded value, i.e. 0 = 0, 255 = ff.
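To make the two string options concrete, here is a small standalone Python 3 sketch (not App Engine code) comparing the raw payload each one needs per value. It deliberately ignores the per-value property-name overhead, which matters a lot as the answer below shows. The helper names `as_char` and `as_hex` are hypothetical.

```python
# Two hypothetical ways to store a value in 0..255 as a string.
def as_char(v: int) -> bytes:
    return bytes([v])        # raw single byte: always exactly 1 byte

def as_hex(v: int) -> str:
    return format(v, "x")    # hex digits: "0" for 0, "ff" for 255

for v in (0, 15, 16, 255):
    # value, bytes as a raw character, characters as hex
    print(v, len(as_char(v)), len(as_hex(v)))
```

The raw character is never larger than the hex form, and from 16 upward hex needs two characters instead of one.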

score 3 · Accepted Answer

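Careful! First, integers are stored with a variable-length encoding, so not every int64 value takes 8 bytes; small values take 1-2 bytes. However, the representation of a list property repeats the property name for every value in the list. I rewrote your example using NDB (because its API for getting at the encoded protobuf is easier to remember) and found the following: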

>>> from ndb import *
>>> class Data(Model):
...   listOfIntegers = IntegerProperty(repeated=True)
...   listOfStrings = StringProperty(repeated=True)
... 
>>> len(Data()._to_pb().Encode())
20
>>> len(Data(listOfIntegers=[0]*100)._to_pb().Encode())
2420
>>> len(Data(listOfIntegers=[255]*100)._to_pb().Encode())
2520
>>> len(Data(listOfStrings=['\x00']*100)._to_pb().Encode())
2420

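But note how deceptive this is: 100 integers with value 0 take 24 bytes per value (most of which is the property name 'listOfIntegers'), while 100 integers with value 255 take 25 bytes per value! That's variable-length encoding for you. 100 string values '\x00' also take 24 bytes per value, but 'listOfStrings' is one character shorter than 'listOfIntegers', so a 1-byte string actually takes one byte more than the integer 0 and the same space as the integer 255.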
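The one-byte difference between 0 and 255 comes from the base-128 varint format protobuf uses for integers: each byte carries 7 payload bits, and its high bit says whether another byte follows. The Python 3 sketch below is a standalone illustration of that scheme (it does not call App Engine or the protobuf library, and handles only non-negative values), followed by the per-value arithmetic on the sizes measured above, where 20 bytes is the empty entity.

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative int as a protobuf-style base-128 varint."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative values")
    out = bytearray()
    while True:
        low7 = n & 0x7F              # next 7 payload bits
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)         # high bit clear: last byte
            return bytes(out)

print(len(encode_varint(0)), len(encode_varint(255)))  # 1 2

# Per-value cost from the measurements above (20 bytes = empty entity).
EMPTY = 20
print((2420 - EMPTY) / 100)  # 24.0 (integers equal to 0)
print((2520 - EMPTY) / 100)  # 25.0 (integers equal to 255)
```

Anything from 128 upward needs a second byte, which is exactly the extra byte per value seen for 255.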

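Unless you really need to index this field, the right solution is probably to declare a BlobProperty (not a list/repeated property) and store a single string made up of the bytes you need.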

>>> class Data(Model):
...  b = BlobProperty()
... 
>>> d = Data(b=''.join(map(chr, range(100))))
>>> len(d._to_pb().Encode())
133
>>>

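For the same amount of information, this is almost 20 times more compact! (Admittedly, you could shrink the list-property version by using shorter property names, but it would still be larger.)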
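Filling and reading such a blob only needs a conversion between the list of values and a byte string. Below is a minimal Python 3 sketch, assuming every value is in 0..255; `pack` and `unpack` are hypothetical helpers, not part of NDB, and assigning the result to a BlobProperty is omitted because that needs the App Engine SDK. The Python 2 session above uses `''.join(map(chr, ...))` for the same packing step.

```python
from typing import List

def pack(values: List[int]) -> bytes:
    """Pack integers in 0..255 into one byte each, as a single blob value."""
    return bytes(values)  # raises ValueError if any value is outside 0..255

def unpack(blob: bytes) -> List[int]:
    """Recover the list of integers from the blob."""
    return list(blob)     # iterating over bytes yields ints in Python 3

values = [0, 17, 255]
blob = pack(values)
print(len(blob), unpack(blob) == values)
```

Each value costs exactly one byte with no repeated property name, which is where the savings come from; the trade-off, as noted above, is that individual values can no longer be indexed or queried.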
