google-app-engine - Postal address model in the App Engine datastore: how should common properties be structured?

Question

Suppose we have millions of addresses based on two models.

The Address model has plain string properties, even for common attributes such as county:

from google.appengine.ext import ndb

class Address(ndb.Model):
  house_no = ndb.StringProperty()
  street = ndb.StringProperty()
  locality = ndb.StringProperty() # City/town
  county = ndb.StringProperty()
  zipcode = ndb.StringProperty()

The StructuredAddress model keeps the more common attributes as references to other models by defining each of them as a KeyProperty:

class StructuredAddress(ndb.Model):
  house_no = ndb.StringProperty()
  street = ndb.StringProperty()
  locality = ndb.KeyProperty(kind=Locality) # City/town
  county = ndb.KeyProperty(kind=County)
  zipcode = ndb.KeyProperty(kind=Zipcode)

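Here are the questions:

Which model is more efficient when querying on a common property such as zipcode?
Assume there are around 50 county values, whereas zipcode values number in the millions. Given millions of address records, which model is more efficient in that scenario?
Does using KeyProperty in this example mean more read operations and, in effect, a higher bill? Would the built-in ndb cache avoid that?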

score 2 · Accepted Answer

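The KeyProperty version will be more expensive, because a Key takes up more bytes than a typical zip code or town/county name. (Every key repeats the full name of the kind it points to.)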
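As a rough illustration of the size argument, the sketch below compares the UTF-8 length of a plain zip code string with the minimum text a key reference has to carry. It is not the datastore's actual key encoding: `approx_key_bytes`, the application id `my-app`, and the use of the zip code as the key name are all assumptions made up for this example.

```python
def approx_string_bytes(value):
    """UTF-8 size of a plain string property value."""
    return len(value.encode("utf-8"))


def approx_key_bytes(app_id, kind, name):
    """Lower bound on the size of a key reference.

    Assumption: a key carries at least the application id, the kind name,
    and the entity's name/id. Real encodings add framing overhead on top.
    """
    return sum(len(part.encode("utf-8")) for part in (app_id, kind, name))


# Hypothetical values, chosen only to make the comparison concrete.
plain = approx_string_bytes("55555")
keyed = approx_key_bytes("my-app", "Zipcode", "55555")
print(plain, keyed)
```

Even under this generous lower bound the key is several times larger than the five-character string, and that difference is paid once per address entity and again in every index entry that includes the property.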

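Beyond the passive storage cost, you also pay extra read costs to fetch the fields that the keys refer to.

Finally, there is no way to directly perform the JOIN you would need for these queries (although that may come down to just one extra lookup).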

The only thing that using keys buys you is the possibility of changing the name of a town or county. But how often does that really happen?

score 1 · Accepted Answer

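Which model is more efficient when querying on a common property such as zipcode?

Assuming the Zipcode class contains only a single String/Int property holding the zip code, (1) completes this query with one RPC, while (2) takes two RPCs: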

(1)

# Get the first 100 addresses with zipcode 55555
addresses = Address.query(Address.zipcode == '55555').fetch(limit=100)

(2)

# Look up the Zipcode entity for 55555 (assumes Zipcode has a `code` property)
zip_entity = Zipcode.query(Zipcode.code == '55555').get()
# Get the first 100 addresses that reference that Zipcode's key
addresses = StructuredAddress.query(StructuredAddress.zipcode == zip_entity.key).fetch(limit=100)

So (1) is superior here.
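The round-trip count can be sketched without App Engine at all. The following toy in-memory store is purely an illustration: `ToyDatastore`, its `put` and `query` methods, and the sample entities are hypothetical stand-ins, not the ndb API, and each query is simply modelled as one RPC.

```python
class ToyDatastore:
    """In-memory stand-in for a datastore that counts query round trips."""

    def __init__(self):
        self.entities = {}  # (kind, id) -> dict of property values
        self.rpc_count = 0

    def put(self, kind, ident, **props):
        key = (kind, ident)
        self.entities[key] = props
        return key

    def query(self, kind, prop, value, limit=100):
        """Return up to `limit` keys of `kind` whose `prop` equals `value`."""
        self.rpc_count += 1  # each query is modelled as one round trip
        matches = [key for key, props in self.entities.items()
                   if key[0] == kind and props.get(prop) == value]
        return matches[:limit]


store = ToyDatastore()
zip_key = store.put("Zipcode", 1, code="55555")
store.put("Address", 1, street="Main St", zipcode="55555")
store.put("StructuredAddress", 1, street="Main St", zipcode=zip_key)

# (1) Flat model: filter directly on the string value.
store.rpc_count = 0
flat = store.query("Address", "zipcode", "55555")
flat_rpcs = store.rpc_count

# (2) Structured model: resolve the Zipcode key first, then filter on it.
store.rpc_count = 0
zip_keys = store.query("Zipcode", "code", "55555", limit=1)
structured = store.query("StructuredAddress", "zipcode", zip_keys[0])
structured_rpcs = store.rpc_count

print(flat_rpcs, structured_rpcs)
```

Both approaches return the same single address in this toy data set; the structured one just needs the extra lookup to turn the zip code string into a key before it can filter.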

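Assume there are around 50 county values, whereas zipcode values number in the millions. Given millions of address records, which model is more efficient in that scenario?

Again assuming that only a single string is associated with each zip code, and that by efficiency you mean storage efficiency: with (1) you only store the millions of addresses, while with (2) you have to store the millions of addresses plus millions of zip codes, so (1) is more efficient.

Again, (1) is superior.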
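To put the entity-count argument in numbers, here is a small back-of-the-envelope calculation. The figures are hypothetical: the question only says "millions" of addresses and zip codes and "around 50" counties, so 5,000,000 and 2,000,000 are assumptions picked for illustration.

```python
# Hypothetical figures: the question only says "millions" and "around 50".
num_addresses = 5000000
num_zipcodes = 2000000
num_counties = 50

# (1) Flat model: one entity per address, nothing else.
flat_entities = num_addresses

# (2) Structured model: the same addresses plus one entity per referenced value.
# Locality entities are left out because the question gives no count for them.
structured_entities = num_addresses + num_zipcodes + num_counties

extra = structured_entities - flat_entities
print(extra)
```

The county kind barely registers, while the zip code kind alone adds millions of entities that each hold a single short string.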

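Does using KeyProperty in this example mean more read operations and, in effect, a higher bill? Would the built-in ndb cache avoid that?

In short, yes, as the answer to your first question shows. In practice, the only time you want to use a KeyProperty is when the referenced model stores multiple fields.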
