Like many, I'm new to the NoSQL world. I did a lot of research, but there is still one point I can't find a proper answer for.

Short description of system:

I'm building a system that collects visitor data on different websites. Each visit is an entity in the Datastore, with properties like device type, IP, time of visit, etc.

There will be millions of visits in the datastore.

My question is: how do I serve this data to clients? My data is sitting in the Datastore as "Visit" entities.

Now, when a customer logs in, I don't want to show them millions of records. I want to show them general stats, for example the number of visits from mobile devices, the number of visits from a specific country in some time range, and so on.

Since I'm new to NoSQL databases, I'm not sure how I should go about showing these stats in the clients' dashboard.

As far as I know, Datastore has no support for aggregates, or for getting the count of query results, for example.

I looked at BigQuery, but BigQuery works on Datastore "backups". I need to serve data in real time, without having to do backups manually.

I also read about counters and sharded counters. Is this the proper approach? Should I have a counter for each client, for each property, for each tracking group, and show the totals this way? That sounds like too much for a simple purpose.

Any input or explanation that can get me in the right direction would be highly appreciated.

Best Regards

2 Answers

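Yes, in terms of performance, counters are a good way to solve this. They do have some drawbacks, though, such as storage size and the fact that every time you want to introduce a new type of statistic you need to create a counter for it.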

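In addition to your current "Visit" entities, you can choose to store the aggregated data in sharded counters in the Datastore. These counters can be updated in real time or by a task in one of the task queues. Creating a task that builds the various counters from your current Visit entities is fairly straightforward.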
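As a rough illustration of what such a counter-building task might compute, here is a minimal in-memory sketch. It does not use the Datastore API: visits are plain dicts, the counters live in a `collections.Counter`, and the field name `client_id`, the per-day bucketing, and the key layout are all assumptions made for the example.

```python
from collections import Counter


def counter_keys(visit):
    """Yield one counter key per statistic tracked for a visit.

    The (client, kind, value, day) layout is a hypothetical naming scheme.
    """
    day = visit["time"][:10]  # ISO timestamp -> "YYYY-MM-DD" bucket
    yield (visit["client_id"], "device", visit["device"], day)
    yield (visit["client_id"], "country", visit["country"], day)


def apply_visits(counters, visits):
    """Increment every counter touched by each visit (what a queue task might do)."""
    for visit in visits:
        for key in counter_keys(visit):
            counters[key] += 1
    return counters


def count_for(counters, client, kind, value, start_day, end_day):
    """Sum the daily counters for one statistic over an inclusive day range."""
    return sum(
        n
        for (c, k, v, day), n in counters.items()
        # ISO dates compare correctly as plain strings.
        if c == client and k == kind and v == value and start_day <= day <= end_day
    )


visits = [
    {"client_id": "c1", "device": "mobile", "country": "DE", "time": "2016-07-11T08:00:00"},
    {"client_id": "c1", "device": "mobile", "country": "US", "time": "2016-07-12T09:06:05"},
    {"client_id": "c1", "device": "desktop", "country": "DE", "time": "2016-07-12T10:00:00"},
    {"client_id": "c2", "device": "mobile", "country": "DE", "time": "2016-07-12T11:00:00"},
]

counters = apply_visits(Counter(), visits)
mobile_c1 = count_for(counters, "c1", "device", "mobile", "2016-07-11", "2016-07-12")
print(mobile_c1)
```

With daily buckets, the dashboard reads a handful of counters per statistic instead of scanning Visit entities, at the cost of one counter per client, statistic, value, and day.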

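Sharding is a way of creating multiple "base" entities that, when combined, represent some meaningful data. Sharding is done to make sure you don't run into performance problems caused by concurrent updates.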

From the Google documentation:

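If you had a single entity that was the counter and the update rate was too fast, then you would have contention as the serialized writes would stack up and start to time out. The way to solve this problem is a little counter-intuitive if you are coming from a relational database; the solution relies on the fact that reads from the App Engine datastore are extremely fast and cheap. The way to reduce the contention is to build a sharded counter: break the counter up into N different counters. When you want to increment the counter, you pick one of the shards at random and increment it. When you want to know the total count, you read all of the counter shards and sum up their individual counts. The more shards you have, the higher the throughput you will have for increments on your counter.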
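The mechanism described in that passage can be sketched in a few lines. This is an in-memory model only, not Datastore code: each dict slot stands in for one shard entity, and the class name `ShardedCounter`, the counter name, and the shard count are hypothetical choices for the example.

```python
import random


class ShardedCounter:
    """In-memory model of a sharded counter (one dict slot per shard entity)."""

    def __init__(self, name, num_shards=20):
        self.name = name
        self.num_shards = num_shards
        self.shards = {}  # shard index -> partial count

    def increment(self, amount=1):
        # Pick a shard at random so concurrent writers rarely hit the same entity.
        index = random.randrange(self.num_shards)
        self.shards[index] = self.shards.get(index, 0) + amount

    def total(self):
        # Read every shard and sum the partial counts.
        return sum(self.shards.values())


counter = ShardedCounter("c1:device:mobile", num_shards=5)
for _ in range(1000):
    counter.increment()
print(counter.total())  # 1000
```

In a real deployment each shard would be its own entity and each increment would run in its own transaction, so the write load spreads across `num_shards` entities while a read only has to fetch and sum those few shards.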

I'd suggest checking the link for more information and some useful examples.

answered 2016-07-12T09:06:05.847

"As I know, Datastore has no support for aggregates, or getting count of query results for example."

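That is not true. You can get the number of entities returned by a query with a single line of code. The query itself can be keys-only, which is very fast and basically free.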

answered 2016-04-29T00:29:50.503