I've been trying to understand how Cassandra queries work, because they don't seem to behave the way I expect.
This is the table I'm currently using:
fields: {
    stats_customer_id: {
        type: 'uuid',
        default: {
            '$db_function': 'uuid()'
        }
    },
    stats_customer_id_old: 'text',
    stats_date_id: {
        type: 'timeuuid',
        default: {
            '$db_function': 'now()'
        }
    },
    provider_id: 'uuid',
    customer_id: 'uuid',
    customer_name: 'text',
    customer_account_no: 'text',
    direct_sent: 'int',
    messages_sent: 'int',
    reminders_sent: 'int',
    reminders_pending: 'int',
    replies_sent: 'int',
    binary_sent: 'int',
},
key: [
    [
        'provider_id'
    ],
    'stats_date_id',
    'customer_id'
]
Note: I'm 100% open to modifying or even completely discarding this table to get the result below.
My query can be described as:
For a given provider_id and date range (to and from date),
return a list of Customers (distinct) with a sum of each int field
(direct_sent, messages_sent, reminders_sent, reminders_pending, binary_sent).
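To make the expected result concrete, here is a minimal in-memory sketch of the aggregation I'm after. It is plain JavaScript, not a Cassandra query: the function name `aggregateByCustomer`, the `SUM_FIELDS` list, and the `date` field (a JS `Date` standing in for the `stats_date_id` timeuuid) are made up for illustration only.

```javascript
// Hypothetical helper: illustrates the desired result shape in memory.
// `rows` stand in for stats_customer rows; `date` is an assumed JS Date
// field used here in place of the stats_date_id timeuuid.
const SUM_FIELDS = [
  'direct_sent', 'messages_sent', 'reminders_sent',
  'reminders_pending', 'replies_sent', 'binary_sent',
];

function aggregateByCustomer(rows, providerId, fromDate, toDate) {
  const totals = new Map(); // customer_id -> running totals

  for (const row of rows) {
    // Keep only rows for the requested provider and date range (inclusive).
    if (row.provider_id !== providerId) continue;
    if (row.date < fromDate || row.date > toDate) continue;

    let entry = totals.get(row.customer_id);
    if (!entry) {
      entry = { customer_id: row.customer_id, customer_name: row.customer_name };
      for (const field of SUM_FIELDS) entry[field] = 0;
      totals.set(row.customer_id, entry);
    }

    // Sum each int field; a missing value counts as 0.
    for (const field of SUM_FIELDS) entry[field] += row[field] || 0;
  }

  // One object per distinct customer.
  return [...totals.values()];
}
```

In other words, I expect one row per distinct customer_id, each carrying its own sums, rather than a single row that adds up every customer.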
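After trying several approaches with select statements, group_by, and other things, I always get back a single customer, which appears to contain the sum across all customers in the given date range.
An example of the current query, using the express-cassandra npm library, looks like this: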
let query = {
    provider_id: providerId,
    '$groupby': ['customer_id']
};
if (fromDate && toDate) {
    query.stats_date_id = {
        '$gte': models.minTimeuuid(fromDate),
        '$lte': models.maxTimeuuid(toDate)
    };
}
let selectQueries = [
    'provider_id',
    'customer_id',
    'customer_name',
    'sum(direct_sent) as direct_sent',
    'sum(messages_sent) as messages_sent',
    'sum(reminders_sent) as reminders_sent',
    'sum(reminders_pending) as reminders_pending',
    'sum(replies_sent) as replies_sent',
    'sum(binary_sent) as binary_sent',
];
// Query stats_customer table
let customerData = await models.instance.StatsCustomer.findAsync(query, {select: selectQueries});
return customerData;
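I also need to be able to insert data into this table at a rate of 1k to 100k entries per day.
I assume I'm misunderstanding something fairly fundamental about Cassandra that is causing this behavior. As before, I'm entirely happy to rework or even drop the table to satisfy the query described above.
Thanks in advance.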