cassandra - Column-oriented database questions

Question

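Guys,

I have recently started reading about NoSQL databases, because I am working on an application related to data warehousing.

I have the following questions. I have already read the basics.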

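Question 1) When the values of the same column are stored together, how is an entire original row retrieved in a column-oriented database?

Suppose we store data in the following format:

key1: {name: test, value: 5} key2: {name: test1, value: 10}

Internally, a column-oriented database would then store it like this: test|test1 together and 5|10 together.

So if we have to retrieve the data for key1, how does that happen? (A and B are my guesses.)

A) If it has to fetch the data from each column store separately, that would be very costly.

B) Is there some indexing mechanism that fetches the data of all columns for a given row key?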

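Question 2)

I was reading some documentation and found that column-oriented databases are better suited to running aggregate functions on a single column, because less I/O is needed.

In NoSQL column stores such as Cassandra and HBase, I could not find proper support for aggregate functions like SUM and AVG. (There may be some tweaks/hacks/extra code to write, as in the questions below.)

How does Apache Cassandra do aggregate operations?
Real-time querying/aggregating millions of records - hadoop? hbase? cassandra?
How to implement groupby using hbase coprocessor?

Question 3) How do joins happen internally in a column-oriented database, and are they advisable?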

score 0 · Accepted Answer

Good questions. 1) In Cassandra, if you are using cqlsh, it looks just as if you were storing the data in MySQL or some other RDBMS.

Connected to Test Cluster at localhost:9160.
[cqlsh 3.1.7 | Cassandra 1.2.9 | CQL spec 3.0.0 | Thrift protocol 19.36.0]
Use HELP for help.
cqlsh> create keyspace test with replication={'class':'SimpleStrategy', 'replication_factor': 1};
cqlsh> USE test ;
cqlsh:test> create table entry(key text PRIMARY KEY, name text, value int );
cqlsh:test> INSERT INTO entry (key, name , value ) VALUES ( 'key1', 'test',5);
cqlsh:test> INSERT INTO entry (key, name , value ) VALUES ( 'key2', 'test1',10);
cqlsh:test> select * from entry;

 key  | name  | value
------+-------+-------
 key1 |  test |     5
 key2 | test1 |    10

cqlsh:test>

Note: you can select rows by key, or put conditions on other columns by using a secondary index.

But in HBase, the structure would look like this:

rowkey | column family | column | value
key1   | entry         | name   | test
key1   | entry         | value  | 5
key2   | entry         | name   | test1
key2   | entry         | value  | 10

Note: you can select each row by its key or by any column value, which is very easy.
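
To make guess B from the question concrete: in the layout above every cell carries its row key, so the row key itself works as the index for fetching all columns of one row. The following is only a toy in-memory sketch of that idea, not a real HBase or Cassandra client; `store` and `get_row` are hypothetical names made up for illustration.

```python
# Toy in-memory model of the layout above; not a real HBase or Cassandra client.
# Cells are grouped by row key first, then column family, then column name.
store = {
    "key1": {"entry": {"name": "test", "value": 5}},
    "key2": {"entry": {"name": "test1", "value": 10}},
}


def get_row(store, rowkey):
    """Return every column of one row, found through the row key alone."""
    families = store.get(rowkey, {})
    row = {}
    for family, columns in families.items():
        for column, value in columns.items():
            # Label each cell as "family:column", mirroring the table above.
            row[f"{family}:{column}"] = value
    return row


print(get_row(store, "key1"))  # {'entry:name': 'test', 'entry:value': 5}
```

A key that is not present simply yields an empty dict here; a real store would handle that case in its own way.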

2) Yes, NoSQL stores also support batch operations, but only for DML.
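
As for the SUM and AVG raised in question 2, one option when the store has no built-in aggregates (as the question reports) is to compute them in the application over the fetched rows. A minimal sketch, assuming the rows have already been read into a list of dicts; `rows`, `column_sum` and `column_avg` are hypothetical names and no database driver is involved.

```python
# Rows as they might come back from a `select * from entry` style query.
# This list is a stand-in for a real result set; no database driver is used.
rows = [
    {"key": "key1", "name": "test", "value": 5},
    {"key": "key2", "name": "test1", "value": 10},
]


def column_sum(rows, column):
    """SUM over one column, computed in the application."""
    return sum(row[column] for row in rows)


def column_avg(rows, column):
    """AVG over one column; returns None for an empty result set."""
    if not rows:
        return None
    return column_sum(rows, column) / len(rows)


print(column_sum(rows, "value"))  # 15
print(column_avg(rows, "value"))  # 7.5
```

This pulls every row to the client, so it only suits small result sets; for millions of records the linked questions point to Hadoop jobs or HBase coprocessors instead.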

3) None of the NoSQL data stores support joins. They are not designed for joins.

Hope this helps.
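
Since the store will not join for you, a join has to happen in the application if you really need one. Below is a rough sketch of a client-side hash join over rows that were fetched separately. The `owners` data and the `hash_join` helper are invented purely for illustration and are not part of the example above.

```python
# `entries` mirrors the table above. `owners` is hypothetical data invented
# for this sketch; it does not exist in the original example.
entries = [
    {"key": "key1", "name": "test", "value": 5},
    {"key": "key2", "name": "test1", "value": 10},
]
owners = [
    {"key": "key1", "owner": "alice"},
]


def hash_join(left, right, on):
    """Inner join of two lists of dicts on a shared column, done client-side."""
    # Build a lookup from join value to the matching rows on the right side.
    index = {}
    for row in right:
        index.setdefault(row[on], []).append(row)
    joined = []
    for row in left:
        for match in index.get(row[on], []):
            joined.append({**row, **match})
    return joined


print(hash_join(entries, owners, "key"))
# [{'key': 'key1', 'name': 'test', 'value': 5, 'owner': 'alice'}]
```

In practice, data models for these stores often avoid the join altogether by denormalizing, that is, storing the owner alongside the entry at write time.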
