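I want to store one value per second in a table, so I tested two approaches against each other. If I understand correctly, the data should be stored almost identically internally.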

Wide row

CREATE TABLE timeseries (
  id int,
  date date,
  timestamp timestamp,
  value decimal,
  PRIMARY KEY ((id, date), timestamp)
) WITH CLUSTERING ORDER BY (timestamp DESC)
  AND compaction = {'class': 'DateTieredCompactionStrategy'}
  AND compression = {'sstable_compression': 'DeflateCompressor'};

Skinny row

CREATE TABLE timeseries (
  id int,
  date date,
  "0" decimal, "1" decimal, "2" decimal, -- ... 86400 decimal values
                 -- each column name is the second of the day
  PRIMARY KEY ((id, date))
);

Test:

  • 10 different ids
  • 1 million values (100,000 per id)
  • each value is offset by one minute from the previous one
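
The test setup above could be reproduced with something like the following Python sketch. It only generates the rows; how they are written to Cassandra is left out. The function name `generate_rows`, the start date, and the sine period are assumptions made for illustration and are not taken from the original test.

```python
import math
from datetime import datetime, timedelta, timezone


def generate_rows(num_ids=10, values_per_id=100_000,
                  start=datetime(2017, 1, 1, tzinfo=timezone.utc),
                  step=timedelta(minutes=1), period=1440):
    """Yield (id, date, timestamp, value) tuples shaped like the wide row table.

    Assumed for illustration: the start date and a sine wave that repeats
    every `period` samples. Values are floats here; the table uses decimal.
    """
    for series_id in range(num_ids):
        for i in range(values_per_id):
            ts = start + i * step  # one-minute spacing between values
            value = math.sin(2 * math.pi * i / period)
            yield series_id, ts.date(), ts, value


# Print a few rows for a single id to inspect their shape.
for row in generate_rows(num_ids=1, values_per_id=3):
    print(row)
```

With the defaults this yields 10 ids with 100,000 values each, i.e. 1 million rows in total.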

Results of the value comparison


[image: results of the comparison]

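In my test, the skinny row approach consumes only half the storage for 1 million values of a sine function. Even with random values the difference is significant. Can someone explain this behavior?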

1 Answer

The only difference between these schemas is the cell key.

A sample cell of the wide row model:

["2017-06-09 15\\:05+0600:value","3",1496999149885944]
          |                 |     |          |
       timestamp         column  value   timestamp

And a sample cell of the skinny row model:

   ["0","3",1497019292686908]
     |   |          | 
  column value   timestamp

As you can see, in the wide row model the cell key consists of the timestamp value plus the column name (value), whereas in the skinny row model the cell key is only the column name. The trailing number in each cell is the write timestamp, which both models carry.

The overhead of the wide row model is the timestamp (8 bytes) plus the length of the column name (value). You can keep the column name short and, instead of a timestamp, use an int holding the second of the day, like the column names in your skinny row model. This will save more space.
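
As a rough illustration of that suggestion, here is a small Python sketch that turns a timestamp into a second-of-day value and estimates the raw bytes saved in the clustering key. The helper names `second_of_day` and `raw_key_savings` are hypothetical. The estimate only compares an 8-byte timestamp with a 4-byte int and ignores compression and any other per-cell framing, so the actual on-disk difference will not match it exactly.

```python
from datetime import datetime, timezone

TIMESTAMP_BYTES = 8  # CQL timestamp: 64-bit value
INT_BYTES = 4        # CQL int: 32-bit value


def second_of_day(ts):
    """Return the second within the day (0..86399) for a datetime."""
    return ts.hour * 3600 + ts.minute * 60 + ts.second


def raw_key_savings(num_cells):
    """Uncompressed bytes saved by an int clustering key (rough estimate)."""
    return num_cells * (TIMESTAMP_BYTES - INT_BYTES)


ts = datetime(2017, 6, 9, 15, 5, 0, tzinfo=timezone.utc)
print(second_of_day(ts))           # 54300
print(raw_key_savings(1_000_000))  # 4000000
```

The date is already part of the partition key, so the second of the day is enough to identify a value within a partition.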

answered 2017-06-09T15:30:57.303