I am trying to run YCSB on HBase using the CDH YCSB package. I am following Cloudera's blog post ycsb-the-open-standard-for-nosql-benchmarking-joins-cloudera-labs. In that post I see the following commands:

hbase(main):001:0> n_splits = 200 # HBase recommends (10 * number of regionservers)

hbase(main):002:0> create 'usertable', 'cf', {SPLITS => (1..n_splits).map {|i| "user#{1000+i*(9999-1000)/n_splits}"}}

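What does {SPLITS => (1..n_splits).map {|i| "user#{1000+i*(9999-1000)/n_splits}"}} mean? I know it is used to split the table into regions, but I can't work out what the command is actually doing. Please help me understand it better.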

2 Answers

As you already mentioned, the command pre-splits the table into regions. This is recommended for YCSB because the benchmark loads a lot of data. Without pre-splitting, all of that data would land on a single region server, and the evaluation would come out poorly because the data wouldn't be distributed across the cluster.

The ideal number of splits depends on other factors. I'm not sure why they chose that formula, but my guess is that it follows from the workload examples.

You can also run the expression in an online Ruby interpreter and check the resulting split keys yourself:

user1044
user1089
user1134
user1179
...
user9999
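
To reproduce that list locally, here is a minimal plain-Ruby sketch of the same expression. It needs no HBase; the variable name split_keys is only for illustration and does not appear in the blog post. Because Ruby integer division truncates, i * 8999 / 200 gives whole numbers, and the keys come out evenly spaced between user1000 and user9999.

```ruby
# Plain-Ruby sketch of the expression passed to SPLITS (no HBase required).
n_splits = 200

# Integer arithmetic: i * (9999 - 1000) / n_splits is truncated to a whole
# number, so the split points are evenly spaced over user1000..user9999.
# split_keys is an illustrative name, not something from the blog post.
split_keys = (1..n_splits).map { |i| "user#{1000 + i * (9999 - 1000) / n_splits}" }

puts split_keys.first(4).inspect  # the first four split points
puts split_keys.last              # the last split point
puts split_keys.size              # how many split points were generated
```

Note that 200 split points produce 201 regions, since the first region covers everything before the first split key.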
answered 2020-11-05T03:25:14.830

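This is Ruby, as stated here: http://hbase.apache.org/book.html#shell

The Apache HBase Shell is (J)Ruby's IRB with some HBase-specific commands added. Anything you can do in IRB, you should be able to do in the HBase Shell.

So you first declare a variable holding the number of splits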

hbase(main):001:0> n_splits = 200 # HBase recommends (10 * number of regionservers)

Then you use Ruby syntax to generate an array that is passed as the SPLITS argument of the "create" command

hbase(main):002:0> create 'usertable', 'cf', {SPLITS => (1..n_splits).map {|i| "user#{1000+i*(9999-1000)/n_splits}"}}
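
As a rough illustration of what those split points do, the sketch below maps a row key to the region it would fall into. It assumes that YCSB row keys look like "user" followed by digits and that regions are ordered by byte-wise comparison of row keys, which plain Ruby string comparison mimics for ASCII keys. The helper region_index and the sample key are hypothetical and are not part of HBase or YCSB.

```ruby
# Same split points as in the create command above.
n_splits = 200
split_keys = (1..n_splits).map { |i| "user#{1000 + i * (9999 - 1000) / n_splits}" }

# Hypothetical helper (not an HBase API): returns the 0-based index of the
# region a row key belongs to, assuming region 0 holds every key below the
# first split point and region i starts at split_keys[i - 1].
def region_index(split_keys, row_key)
  # String comparison in Ruby is byte-wise, like HBase row-key ordering
  # for ASCII keys, so counting split points <= row_key gives the region.
  split_keys.count { |k| k <= row_key }
end

# Assumed example key in the "user<digits>" shape used by YCSB.
sample_key = "user5500123"
puts "#{sample_key} falls into region #{region_index(split_keys, sample_key)} of #{split_keys.size + 1}"
```

Because the comparison is lexicographic, only the leading digits after "user" decide the region, which is why four-digit split points are enough to spread much longer keys.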

Even the "create" command itself is a Ruby function. You can find its definition in $HBASE_HOME/lib/ruby/shell/commands

answered 2019-08-22T08:23:41.833