I use Amazon DynamoDB to collect statistics, and Elastic MapReduce with Hive to process them and upload the results to S3.

In DynamoDB I have a table, prod_product_views, with these attributes: id (hash key), product_id (range key), company_id, created, price, viewed_by_company_id, viewed_by_user_id.

The table currently holds about 7,000 records.

The problem is that HiveQL queries run very slowly.

For example, I first create an external table backed by the DynamoDB table:

CREATE EXTERNAL TABLE prod_product_views (id string, product_id bigint, company_id bigint, created bigint, price string, viewed_by_company_id bigint, viewed_by_user_id bigint)
STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
TBLPROPERTIES ("dynamodb.table.name" = "prod_product_views",
"dynamodb.column.mapping" = "id:id,product_id:product_id,company_id:company_id,created:created,price:price,viewed_by_company_id:viewed_by_company_id,viewed_by_user_id:viewed_by_user_id"); 

This step is fine (time taken: 12.908 seconds).

The second step is to fetch the previous day's views:

SELECT * FROM prod_product_views
WHERE
created > UNIX_TIMESTAMP(CONCAT(DATE_SUB(FROM_UNIXTIME(UNIX_TIMESTAMP()), 1)," ","00:00:00"))
AND created < UNIX_TIMESTAMP(CONCAT(DATE_SUB(FROM_UNIXTIME(UNIX_TIMESTAMP()), 1)," ","23:59:59"));

This step takes a very long time (about 60 minutes, maybe more).

Here is part of the output:

2013-05-23 08:23:06,097 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:07,103 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:08,109 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:09,115 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:10,121 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:11,147 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:12,153 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:13,160 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:14,169 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:15,177 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:16,183 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:17,193 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:18,219 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:19,225 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:20,234 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:21,240 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:22,247 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:23,253 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:24,259 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:25,265 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:26,273 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:27,279 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:28,290 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:29,312 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:30,318 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:31,324 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:32,333 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:33,358 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:34,364 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:35,394 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:36,400 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:37,408 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:38,418 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:39,478 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:40,538 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:41,544 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:42,550 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:43,557 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:44,563 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:45,569 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:46,579 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:47,607 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.76 sec
2013-05-23 08:23:48,613 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.76 sec
2013-05-23 08:23:49,623 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.76 sec
2013-05-23 08:23:50,633 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.76 sec
2013-05-23 08:23:51,638 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.76 sec
2013-05-23 08:23:52,650 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.76 sec
2013-05-23 08:23:53,657 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.76 sec
2013-05-23 08:23:54,665 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.76 sec
2013-05-23 08:23:55,691 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.76 sec
2013-05-23 08:23:56,697 Stage-1 map = 0%,  reduce = 0%, Cumulative CPU 3.76 sec

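I'm new to these services. Am I doing something wrong, or are there configuration tricks that would speed this up? This looks like a simple query, and 7,000 records is not a lot of data.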

Thanks in advance!

1 Answer

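I figured it out: the problem was not in Hive. It was the DynamoDB provisioned throughput, which was set to 1 read / 1 write capacity unit. I have now increased it to 10 read / 5 write, and the query finishes in 16 seconds :)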
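
A related knob on the Hive side may also be worth checking. As far as I know, the EMR DynamoDB connector only uses a fraction of the table's provisioned read capacity, controlled by the session setting `dynamodb.throughput.read.percent` (documented default 0.5). The sketch below is not what I ran; it assumes the same EMR Hive session and the external table from the question, and the value 1.0 is only an illustration:

```sql
-- Assumption: Hive on Amazon EMR with the DynamoDB storage handler, as in the question.
-- Let the Hive job use up to 100% of the table's provisioned read capacity
-- (illustrative value; other readers of the table may get throttled while it runs).
SET dynamodb.throughput.read.percent=1.0;

-- Any query against the DynamoDB-backed table run afterwards in this session
-- is subject to the new read rate, for example a simple row count:
SELECT COUNT(*) FROM prod_product_views;
```

This only changes how much of the provisioned capacity Hive is allowed to consume. With the table at 1 read unit the scan would still be slow, so raising the provisioned capacity itself, as described above, is the actual fix.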

Answered 2015-05-14T13:37:32.260