我使用 Amazon DynamoDB 收集统计数据,使用 ElasticMapReduce 和 Hive 处理统计数据并将结果上传到 S3。
在 DynamoDB 上,我有表 prod_product_views:-id(哈希键)-product_id(范围键)-company_id-creted-price-viewed_by_company_id-viewed_by_user_id
目前在此表中大约有 7000 条记录。
问题是 hiveql 运行速度很慢。
例如,我首先要创建存储在 DynamoDB 上的外部表:
CREATE EXTERNAL TABLE prod_product_views (id string, product_id bigint, company_id bigint, created bigint, price string, viewed_by_company_id bigint, viewed_by_user_id bigint)
STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
TBLPROPERTIES ("dynamodb.table.name" = "prod_product_views",
"dynamodb.column.mapping" = "id:id,product_id:product_id,company_id:company_id,created:created,price:price,viewed_by_company_id:viewed_by_company_id,viewed_by_user_id:viewed_by_user_id");
这一步没问题(耗时:12.908秒)
第二步是获取最后一天的视图:
SELECT * from prod_product_views
WHERE
created > UNIX_TIMESTAMP(CONCAT(DATE_SUB(FROM_UNIXTIME(UNIX_TIMESTAMP()), 1)," ","00:00:00"))
and created < UNIX_TIMESTAMP(CONCAT(DATE_SUB(FROM_UNIXTIME(UNIX_TIMESTAMP()), 1)," ","23:59:59"));
这一步需要很长时间(大约 60 分钟),也许更多。
这是输出的一部分:
2013-05-23 08:23:06,097 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:07,103 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:08,109 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:09,115 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:10,121 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:11,147 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:12,153 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:13,160 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:14,169 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:15,177 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:16,183 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:17,193 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:18,219 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:19,225 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:20,234 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:21,240 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:22,247 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:23,253 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:24,259 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:25,265 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:26,273 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:27,279 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:28,290 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:29,312 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:30,318 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:31,324 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:32,333 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:33,358 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:34,364 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:35,394 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:36,400 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:37,408 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:38,418 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:39,478 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:40,538 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:41,544 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:42,550 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:43,557 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:44,563 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:45,569 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:46,579 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.39 sec
2013-05-23 08:23:47,607 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.76 sec
2013-05-23 08:23:48,613 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.76 sec
2013-05-23 08:23:49,623 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.76 sec
2013-05-23 08:23:50,633 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.76 sec
2013-05-23 08:23:51,638 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.76 sec
2013-05-23 08:23:52,650 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.76 sec
2013-05-23 08:23:53,657 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.76 sec
2013-05-23 08:23:54,665 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.76 sec
2013-05-23 08:23:55,691 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.76 sec
2013-05-23 08:23:56,697 Stage-1 map = 0%, reduce = 0%, Cumulative CPU 3.76 sec
我是这种服务的新手,是我做错了什么,还是在配置方面有一些技巧或什么可以加快速度?因为这看起来像一个简单的查询,而 7000 条记录并不是大量的数据。
提前致谢!