I export my DynamoDB tables to S3 as a backup (via EMR). When I export, I store the data as LZO compressed files. My Hive query is below, but essentially I followed "To export an Amazon DynamoDB table to an Amazon S3 bucket using data compression" at http://docs.amazonwebservices.com/amazondynamodb/latest/developerguide/EMR_Hive_Commands.html
I now want to do the reverse - take my LZO files and get them back into a Hive table. How do you do that? I was expecting to see some Hive configuration property for input, but there isn't one. I've googled and found some hints, but nothing definitive and nothing that works.
The files in S3 are of the format: s3://[mybucket]/backup/year=2012/month=08/day=01/000000.lzo
Here is my HQL that does the export:
SET dynamodb.throughput.read.percent=1.0;
SET hive.exec.compress.output=true;
SET io.seqfile.compression.type=BLOCK;
SET mapred.output.compression.codec = com.hadoop.compression.lzo.LzopCodec;
CREATE EXTERNAL TABLE hiveSBackup (id bigint, periodStart string, allotted bigint, remaining bigint, created string, seconds bigint, served bigint, modified string)
STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
TBLPROPERTIES ("dynamodb.table.name" = "${DYNAMOTABLENAME}",
"dynamodb.column.mapping" = "id:id,periodStart:periodStart,allotted:allotted,remaining:remaining,created:created,seconds:seconds,served:served,modified:modified");
CREATE EXTERNAL TABLE s3_export (id bigint, periodStart string, allotted bigint, remaining bigint, created string, seconds bigint, served bigint, modified string)
PARTITIONED BY (year string, month string, day string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3://<mybucket>/backup';
INSERT OVERWRITE TABLE s3_export
PARTITION (year="${PARTITIONYEAR}", month="${PARTITIONMONTH}", day="${PARTITIONDAY}")
SELECT * from hiveSBackup;
Any ideas how to get this back out of S3, decompressed, and into a Hive table?
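For what it's worth, my current understanding is that no input-side property should be needed: as long as the LZO codec is registered in `io.compression.codecs` on the cluster, Hive decompresses `.lzo` text files transparently when reading them. So a sketch of the reverse direction would be to define an external table over the backup location, register the partition, and insert back into the DynamoDB-backed table (table name `s3_import` and the partition values here are illustrative, matching the backup layout above):

```sql
-- Assumes com.hadoop.compression.lzo.LzopCodec is installed and listed in
-- io.compression.codecs, so Hive can read the .lzo files directly.
CREATE EXTERNAL TABLE s3_import (id bigint, periodStart string, allotted bigint,
    remaining bigint, created string, seconds bigint, served bigint, modified string)
PARTITIONED BY (year string, month string, day string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3://<mybucket>/backup';

-- Register the partition that matches the S3 path layout (year=/month=/day=).
ALTER TABLE s3_import ADD PARTITION (year='2012', month='08', day='01');

-- Write back into the DynamoDB-backed table, selecting only the data columns
-- (not the partition columns).
INSERT OVERWRITE TABLE hiveSBackup
SELECT id, periodStart, allotted, remaining, created, seconds, served, modified
FROM s3_import;
```

But I haven't been able to confirm whether this is the intended approach, which is why I'm asking.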