I want to load JSON or CSV into HBase without using any MapReduce program or HiveQL/Pig support. Is that possible, and which is more efficient: hive-hbase or mapreduce-hbase?
2 Answers
I use a Perl script to do this; here is my (Perl-generated) JSON file:
{"c3":"c","c4":"d","c5":"tim","c2":"b","c6":"andrew","c1":"a"},"CURRENTLY20140131":{"c2":"tim2","c1":"bill2"},"THERE20140131"::{"c3":"c","c4":"d","c9":"bill2","c10":"tim2","c2":"b","c6":"andrew","c7":"bill","c5":"tim","c1":"a","c8":"tom"},"TODAY20140131":{"c2":"bill","c1":"tom"}}
I shard on a string row key, with a variable number of columns depending on who/what referenced the key object.
use strict;
use warnings;
use JSON::XS qw(decode_json);
use File::Slurp qw(read_file);

my %words = ();

# Read the JSON file and decode it into the %words hash.
sub ReadHash {
    my ($filename) = @_;
    my $json = read_file( $filename, { binmode => ':raw' } );
    %words = %{ decode_json $json };
}

# Main starts here
ReadHash("Save.json");

# Emit one HBase shell 'put' command per row key, listing every
# column/value pair under the 'cf' column family.
foreach my $key ( keys %words ) {
    print "put 'test', '$key',";
    my $cnt = 0;
    foreach my $key2 ( keys %{ $words{$key} } ) {
        my $val = $words{$key}{$key2};
        print "," if $cnt > 0;
        print "'cf:$key2', '$val'";
        ++$cnt;
    }
    print "\n";
}
Generate the HBase shell commands, then execute them.
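For example, assuming the script above is saved as generate_puts.pl (a hypothetical name), its output can be fed straight to the HBase shell:

perl generate_puts.pl > puts.txt   # save the generated put commands
hbase shell puts.txt               # run them non-interactively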
Alternatively, I would take a look at happybase (Python), which can also load large datasets very quickly.
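A rough happybase sketch of the same loading job; the Thrift host is an assumption (HBase's Thrift server must be running), while the table name 'test' and column family 'cf' match the put commands above:

import happybase

# Connect to HBase's Thrift gateway (host is an assumption).
connection = happybase.Connection('localhost')
table = connection.table('test')

# Send puts in batches instead of one round-trip per row.
with table.batch(batch_size=1000) as batch:
    batch.put(b'WHERE20140131', {b'cf:c1': b'tim2', b'cf:c2': b'bill2'})
    batch.put(b'TODAY20140131', {b'cf:c1': b'tom', b'cf:c2': b'bill'})

connection.close()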
Hope this helps.
The Perl script above should produce put commands like:
put 'test', 'WHERE20140131','cf:c2', 'bill2','cf:c1', 'tim2'
put 'test', 'OMAN20140131','cf:c3', 'c','cf:c4', 'd','cf:c5', 'tim','cf:c2', 'b','cf:c1', 'a','cf:c6', 'andrew'
put 'test', 'CURRENTLY20140131','cf:c2', 'tim2','cf:c1', 'bill2'
Answered 2014-01-31T15:59:32.760
Perhaps you can refer to bulk loading; see the bulk-load section of the HBase documentation.
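As a sketch of that route for CSV: the ImportTsv tool can write HFiles directly, and LoadIncrementalHFiles then moves them into the table. Note that ImportTsv runs a MapReduce job internally, and the table name, column mapping, separator, and paths below are assumptions:

hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
  -Dimporttsv.separator=',' \
  -Dimporttsv.columns=HBASE_ROW_KEY,cf:c1,cf:c2 \
  -Dimporttsv.bulk.output=/tmp/bulkload_output \
  test /user/input/data.csv
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /tmp/bulkload_output test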
Answered 2013-09-05T04:22:48.280