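I need to load data from a text file into MapReduce. I have searched the web, but I have not found a solution that fits my case.

Is there a method or class that can read a text/CSV file from the file system and store the data in an HBase table?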

1 Answer

To read from a text file, the file must first be in HDFS. You then need to specify the input format and the output format for the job:

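// conf should be an HBase-aware Configuration, e.g. HBaseConfiguration.create()
// new Job(conf, name) is deprecated on Hadoop 2+; Job.getInstance(conf, "example") replaces it there
Job job = new Job(conf, "example");
job.setJarByClass(YourMapper.class);
// FileInputFormat and TextInputFormat are the org.apache.hadoop.mapreduce.lib.input classes
FileInputFormat.addInputPath(job, new Path("PATH to text file"));
job.setInputFormatClass(TextInputFormat.class);
job.setMapperClass(YourMapper.class);
// the map output types must match what YourMapper emits and what YourReducer consumes
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);
// sets TableOutputFormat as the output format and YourReducer as the reducer
TableMapReduceUtil.initTableReducerJob("hbase_table_name", YourReducer.class, job);
job.waitForCompletion(true);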

YourReducer should extend org.apache.hadoop.hbase.mapreduce.TableReducer<Text, Text, Text>.

Sample reducer code:

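import java.io.IOException;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;

public class YourReducer extends TableReducer<Text, Text, Text> {
    // Placeholder names; the column family must already exist in the target table.
    private final byte[] rawUpdateColumnFamily = Bytes.toBytes("colName");
    private final byte[] qualifier = Bytes.toBytes("colName");

    /** Called once at the beginning of the task. */
    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // anything that needs to be done at the start of the reducer
    }

    @Override
    public void reduce(Text keyin, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        for (Text val : values) {
            // put the data in the table: one Put per value, row key = map output key
            // Bytes.toBytes encodes as UTF-8; String.getBytes() depends on the platform charset
            Put put = new Put(Bytes.toBytes(keyin.toString()));
            long explicitTimeInMs = System.currentTimeMillis();
            // put.add(...) is the older API; HBase 1.0+ names this method addColumn(...)
            put.add(rawUpdateColumnFamily, qualifier, explicitTimeInMs, Bytes.toBytes(val.toString()));
            context.write(keyin, put);
        }
    }
}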

Sample mapper class:

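import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// The output value type is Text so that it matches setMapOutputValueClass(Text.class) and YourReducer.
public class YourMapper extends Mapper<LongWritable, Text, Text, Text> {
    private static final Text ONE = new Text("1");
    private final Text word = new Text();

    @Override
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String line = value.toString();
        // emit (token, "1") for every whitespace-separated token in the line
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            context.write(word, ONE);
        }
    }
}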
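
The sample mapper above tokenizes on whitespace, which is not quite what a CSV file needs. Below is a minimal sketch of how one line could be split into a row key and a value before calling context.write. It assumes the first comma-separated field is the row key and the rest of the line is the value. CsvLineParser and splitKeyValue are hypothetical names, not part of Hadoop or HBase, and this naive split does not handle quoted fields that contain commas.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Map;

/** Hypothetical helper: splits "rowKey,rest,of,line" into (rowKey, "rest,of,line"). */
final class CsvLineParser {
    private CsvLineParser() {}

    /**
     * Returns (row key, remaining fields) for one CSV line, or null when the
     * line is null, blank, has no comma, or starts with a comma (empty key).
     * Assumption: the first field is the row key. Quoted commas are not handled.
     */
    static Map.Entry<String, String> splitKeyValue(String line) {
        if (line == null) {
            return null;
        }
        String trimmed = line.trim();
        int comma = trimmed.indexOf(',');
        if (comma <= 0) {
            // covers blank lines, lines without a comma, and an empty first field
            return null;
        }
        String rowKey = trimmed.substring(0, comma).trim();
        String rest = trimmed.substring(comma + 1).trim();
        return new SimpleImmutableEntry<>(rowKey, rest);
    }
}
```

Inside map(), you would call CsvLineParser.splitKeyValue(value.toString()), skip the record when the result is null, and otherwise write new Text(entry.getKey()) and new Text(entry.getValue()) to the context, so the reducer receives the row key and the remaining fields as Text.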
answered 2012-09-03T13:59:19.167