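I am trying to use Flume-ng (1.2) to load data from a flat file (a log file) into HBase. The flat file has multiple columns, each delimited by a colon (:), and each one needs to be loaded into its own column in HBase. While searching the forums I found a serializer from Apache that addresses this (org.apache.flume.sink.hbase.RegexHbaseEventSerializer), but I could not find any configuration file or usage example for it online. Any help with setting up the configuration file would be appreciated.

Contents of the flat file: 1:nn 2:pp 3:mm

Thanks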

1 Answer

RegexHbaseEventSerializer has three configuration parameters that can be set (as documented in its source code):

/** Regular expression used to parse groups from event data. */
public static final String REGEX_CONFIG = "regex";

/** Whether to ignore case when performing regex matches. */
public static final String IGNORE_CASE_CONFIG = "regexIgnoreCase";

/** Comma separated list of column names to place matching groups in. */
public static final String COL_NAME_CONFIG = "colNames";

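An example configuration that uses RegexHbaseEventSerializer follows (adapted in part from Cloudera's Flume and HBase presentation). The regex value X is a placeholder for your own pattern: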

host1.sources = src1
host1.sinks = sink1
host1.channels = ch1

host1.sources.src1.type = seq
host1.sources.src1.port = 25001
host1.sources.src1.bind = localhost
host1.sources.src1.channels = ch1

host1.sinks.sink1.type = org.apache.flume.sink.hbase.HBaseSink
host1.sinks.sink1.channel = ch1
host1.sinks.sink1.table = test3
host1.sinks.sink1.columnFamily = testing

host1.sinks.sink1.serializer = org.apache.flume.sink.hbase.RegexHbaseEventSerializer
host1.sinks.sink1.serializer.regex = X
host1.sinks.sink1.serializer.regexIgnoreCase = true
host1.sinks.sink1.serializer.colNames = column_1,column_2,column_3

host1.channels.ch1.type = memory
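
For the sample line in the question, a candidate value for serializer.regex can be tried in plain Java before it goes into the Flume configuration. The sketch below is only an illustration, not the serializer's actual code: the RegexCheck class and its parse helper are hypothetical, the pattern assumes every line looks exactly like 1:nn 2:pp 3:mm with single spaces between the pairs, and it assumes the serializer needs the whole line to match and one capture group per name in colNames, which is worth verifying against the source of your Flume version.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexCheck {
    // Hypothetical helper: pairs each capture group with a column name.
    // Returns an empty map when the line does not match as a whole or when
    // the number of groups differs from the number of column names (assumed
    // to mirror how the serializer treats such events).
    static Map<String, String> parse(String regex, boolean ignoreCase,
                                     List<String> colNames, String line) {
        int flags = ignoreCase ? Pattern.CASE_INSENSITIVE : 0;
        Matcher m = Pattern.compile(regex, flags).matcher(line);
        Map<String, String> row = new LinkedHashMap<>();
        if (!m.matches() || m.groupCount() != colNames.size()) {
            return row;
        }
        for (int i = 0; i < colNames.size(); i++) {
            row.put(colNames.get(i), m.group(i + 1)); // groups are 1-based
        }
        return row;
    }

    public static void main(String[] args) {
        // Candidate pattern for the sample line; an assumption about the file format.
        String regex = "1:([^ ]+) 2:([^ ]+) 3:([^ ]+)";
        List<String> cols = Arrays.asList("column_1", "column_2", "column_3");
        System.out.println(parse(regex, true, cols, "1:nn 2:pp 3:mm"));
        // prints {column_1=nn, column_2=pp, column_3=mm}
    }
}
```

If the printed map looks right for a few real lines from the log, the same pattern would be a reasonable candidate to put in place of X, with colNames kept at three names so that it lines up with the three groups.
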
answered 2012-09-09T10:25:59.970