
I am trying to configure Flume so that it writes files at least close to the HDFS block size, which in my case is 128 MB. This is my configuration; it writes files of roughly 10 MB each:

###############################
httpagent.sources = http-source
httpagent.sinks = k1
httpagent.channels = ch3

# Define / Configure Source (multiport seems to support newer "stuff")
###############################
httpagent.sources.http-source.type = org.apache.flume.source.http.HTTPSource
httpagent.sources.http-source.channels = ch3
httpagent.sources.http-source.port = 5140

httpagent.sinks = k1
httpagent.sinks.k1.type = hdfs
httpagent.sinks.k1.channel = ch3
httpagent.sinks.k1.hdfs.path = hdfs://r3608/hadoop/hdfs/data/flumechannel3/0.5/
httpagent.sinks.k1.hdfs.fileType = DataStream
httpagent.sinks.k1.hdfs.writeFormat = Text
httpagent.sinks.k1.hdfs.rollCount = 0
httpagent.sinks.k1.hdfs.batchSize = 10000
httpagent.sinks.k1.hdfs.rollSize = 0

httpagent.sinks.log-sink.channel = memory
httpagent.sinks.log-sink.type = logger

# Channels
###############################

httpagent.channels = ch3
httpagent.channels.ch3.type = memory
httpagent.channels.ch3.capacity = 100000
httpagent.channels.ch3.transactionCapacity = 80000
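
For reference, the source above relies on the HTTPSource defaults for everything except the port. Spelled out explicitly, it would be equivalent to the sketch below; these two lines are not in the original configuration and just restate the documented defaults (bind to all interfaces and use the JSON handler):

# Documented HTTPSource defaults made explicit (not part of the original config)
httpagent.sources.http-source.bind = 0.0.0.0
httpagent.sources.http-source.handler = org.apache.flume.source.http.JSONHandler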

The problem is that I cannot get it to write files of around 100 MB. If I change the configuration like this, I would expect it to write at least roughly 100 MB per file:

httpagent.sinks = k1
httpagent.sinks.k1.type = hdfs
httpagent.sinks.k1.channel = ch3
httpagent.sinks.k1.hdfs.path = hdfs://r3608/hadoop/hdfs/data/flumechannel3/0.4test/
httpagent.sinks.k1.hdfs.fileType = DataStream
httpagent.sinks.k1.hdfs.writeFormat = Text
httpagent.sinks.k1.hdfs.rollSize = 100000000
httpagent.sinks.k1.hdfs.rollCount = 0

But then the files get even smaller, around 3-8 MB each... Since it is practically impossible to aggregate the files once they are in HDFS, I really want them to be larger. Is there something about the rollSize parameter I am not getting? Or is there some default that keeps it from ever writing files that large?


1 Answer


You need to override rollInterval to 0 so that it never rolls based on a time interval:

httpagent.sinks.k1.hdfs.rollInterval = 0
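
For reference, putting the roll-related sink settings from the question together with this override gives something like the sketch below. The 134217728 value (128 MB) is an illustrative choice matching the block size mentioned in the question; per the Flume user guide, the defaults are rollInterval = 30 seconds, rollSize = 1024 bytes and rollCount = 10 events, so with rollInterval left at its default a file is rolled every 30 seconds regardless of size, which would explain the small 3-8 MB files.

# Roll on size only: target ~128 MB files, disable count- and time-based rolling
httpagent.sinks.k1.hdfs.rollSize = 134217728
httpagent.sinks.k1.hdfs.rollCount = 0
httpagent.sinks.k1.hdfs.rollInterval = 0
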
answered 2014-07-24T09:54:44.267