
We are saving tweets in a directory layout like /user/flume/2016/06/28/13/FlumeData... , but each hour more than 100 FlumeData files are created. I changed TwitterAgent.sinks.HDFS.hdfs.rollSize = 52428800 (50 MB), but the same thing happened. After that I tried changing the rollCount parameter too, but that didn't work either. How can I set the parameters to get one FlumeData file per hour?


3 Answers

    TwitterAgent.sinks.HDFS.channel = MemChannel
    TwitterAgent.sinks.HDFS.type = hdfs
    TwitterAgent.sinks.HDFS.hdfs.path = hdfs://hpc01:8020/user/flume/tweets/%Y/%m/%d/%H
    TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
    TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text

    TwitterAgent.sinks.HDFS.hdfs.batchSize = 1

    TwitterAgent.sinks.HDFS.hdfs.rollSize = 0

    TwitterAgent.sinks.HDFS.hdfs.rollCount = 10

    TwitterAgent.sinks.HDFS.hdfs.rollInterval = 0
    TwitterAgent.channels.MemChannel.type = memory
    TwitterAgent.channels.MemChannel.capacity = 10000

    TwitterAgent.channels.MemChannel.transactionCapacity = 1000

Answered 2016-07-11T07:30:03.287

I solved this by setting rollInterval=3600, rollCount=0 and batchSize=100 in flume.conf, as @vkgade suggested.

Answered 2016-07-12T07:36:59.517

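What about rollInterval? Did you set it to zero? If so, the problem may lie elsewhere. If rollInterval is set to a nonzero value, it effectively overrides rollSize and rollCount whenever the interval elapses first: the file can be rolled before it reaches rollSize. Also check the HDFS block size you have configured; if it is set too small, that alone can cause files to roll.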
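To see why changing only rollSize may leave you with many files, here is a rough Python sketch of how the three roll triggers interact. It is a toy model, not Flume's actual code: the function name `count_rolled_files`, the 2000-byte event size and the rate of one event per second are assumptions for illustration. The default values (30 seconds, 1024 bytes, 10 events) are the ones documented for the HDFS sink, and a trigger set to 0 is treated as disabled.

```python
def count_rolled_files(n_events, event_bytes, seconds_between_events,
                       roll_interval=30, roll_size=1024, roll_count=10):
    """Count files produced by a simplified model of the three roll triggers.

    A trigger set to 0 is disabled; otherwise the current file is rolled
    as soon as any enabled trigger fires.
    """
    files = 0
    opened_at = None  # time the current file was opened, None if no file is open
    size = 0
    count = 0
    for i in range(n_events):
        now = i * seconds_between_events
        # Time-based roll: close the file once its age reaches roll_interval.
        if (opened_at is not None and roll_interval > 0
                and now - opened_at >= roll_interval):
            opened_at = None
        # Open a new file if none is open.
        if opened_at is None:
            files += 1
            opened_at = now
            size = 0
            count = 0
        size += event_bytes
        count += 1
        # Size-based and count-based rolls are checked after each write.
        if (roll_size > 0 and size >= roll_size) or \
                (roll_count > 0 and count >= roll_count):
            opened_at = None
    return files


# One hour of hypothetical traffic: 3600 events of 2000 bytes, one per second.
only_size_changed = count_rolled_files(3600, 2000, 1, roll_size=52428800)
hourly = count_rolled_files(3600, 2000, 1,
                            roll_interval=3600, roll_size=0, roll_count=0)
print(only_size_changed)  # 360: rollCount=10 still rolls every 10 events
print(hourly)             # 1: only the time trigger is active
```

Under these assumptions, raising rollSize alone still yields hundreds of files per hour because the count trigger keeps firing, while disabling size and count and rolling on time gives a single file.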

Try this:

    TwitterAgent.sinks.HDFS.channel = MemChannel
    TwitterAgent.sinks.HDFS.type = hdfs
    TwitterAgent.sinks.HDFS.hdfs.path = hdfs://hpc01:8020/user/flume/tweets/%Y/%m/%d/%H
    TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
    TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text

    TwitterAgent.sinks.HDFS.hdfs.batchSize = 100


    TwitterAgent.sinks.HDFS.hdfs.rollSize = 0

    TwitterAgent.sinks.HDFS.hdfs.rollCount = 0

    TwitterAgent.sinks.HDFS.hdfs.rollInterval = 3600
    TwitterAgent.channels.MemChannel.type = memory
    TwitterAgent.channels.MemChannel.capacity = 1000

    TwitterAgent.channels.MemChannel.transactionCapacity = 100
Answered 2016-07-07T11:57:51.780