We are saving tweets in a directory structure like /user/flume/2016/06/28/13/FlumeData... , but each hour more than 100 FlumeData files are created. I changed TwitterAgent.sinks.HDFS.hdfs.rollSize = 52428800 (50 MB), but the same thing happened again. After that I tried changing the rollCount parameter too, but that didn't work either. How can I set the parameters to get one FlumeData file per hour?
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://hpc01:8020/user/flume/tweets/%Y/%m/%d/%H
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10
TwitterAgent.sinks.HDFS.hdfs.rollInterval = 0
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 1000
Answered 2016-07-11T07:30:03.287
I solved this problem by setting rollInterval=3600, rollCount=0 and batchSize=100 in flume.conf, as @vkgade suggested.
Answered 2016-07-12T07:36:59.517
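What about rollInterval? Did you set it to zero? If so, the problem may lie elsewhere. If rollInterval is set to a nonzero value, the file is rolled when that interval elapses regardless of rollSize and rollCount, so a file can be rotated before it reaches the rollSize value. Also check the HDFS block size you have configured; if it is set too small, that may also cause files to roll.
Try this: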
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://hpc01:8020/user/flume/tweets/%Y/%m/%d/%H
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 100
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 0
TwitterAgent.sinks.HDFS.hdfs.rollInterval = 3600
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 1000
TwitterAgent.channels.MemChannel.transactionCapacity = 100
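To see why changing rollSize alone did not help, here is a rough Python sketch of how the three roll triggers combine. It is only an estimate, not Flume's actual logic: it assumes a steady event rate and simply takes whichever enabled trigger would fire most often. The function name files_per_hour and the traffic figures (2000 tweets per hour, about 3000 bytes each) are made up for illustration. The values 10 and 30 in the first call are Flume's defaults for hdfs.rollCount and hdfs.rollInterval, which stay in effect unless they are set explicitly.

```python
import math

def files_per_hour(events_per_hour, avg_event_bytes,
                   roll_size=0, roll_count=0, roll_interval=0):
    """Rough estimate of how many files one hourly directory ends up with.

    A trigger set to 0 is disabled. The trigger that fires most often
    dominates, so the estimate is the maximum over the enabled triggers.
    """
    estimates = [1]  # at least one file once any event arrives in the hour
    if roll_count > 0:
        # roll after every roll_count events
        estimates.append(math.ceil(events_per_hour / roll_count))
    if roll_size > 0:
        # roll after every roll_size bytes written
        estimates.append(math.ceil(events_per_hour * avg_event_bytes / roll_size))
    if roll_interval > 0:
        # roll every roll_interval seconds
        estimates.append(math.ceil(3600 / roll_interval))
    return max(estimates)

# rollSize raised to 50 MB, rollCount and rollInterval left at their defaults
print(files_per_hour(2000, 3000, roll_size=52428800,
                     roll_count=10, roll_interval=30))   # 200

# the settings suggested above: only the hourly interval is enabled
print(files_per_hour(2000, 3000, roll_size=0,
                     roll_count=0, roll_interval=3600))  # 1
```

Under these assumed numbers the count trigger dominates in the first case, which matches the symptom of more than 100 files per hour. Disabling rollSize and rollCount leaves the hourly interval as the only trigger.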
Answered 2016-07-07T11:57:51.780