1

我正在使用带有 cdh3u5 的 flume-ng-1.2.0。我只是想从文本文件中提取数据并将其放入 hdfs。这是我正在使用的配置:

agent1.sources = tail1
agent1.channels = Channel-2
agent1.sinks = HDFS

agent1.sources.tail1.type = exec
agent1.sources.tail1.command = tail -F /usr/games/sample1.txt
agent1.sources.tail1.channels = Channel-2

agent1.sinks.HDFS.channel = Channel-2
agent1.sinks.HDFS.type = hdfs
agent1.sinks.HDFS.hdfs.path = hdfs://10.12.1.2:8020/user/hdfs/flume
agent1.sinks.HDFS.hdfs.fileType = DataStream

agent1.channels.Channel-2.type = memory
agent1.channels.Channel-2.capacity = 1000

我正在运行代理bin/flume-ng agent -n agent1 -c ./conf/ -f conf/flume.conf

我得到的日志是

2012-10-11 12:10:36,626 INFO lifecycle.LifecycleSupervisor: Starting lifecycle supervisor 1
2012-10-11 12:10:36,631 INFO node.FlumeNode: Flume node starting - agent1
2012-10-11 12:10:36,639 INFO nodemanager.DefaultLogicalNodeManager: Node manager starting
2012-10-11 12:10:36,639 INFO lifecycle.LifecycleSupervisor: Starting lifecycle supervisor 12
2012-10-11 12:10:36,641 INFO properties.PropertiesFileConfigurationProvider: Configuration provider starting
2012-10-11 12:10:36,646 INFO properties.PropertiesFileConfigurationProvider: Reloading configuration file:conf/flume.conf
2012-10-11 12:10:36,657 INFO conf.FlumeConfiguration: Processing:HDFS
2012-10-11 12:10:36,670 INFO conf.FlumeConfiguration: Processing:HDFS
2012-10-11 12:10:36,670 INFO conf.FlumeConfiguration: Processing:HDFS
2012-10-11 12:10:36,670 INFO conf.FlumeConfiguration: Processing:HDFS
2012-10-11 12:10:36,671 INFO conf.FlumeConfiguration: Added sinks: HDFS Agent: agent1
2012-10-11 12:10:36,758 INFO conf.FlumeConfiguration: Post-validation flume configuration contains configuration  for agents: [agent1]
2012-10-11 12:10:36,758 INFO properties.PropertiesFileConfigurationProvider: Creating channels
2012-10-11 12:10:36,800 INFO instrumentation.MonitoredCounterGroup: Monitoried counter group for type: CHANNEL, name: Channel-2, registered successfully.
2012-10-11 12:10:36,800 INFO properties.PropertiesFileConfigurationProvider: created channel Channel-2
2012-10-11 12:10:36,835 INFO sink.DefaultSinkFactory: Creating instance of sink: HDFS, type: hdfs
2012-10-11 12:10:37,753 INFO hdfs.HDFSEventSink: Hadoop Security enabled: false
2012-10-11 12:10:37,896 INFO instrumentation.MonitoredCounterGroup: Monitoried counter group for type: SINK, name: HDFS, registered successfully.
2012-10-11 12:10:37,899 INFO nodemanager.DefaultLogicalNodeManager: Starting new configuration:{ sourceRunners:{tail1=EventDrivenSourceRunner: { source:org.apache.flume.source.ExecSource@362f0d54 }} sinkRunners:{HDFS=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@4b142196 counterGroup:{ name:null counters:{} } }} channels:{Channel-2=org.apache.flume.channel.MemoryChannel@16a9255c} }
2012-10-11 12:10:37,900 INFO nodemanager.DefaultLogicalNodeManager: Starting Channel Channel-2
2012-10-11 12:10:37,901 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: Channel-2 started
2012-10-11 12:10:37,901 INFO nodemanager.DefaultLogicalNodeManager: Starting Sink HDFS
2012-10-11 12:10:37,905 INFO instrumentation.MonitoredCounterGroup: Component type: SINK, name: HDFS started
2012-10-11 12:10:37,910 INFO nodemanager.DefaultLogicalNodeManager: Starting Source tail1
2012-10-11 12:10:37,912 INFO source.ExecSource: Exec source starting with command:tail -F /usr/games/sample1.txt

我不知道我在哪里做错了。因为我是初学者,所以我在 hdfs 中没有得到任何东西,flume-agent 一直在运行。任何建议和更正都会对我很有帮助,谢谢。

4

1 回答 1

2

一个问题是您已经设置agent1.sinks.HDFS.hdfs.file.Type = DataStream但属性是hdfs.fileType- 请参阅https://flume.apache.org/FlumeUserGuide.html#hdfs-sink了解更多信息。

我会尝试使用记录器接收器sink.type = logger-- 只是为了看看是否有任何东西通过。还要确保在tail -F从 shell 运行该命令时得到一些东西。

还有一件事,这可能是一个红鲱鱼:在您的日志消息末尾有一个反引号 (`)。也许这是一个粘贴错误,但如果不是,那么如果它在您的配置文件中,如果它引起了麻烦,我不会感到惊讶。我所指的消息来自日志的最后一行:

Exec source starting with command:tail -F /usr/games/value.txt`
于 2012-10-11T05:00:18.957 回答