I am using Cloudera Manager with CDH 5.4.2 and have Flume installed, but I cannot save the data I get from Twitter.
When I run the Flume agent it starts up normally, but it fails when trying to write new event data to HDFS. I get the following error:
INFO org.apache.flume.sink.hdfs.BucketWriter: Creating hdfs://192.168.109.6:8020/user/flume/tweets/2015/06/03/06//FlumeData.1433311217583.tmp
WARN org.apache.flume.sink.hdfs.HDFSEventSink: HDFS IO error
java.net.ConnectException: Call From cluster-05.xxxx.com/192.168.109.6 to cluster-05.xxxx.com:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
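Since the error is a plain "Connection refused", the first thing I wanted to rule out is whether anything is listening on the NameNode RPC port at all. This is just a quick check I can run on the agent host (it uses bash's built-in /dev/tcp, and 192.168.109.6:8020 is the address from the sink's hdfs.path):

```shell
# Check whether something accepts TCP connections on the NameNode RPC port.
# 192.168.109.6:8020 is taken from TwitterAgent.sinks.HDFS.hdfs.path below.
if timeout 2 bash -c 'exec 3<>/dev/tcp/192.168.109.6/8020' 2>/dev/null; then
  echo "port 8020 open"
else
  echo "port 8020 closed or unreachable"
fi
```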
My configuration is as follows:
flume-conf.properties:
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://192.168.109.6:8020/user/flume/tweets/%Y/%m/%d/%H/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000
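For context, the sink section above assumes the source and channel are declared elsewhere in the same file. The surrounding sections of my flume-conf.properties follow the usual flume-sources Twitter example layout, roughly like this (the Twitter credentials are placeholders, not my real values):

```properties
# Assumed surrounding sections (standard flume-sources Twitter example layout);
# credential values are placeholders.
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS

TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = <consumerKey>
TwitterAgent.sources.Twitter.consumerSecret = <consumerSecret>
TwitterAgent.sources.Twitter.accessToken = <accessToken>
TwitterAgent.sources.Twitter.accessTokenSecret = <accessTokenSecret>

TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 1000
```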
I am using the following plugins:
flume-sources-1.0-SNAPSHOT.jar
twitter4j-core-2.2.6.jar
twitter4j-media-support-2.2.6.jar
twitter4j-stream-2.2.6.jar
(I replaced the twitter4j-*-3.0.3.jar versions with twitter4j-*-2.2.6.jar)
The directory was also created as the hdfs user;
hadoop fs -ls /user/flume shows:
drwxrwxrwx - flume flume /user/flume/tweets
In core-site.xml (in /hadoop/conf) I added:
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:8020</value>
</property>
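One thing I am unsure about: the sink writes to hdfs://192.168.109.6:8020 while core-site.xml says hdfs://localhost:8020. If the two need to match, I suppose the property would have to look like this instead (this is just my guess, not something I have confirmed):

```xml
<property>
  <name>fs.default.name</name>
  <value>hdfs://192.168.109.6:8020</value>
</property>
```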
I also ran hadoop dfsadmin -safemode leave as the hdfs user on the host running the Flume agent.
I would really appreciate any help with this issue.