I am trying to configure Flume with HDFS as the sink.
Here is my flume.conf file:
agent1.channels.ch1.type = memory
agent1.sources.avro-source1.channels = ch1
agent1.sources.avro-source1.type = avro
agent1.sources.avro-source1.bind = 0.0.0.0
agent1.sources.avro-source1.port = 41414
agent1.sinks.log-sink1.type = logger
agent1.sinks.hdfs-sink.channel=ch1
agent1.sinks.hdfs-sink.type=hdfs
agent1.sinks.hdfs-sink.hdfs.path=hdfs://localhost:9000/flume/flumehdfs/
agent1.sinks.hdfs-sink.hdfs.fileType = DataStream
agent1.sinks.hdfs-sink.hdfs.writeFormat = Text
agent1.sinks.hdfs-sink.hdfs.batchSize = 1000
agent1.sinks.hdfs-sink.hdfs.rollSize = 0
agent1.sinks.hdfs-sink.hdfs.rollCount = 10000
agent1.sinks.hdfs-sink.hdfs.rollInterval = 600
agent1.channels = ch1
agent1.sources = avro-source1
agent1.sinks = log-sink1 hdfs-sink
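For reference, I start the agent with something like the following command (flags per the startup log further down, so treat this as a sketch rather than the verbatim command line):

bin/flume-ng agent -n agent1 -f ./conf/flume.conf -c ./conf -Dflume.root.logger=DEBUG,console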
My Hadoop version is:
Hadoop 0.20.2
Subversion https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707
The Flume version is:
apache-flume-1.4.0
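These are the version strings reported by the respective version commands, run along these lines from the two install directories:

/home/user/Downloads/hadoop-0.20.2/bin/hadoop version
/home/user/Downloads/apache-flume-1.4.0-bin/bin/flume-ng version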
I have placed these two jar files in the flume/lib directory:
hadoop-0.20.2-core
hadoop-common-0.22.0
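Roughly speaking, I copied them in like this (the source paths below are just placeholders for wherever the jars came from, not the actual locations):

# copy the Hadoop jars into Flume's lib directory so the HDFS sink can see them
cp /path/to/hadoop-0.20.2-core.jar /home/user/Downloads/apache-flume-1.4.0-bin/lib/
cp /path/to/hadoop-common-0.22.0.jar /home/user/Downloads/apache-flume-1.4.0-bin/lib/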
I added the hadoop-common jar because I was getting the following error when starting the Flume agent:
Unhandled error
java.lang.NoSuchMethodError: org.apache.hadoop.security.UserGroupInformation.isSecurityEnabled()Z
at org.apache.flume.sink.hdfs.HDFSEventSink.authenticate(HDFSEventSink.java:491)
at org.apache.flume.sink.hdfs.HDFSEventSink.configure(HDFSEventSink.java:240)
at org.apache.flume.conf.Configurables.configure(Configurables.java:41)
at org.apache.flume.node.AbstractConfigurationProvider.loadSinks(AbstractConfigurationProvider.java:418)
at org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:103)
at org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:140)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:679)
Now the agent starts. Here is the startup log:
logger=DEBUG,console
Info: Including Hadoop libraries found via (/home/user/Downloads/hadoop-0.20.2/bin/hadoop) for HDFS access
Exception in thread "main" java.lang.NoClassDefFoundError: classpath
Caused by: java.lang.ClassNotFoundException: classpath
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
Could not find the main class: classpath. Program will exit.
+ exec /usr/lib/jvm/default-java/bin/java -Xmx20m -Dflume.root.logger=DEBUG,console -cp '/home/user/Downloads/apache-flume-1.4.0-bin/conf:/home/user/Downloads/apache-flume-1.4.0-bin/lib/*' -Djava.library.path=:/home/user/Downloads/hadoop-0.20.2/bin/../lib/native/Linux-amd64-64 org.apache.flume.node.Application -n agent1 -f ./conf/flume.conf
2013-09-04 07:55:22,634 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.node.PollingPropertiesFileConfigurationProvider.start(PollingPropertiesFileConfigurationProvider.java:61)] Configuration provider starting
2013-09-04 07:55:22,639 (lifecycleSupervisor-1-0) [DEBUG - org.apache.flume.node.PollingPropertiesFileConfigurationProvider.start(PollingPropertiesFileConfigurationProvider.java:78)] Configuration provider started
2013-09-04 07:55:22,640 (conf-file-poller-0) [DEBUG - org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:126)] Checking file:./conf/flume.conf for changes
2013-09-04 07:55:22,642 (conf-file-poller-0) [INFO - org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:133)] Reloading configuration file:./conf/flume.conf
2013-09-04 07:55:22,648 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1016)] Processing:hdfs-sink
2013-09-04 07:55:22,648 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1020)] Created context for hdfs-sink: hdfs.fileType
2013-09-04 07:55:22,649 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1016)] Processing:loggerSink
2013-09-04 07:55:22,650 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1020)] Created context for loggerSink: type
2013-09-04 07:55:22,650 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1016)] Processing:hdfs-sink
2013-09-04 07:55:22,650 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1016)] Processing:hdfs-sink
2013-09-04 07:55:22,650 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1016)] Processing:hdfs-sink
2013-09-04 07:55:22,650 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1016)] Processing:hdfs-sink
2013-09-04 07:55:22,651 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1016)] Processing:hdfs-sink
2013-09-04 07:55:22,651 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1016)] Processing:log-sink1
2013-09-04 07:55:22,651 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1020)] Created context for log-sink1: type
2013-09-04 07:55:22,651 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:930)] Added sinks: loggerSink Agent: agent
2013-09-04 07:55:22,654 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:930)] Added sinks: log-sink1 hdfs-sink Agent: agent1
2013-09-04 07:55:22,654 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1016)] Processing:hdfs-sink
2013-09-04 07:55:22,654 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1016)] Processing:hdfs-sink
2013-09-04 07:55:22,654 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1016)] Processing:loggerSink
2013-09-04 07:55:22,654 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1016)] Processing:hdfs-sink
2013-09-04 07:55:22,655 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1016)] Processing:log-sink1
2013-09-04 07:55:22,655 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.isValid(FlumeConfiguration.java:313)] Starting validation of configuration for agent: agent, initial-configuration: AgentConfiguration[agent] SOURCES: {seqGenSrc={ parameters:{channels=memoryChannel, type=seq} }} CHANNELS: {memoryChannel={ parameters:{capacity=100, type=memory} }} SINKS: {loggerSink={ parameters:{type=logger, channel=memoryChannel} }}
2013-09-04 07:55:22,661 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.validateChannels(FlumeConfiguration.java:468)] Created channel memoryChannel
2013-09-04 07:55:22,671 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.validateSinks(FlumeConfiguration.java:674)] Creating sink: loggerSink using LOGGER
2013-09-04 07:55:22,673 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.isValid(FlumeConfiguration.java:371)] Post validation configuration for agent AgentConfiguration created without Configuration stubs for which only basic syntactical validation was performed[agent] SOURCES: {seqGenSrc={ parameters:{channels=memoryChannel, type=seq} }} CHANNELS: {memoryChannel={ parameters:{capacity=100, type=memory} }} AgentConfiguration created with Configuration stubs for which full validation was performed[agent] SINKS: {loggerSink=ComponentConfiguration[loggerSink] CONFIG: CHANNEL:memoryChannel }
2013-09-04 07:55:22,673 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration.validateConfiguration(FlumeConfiguration.java:135)] Channels:memoryChannel
2013-09-04 07:55:22,673 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration.validateConfiguration(FlumeConfiguration.java:136)] Sinks loggerSink
2013-09-04 07:55:22,674 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration.validateConfiguration(FlumeConfiguration.java:137)] Sources seqGenSrc
2013-09-04 07:55:22,674 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.isValid(FlumeConfiguration.java:313)] Starting validation of configuration for agent: agent1, initial-configuration: AgentConfiguration[agent1] SOURCES: {avro-source1={ parameters:{port=41414, channels=ch1, type=avro, bind=0.0.0.0} }} CHANNELS: {ch1={ parameters:{type=memory} }} SINKS: {hdfs-sink={ parameters:{hdfs.fileType=DataStream, hdfs.path=hdfs://localhost:9000/flume/flumehdfs/, hdfs.batchSize=1000, hdfs.rollInterval=600, hdfs.rollSize=0, hdfs.writeFormat=Text, type=hdfs, hdfs.rollCount=10000, channel=ch1} }, log-sink1={ parameters:{type=logger, channel=ch1} }}
2013-09-04 07:55:22,675 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.validateChannels(FlumeConfiguration.java:468)] Created channel ch1
2013-09-04 07:55:22,677 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.validateSinks(FlumeConfiguration.java:674)] Creating sink: hdfs-sink using HDFS
2013-09-04 07:55:22,678 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.validateSinks(FlumeConfiguration.java:674)] Creating sink: log-sink1 using LOGGER
2013-09-04 07:55:22,679 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.isValid(FlumeConfiguration.java:371)] Post validation configuration for agent1 AgentConfiguration created without Configuration stubs for which only basic syntactical validation was performed[agent1] SOURCES: {avro-source1={ parameters:{port=41414, channels=ch1, type=avro, bind=0.0.0.0} }} CHANNELS: {ch1={ parameters:{type=memory} }} SINKS: {hdfs-sink={ parameters:{hdfs.fileType=DataStream, hdfs.path=hdfs://localhost:9000/flume/flumehdfs/, hdfs.batchSize=1000, hdfs.rollInterval=600, hdfs.rollSize=0, hdfs.writeFormat=Text, type=hdfs, hdfs.rollCount=10000, channel=ch1} }} AgentConfiguration created with Configuration stubs for which full validation was performed[agent1] SINKS: {log-sink1=ComponentConfiguration[log-sink1] CONFIG: CHANNEL:ch1 }
2013-09-04 07:55:22,679 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration.validateConfiguration(FlumeConfiguration.java:135)] Channels:ch1
2013-09-04 07:55:22,679 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration.validateConfiguration(FlumeConfiguration.java:136)] Sinks hdfs-sink log-sink1
2013-09-04 07:55:22,679 (conf-file-poller-0) [DEBUG - org.apache.flume.conf.FlumeConfiguration.validateConfiguration(FlumeConfiguration.java:137)] Sources avro-source1
2013-09-04 07:55:22,680 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration.validateConfiguration(FlumeConfiguration.java:140)] Post-validation flume configuration contains configuration for agents: [agent, agent1]
2013-09-04 07:55:22,680 (conf-file-poller-0) [INFO - org.apache.flume.node.AbstractConfigurationProvider.loadChannels(AbstractConfigurationProvider.java:150)] Creating channels
2013-09-04 07:55:22,691 (conf-file-poller-0) [INFO - org.apache.flume.channel.DefaultChannelFactory.create(DefaultChannelFactory.java:40)] Creating instance of channel ch1 type memory
2013-09-04 07:55:22,699 (conf-file-poller-0) [INFO - org.apache.flume.node.AbstractConfigurationProvider.loadChannels(AbstractConfigurationProvider.java:205)] Created channel ch1
2013-09-04 07:55:22,700 (conf-file-poller-0) [INFO - org.apache.flume.source.DefaultSourceFactory.create(DefaultSourceFactory.java:39)] Creating instance of source avro-source1, type avro
2013-09-04 07:55:22,733 (conf-file-poller-0) [INFO - org.apache.flume.sink.DefaultSinkFactory.create(DefaultSinkFactory.java:40)] Creating instance of sink: log-sink1, type: logger
2013-09-04 07:55:22,736 (conf-file-poller-0) [INFO - org.apache.flume.sink.DefaultSinkFactory.create(DefaultSinkFactory.java:40)] Creating instance of sink: hdfs-sink, type: hdfs
2013-09-04 07:55:22,985 (conf-file-poller-0) [INFO - org.apache.flume.sink.hdfs.HDFSEventSink.authenticate(HDFSEventSink.java:493)] Hadoop Security enabled: false
2013-09-04 07:55:22,989 (conf-file-poller-0) [INFO - org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:119)] Channel ch1 connected to [avro-source1, log-sink1, hdfs-sink]
2013-09-04 07:55:22,996 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:138)] Starting new configuration:{ sourceRunners:{avro-source1=EventDrivenSourceRunner: { source:Avro source avro-source1: { bindAddress: 0.0.0.0, port: 41414 } }} sinkRunners:{hdfs-sink=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@709446e4 counterGroup:{ name:null counters:{} } }, log-sink1=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@16ba5c7a counterGroup:{ name:null counters:{} } }} channels:{ch1=org.apache.flume.channel.MemoryChannel{name: ch1}} }
2013-09-04 07:55:23,011 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:145)] Starting Channel ch1
2013-09-04 07:55:23,064 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.register(MonitoredCounterGroup.java:110)] Monitored counter group for type: CHANNEL, name: ch1, registered successfully.
2013-09-04 07:55:23,064 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.start(MonitoredCounterGroup.java:94)] Component type: CHANNEL, name: ch1 started
2013-09-04 07:55:23,065 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:173)] Starting Sink hdfs-sink
2013-09-04 07:55:23,066 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:173)] Starting Sink log-sink1
2013-09-04 07:55:23,068 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:184)] Starting Source avro-source1
2013-09-04 07:55:23,069 (lifecycleSupervisor-1-3) [INFO - org.apache.flume.source.AvroSource.start(AvroSource.java:192)] Starting Avro source avro-source1: { bindAddress: 0.0.0.0, port: 41414 }...
2013-09-04 07:55:23,069 (lifecycleSupervisor-1-1) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.register(MonitoredCounterGroup.java:110)] Monitored counter group for type: SINK, name: hdfs-sink, registered successfully.
2013-09-04 07:55:23,069 (lifecycleSupervisor-1-1) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.start(MonitoredCounterGroup.java:94)] Component type: SINK, name: hdfs-sink started
2013-09-04 07:55:23,078 (SinkRunner-PollingRunner-DefaultSinkProcessor) [DEBUG - org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:143)] Polling sink runner starting
2013-09-04 07:55:23,079 (SinkRunner-PollingRunner-DefaultSinkProcessor) [DEBUG - org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:143)] Polling sink runner starting
2013-09-04 07:55:23,458 (lifecycleSupervisor-1-3) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.register(MonitoredCounterGroup.java:110)] Monitored counter group for type: SOURCE, name: avro-source1, registered successfully.
2013-09-04 07:55:23,462 (lifecycleSupervisor-1-3) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.start(MonitoredCounterGroup.java:94)] Component type: SOURCE, name: avro-source1 started
2013-09-04 07:55:23,464 (lifecycleSupervisor-1-3) [INFO - org.apache.flume.source.AvroSource.start(AvroSource.java:217)] Avro source avro-source1 started.
However, when an event arrives, the following error appears in the Flume log and nothing is written to HDFS.
ERROR - org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:422)] process failed
java.lang.NoSuchMethodError: org.apache.hadoop.util.Shell.getGROUPS_COMMAND()[Ljava/lang/String;
at org.apache.hadoop.security.UnixUserGroupInformation.getUnixGroups(UnixUserGroupInformation.java:345)
at org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:264)
at org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:300)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:192)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:170)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:82)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1792)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:76)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:1826)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1808)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:265)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:190)
at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:226)
at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:220)
at org.apache.flume.sink.hdfs.BucketWriter$8$1.run(BucketWriter.java:536)
at org.apache.flume.sink.hdfs.BucketWriter.runPrivileged(BucketWriter.java:160)
at org.apache.flume.sink.hdfs.BucketWriter.access$1000(BucketWriter.java:56)
at org.apache.flume.sink.hdfs.BucketWriter$8.call(BucketWriter.java:533)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:679)
Am I missing some configuration or a jar file?