
I currently have this configuration in Flume:

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
# The configuration file needs to define the sources,
# the channels and the sinks.
# Sources, channels and sinks are defined per agent,
# in this case called 'TwitterAgent'
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS

TwitterAgent.sources.Twitter.type = com.cloudera.flume.source.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = YPTxqtRamIZ1bnJXYwGW
TwitterAgent.sources.Twitter.consumerSecret = Wjyw9714OBzao7dktH0csuTByk4iLG9Zu4ddtI6s0ho
TwitterAgent.sources.Twitter.accessToken = 2340010790-KhWiNLt63GuZ6QZNYuPMJtaMVjLFpiMP4A2v
TwitterAgent.sources.Twitter.accessTokenSecret = x1pVVuyxfvaTbPoKvXqh2r5xUA6tf9einoByLIL8rar
TwitterAgent.sources.Twitter.keywords = hadoop, big data, analytics, bigdata, cloudera, data science, data scientiest, business intelligence, mapreduce, data warehouse, data warehousing, mahout, hbase, nosql, newsql, businessintelligence, cloudcomputing
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://hadoop1:8020/user/flume/tweets/%Y/%m/%d/%H/
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text
TwitterAgent.sinks.HDFS.hdfs.batchSize = 1000
TwitterAgent.sinks.HDFS.hdfs.rollSize = 0
TwitterAgent.sinks.HDFS.hdfs.rollCount = 10000
TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 10000
TwitterAgent.channels.MemChannel.transactionCapacity = 100

The Twitter application credentials are correct, yet I keep getting this error in the Flume log file:

ERROR org.apache.flume.SinkRunner - Unable to deliver event. Exception follows.
org.apache.flume.EventDeliveryException: java.lang.IllegalArgumentException: java.net.UnknownHostException: hadoop1
    at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:446)
    at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
    at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.IllegalArgumentException: java.net.UnknownHostException: hadoop1
    at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:414)
    at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:164)
    at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:129)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:448)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:410)
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:128)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2310)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2344)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2326)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:353)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:194)
    at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:227)
    at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:221)
    at org.apache.flume.sink.hdfs.BucketWriter$8$1.run(BucketWriter.java:589)
    at org.apache.flume.sink.hdfs.BucketWriter.runPrivileged(BucketWriter.java:161)
    at org.apache.flume.sink.hdfs.BucketWriter.access$800(BucketWriter.java:57)
    at org.apache.flume.sink.hdfs.BucketWriter$8.call(BucketWriter.java:586)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    ... 1 more
Caused by: java.net.UnknownHostException: hadoop1
    ... 23 more

Does anyone here know why this happens and can explain it to me? Thanks in advance.


1 Answer


According to the exception, the problem is that the host hadoop1 is unknown.

According to your Flume configuration file, the path you gave is

hdfs://hadoop1:8020/user/flume/tweets/%Y/%m/%d/%H/

This path must be reachable from the machine running the Flume agent. Since a bare machine name cannot be used to access HDFS unless it resolves from that machine (for example, via DNS in the same domain or an entry in /etc/hosts), you need to use the NameNode IP address configured in core-site.xml to access HDFS.
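As a sketch of the two fixes, assuming the NameNode's address is 192.168.1.10 (a placeholder; substitute the actual address from fs.defaultFS in your core-site.xml), you can either map the hostname on the Flume agent machine or point the sink directly at the IP:

```properties
# Option 1: add a hosts entry on the Flume agent machine (/etc/hosts),
# so that "hadoop1" resolves (192.168.1.10 is a placeholder IP):
#   192.168.1.10   hadoop1

# Option 2: use the NameNode IP directly in the Flume sink path:
TwitterAgent.sinks.HDFS.hdfs.path = hdfs://192.168.1.10:8020/user/flume/tweets/%Y/%m/%d/%H/
```

Option 1 is usually preferable, since it keeps the configuration consistent with the hostname used elsewhere in the Hadoop cluster.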

Answered 2014-09-06T18:24:26.973