
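I'm new to Flume NG and need help tailing a file. I have a cluster running Hadoop, and Flume runs remotely on it. I use PuTTY to communicate with that cluster. I want to tail a file on my PC and put it on HDFS in the cluster. I'm using the following configuration.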

# flume.conf: exec source, hdfs sink
# Name the components on this agent

tier1.sources = r1
tier1.sinks = k1
tier1.channels = c1


# Describe/configure the source
tier1.sources.r1.type = exec
tier1.sources.r1.command = tail -F /(Path to file on my PC)


# Describe the sink
tier1.sinks.k1.type = hdfs
tier1.sinks.k1.hdfs.path = /user/ntimbadi/flume/
tier1.sinks.k1.hdfs.filePrefix = events-
tier1.sinks.k1.hdfs.round = true
tier1.sinks.k1.hdfs.roundValue = 10
tier1.sinks.k1.hdfs.roundUnit = minute



# Use a channel which buffers events in memory
tier1.channels.c1.type = memory
tier1.channels.c1.capacity = 1000
tier1.channels.c1.transactionCapacity = 100


# Bind the source and sink to the channel
tier1.sources.r1.channels = c1
tier1.sinks.k1.channel = c1

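I believe the error is in the source. This type of source doesn't take a hostname or IP to look up (which in this case would be my PC). Can someone give me a hint on how to tail a file on my PC and upload it to the remote HDFS using Flume?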


1 Answer


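The exec source in your configuration runs on the machine where you start the Flume tier1 agent. If you want to collect data from another machine, you need to start a Flume agent on that machine as well. To summarize, you need: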

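  • an agent (remote1) on the cluster, with an avro source, that writes to HDFS
  • an agent (local1) running on your machine (acting as a collector), with an exec source, that sends the data to the remote agent through an avro sink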

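Alternatively, you can run just one Flume agent on your local machine (with the same configuration you posted) and set the HDFS path to "hdfs://REMOTE_IP/hdfs/path" (though I'm not entirely sure this will work).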
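As a rough sketch of that single-agent alternative, only the sink path in the posted configuration would change. `REMOTE_IP` is a placeholder, and port 8020 is an assumption about the NameNode RPC port (check `fs.defaultFS` on the cluster). The local machine would likely also need the Hadoop client libraries on Flume's classpath and network access to the NameNode and DataNodes.

```properties
# Sketch only: single agent on the local PC writing straight to the remote HDFS.
# REMOTE_IP is a placeholder; 8020 is an assumed NameNode port, not taken from the question.
tier1.sinks.k1.type = hdfs
tier1.sinks.k1.hdfs.path = hdfs://REMOTE_IP:8020/user/ntimbadi/flume/
tier1.sinks.k1.hdfs.filePrefix = events-
```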

Edit: Below are sample configurations for the two-agent scenario (they may not work without some modification).

remote1.channels.mem-ch-1.type = memory

remote1.sources.avro-src-1.channels = mem-ch-1
remote1.sources.avro-src-1.type = avro
remote1.sources.avro-src-1.port = 10060
# REPLACE WITH YOUR MACHINE'S EXTERNAL IP
remote1.sources.avro-src-1.bind = 10.88.66.4

remote1.sinks.k1.channel = mem-ch-1
remote1.sinks.k1.type = hdfs
remote1.sinks.k1.hdfs.path = /user/ntimbadi/flume/
remote1.sinks.k1.hdfs.filePrefix = events-
remote1.sinks.k1.hdfs.round = true
remote1.sinks.k1.hdfs.roundValue = 10
remote1.sinks.k1.hdfs.roundUnit = minute

remote1.sources = avro-src-1
remote1.sinks = k1
remote1.channels = mem-ch-1

local1.channels.mem-ch-1.type = memory

local1.sources.exc-src-1.channels = mem-ch-1
local1.sources.exc-src-1.type = exec
local1.sources.exc-src-1.command = tail -F /(Path to file on my PC)

local1.sinks.avro-snk-1.channel = mem-ch-1
local1.sinks.avro-snk-1.type = avro
# REPLACE WITH REMOTE IP
local1.sinks.avro-snk-1.hostname = 10.88.66.4
local1.sinks.avro-snk-1.port = 10060

local1.sources = exc-src-1
local1.sinks = avro-snk-1
local1.channels = mem-ch-1
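
The sample configurations above leave both memory channels at their default sizes. If the buffering limits from the question are wanted, they could be carried over as in the sketch below. The values are simply copied from the question's config, not tuned.

```properties
# Sketch only: same memory-channel limits as the tier1 config in the question.
remote1.channels.mem-ch-1.capacity = 1000
remote1.channels.mem-ch-1.transactionCapacity = 100

local1.channels.mem-ch-1.capacity = 1000
local1.channels.mem-ch-1.transactionCapacity = 100
```

Note that the agent name given when starting each agent has to match the property prefix (remote1 on the cluster, local1 on the PC), otherwise the agent finds no components to start.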
answered 2013-06-07T06:52:37.167