
I am new to Flume, but I want to stream weather data from a website into my HDFS location. So I created the sink, source and channel as follows:

weather.channels= memory-channel
weather.channels.memory-channel.capacity=10000
weather.channels.memory-channel.type = memory
weather.sinks = hdfs-write
weather.sinks.hdfs-write.channel=memory-channel
weather.sinks.hdfs-write.type = logger
weather.sinks.hdfs-write.hdfs.path = hdfs://localhost:8020/user/hadoop/flume
weather.sinks.hdfs-write.rollInterval = 1200
weather.sinks.hdfs-write.hdfs.writeFormat=Text
weather.sinks.hdfs-write.hdfs.fileType=DataStream
weather.sources= Weather
weather.sources.Weather.bind =  api.openweathermap.org/data/2.5/forecast/city?id=524901&APPID=********************************
weather.sources.Weather.channels=memory-channel
weather.sources.Weather.type = netcat
weather.sources.Weather.port = 80

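So I am using an API here to get the data. What else can I use to ingest weather data? Is there an online website I can use, or which API should I use to configure the source? When I run the flume-ng command to start the agent, I get the following: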

15/03/18 11:13:28 ERROR lifecycle.LifecycleSupervisor: Unable to start EventDrivenSourceRunner: { source:org.apache.flume.source.http.HTTPSource{name:Weather,state:IDLE} } - Exception follows.
java.lang.IllegalStateException: Running HTTP Server found in source:Weather before I started one.Will not attempt to start.
at com.google.common.base.Preconditions.checkState(Preconditions.java:145)
at org.apache.flume.source.http.HTTPSource.start(HTTPSource.java:189)
at org.apache.flume.source.EventDrivenSourceRunner.start(EventDrivenSourceRunner.java:44)
at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745) 
C15/03/18 11:13:31 INFO lifecycle.LifecycleSupervisor: Stopping lifecycle supervisor 10
15/03/18 11:13:31 INFO node.PollingPropertiesFileConfigurationProvider: Configuration provider stopping
15/03/18 11:13:31 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: memory-channel stopped

1 Answer


The "lifecycle" error you are seeing is a consequence of an earlier error that occurred while trying to start the HTTP server.

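The original error is probably due to trying to bind to the privileged port 80 as a non-root user. Change the port to something above 1024, for example 8080.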

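However, it will not work the way you are trying to use it. The http and netcat sources listen for incoming calls; they do not fetch the URL you set in bind.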
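To illustrate what a listening source expects, here is a minimal sketch of the source section, assuming the same agent name (weather) and channel name (memory-channel) as in the question. The bind address 0.0.0.0 and port 8080 are example values, not something taken from your setup. Note that this only makes the source start; something else would still have to push the data to that port.

```properties
# Sketch only: agent and channel names are copied from the question.
weather.sources = Weather
weather.sources.Weather.type = netcat
# bind is the local interface address to listen on, not a remote URL
weather.sources.Weather.bind = 0.0.0.0
# a port above 1024 can be opened by a non-root user
weather.sources.Weather.port = 8080
weather.sources.Weather.channels = memory-channel
```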

I see two options:

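  1. Create a Linux daemon that periodically runs wget or curl against that URL and saves the result to a file, then configure Flume with a spooling directory source.
  2. Create your own Flume source that periodically polls that URL.
Answered 2015-03-26T09:47:29.970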
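The first option could look roughly like the agent configuration below. This is a sketch under assumptions: the spool directory /var/spool/weather is a hypothetical path that the wget/curl job would have to write into, and the agent, channel and sink names are reused from the question. It also assumes you want the data in HDFS, so the sink type is set to hdfs (the question's config uses logger, which would ignore the hdfs.* settings) and the roll interval carries the hdfs. prefix.

```properties
# Sketch only: names reused from the question, spoolDir is a hypothetical path.
weather.sources = Weather
weather.channels = memory-channel
weather.sinks = hdfs-write

# Spooling directory source: ingests files that are dropped into spoolDir
weather.sources.Weather.type = spooldir
weather.sources.Weather.spoolDir = /var/spool/weather
weather.sources.Weather.channels = memory-channel

weather.channels.memory-channel.type = memory
weather.channels.memory-channel.capacity = 10000

# HDFS sink: the type must be hdfs for the hdfs.* properties to take effect
weather.sinks.hdfs-write.type = hdfs
weather.sinks.hdfs-write.channel = memory-channel
weather.sinks.hdfs-write.hdfs.path = hdfs://localhost:8020/user/hadoop/flume
weather.sinks.hdfs-write.hdfs.rollInterval = 1200
weather.sinks.hdfs-write.hdfs.writeFormat = Text
weather.sinks.hdfs-write.hdfs.fileType = DataStream
```

The spooling directory source expects files to be complete and unchanged once they appear in the directory, so the fetch job should download to a temporary location first and then move the finished file into the spool directory under a unique name.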