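I'm trying to get some data into Dataflow, but the data isn't in Cloud Storage; it's an RSS feed that I normally check every X hours. Is there a way to do this directly with the SDK, or do I have to get the file into Cloud Storage some other way first?
Thanks in advance.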
Dataflow doesn't provide a source for an RSS feed.
You could, however, issue HTTP requests from a ParDo to fetch the data. For example, suppose the feed lets you fetch messages in a given time range. You could then create an input collection in which each record represents a range of time (e.g. an hour), and write a ParDo that fetches the messages in each range and emits them.
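As a rough illustration of that idea, here is a minimal, SDK-independent sketch in Python of the two pieces involved: building the hourly ranges that would make up the input collection, and the per-range work the ParDo would do. The names `hourly_ranges` and `items_in_range` are hypothetical, and the sketch assumes the feed is RSS 2.0 with a timezone-aware `pubDate` on each item. The HTTP fetch itself is left out; `rss_xml` stands in for the response body.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime
import xml.etree.ElementTree as ET


def hourly_ranges(start, end):
    """Yield (range_start, range_end) pairs, one per hour, covering [start, end)."""
    cursor = start
    while cursor < end:
        upper = min(cursor + timedelta(hours=1), end)
        yield (cursor, upper)
        cursor = upper


def items_in_range(rss_xml, time_range):
    """Return the RSS items whose pubDate falls in [range_start, range_end).

    This is the per-element work a ParDo might do. rss_xml stands in for the
    body of an HTTP response fetched for that range (assumption: RSS 2.0 with
    timezone-aware pubDate values).
    """
    range_start, range_end = time_range
    matches = []
    for item in ET.fromstring(rss_xml).iter("item"):
        pub_date = item.findtext("pubDate")
        if pub_date is None:
            continue  # Skip items that carry no timestamp.
        published = parsedate_to_datetime(pub_date)
        if range_start <= published < range_end:
            matches.append({"title": item.findtext("title"), "published": published})
    return matches
```

In a pipeline, each pair produced by `hourly_ranges` would be one input record, and the ParDo would emit whatever `items_in_range` returns for it. If the feed cannot filter by time on the server side, every range re-downloads the whole feed, so this only pays off when the feed supports ranged queries.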
If you are part of the streaming early access preview, then one solution would be to write an App Engine app (or equivalent) that checks the RSS feed every X hours and publishes the data using Google Cloud Pub/Sub. You could then use PubsubIO to read those events in Dataflow.
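The polling side of that setup could look roughly like the Python sketch below. It is only an outline under stated assumptions: `publish_new_items` is a hypothetical helper, `publish` is a placeholder callback standing in for whatever Pub/Sub client call you use, and `seen_guids` is a set you would need to persist between polls so that items already sent are not published again.

```python
import xml.etree.ElementTree as ET


def publish_new_items(rss_xml, seen_guids, publish):
    """Publish each feed item not seen before; return how many were published.

    `publish` is a hypothetical callback standing in for a Pub/Sub client call.
    `seen_guids` is a set the caller keeps between polls.
    """
    published = 0
    for item in ET.fromstring(rss_xml).iter("item"):
        # Fall back to the link when the feed omits <guid>.
        guid = item.findtext("guid") or item.findtext("link")
        if guid is None or guid in seen_guids:
            continue
        # Send the serialized item as the message payload (bytes).
        publish(ET.tostring(item, encoding="utf-8"))
        seen_guids.add(guid)
        published += 1
    return published
```

The scheduled job would fetch the feed, call this function with the response body, and store the updated set. Deduplicating by `guid` is a design choice for this sketch, not something the feed or Pub/Sub requires.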