2

我正在使用 TweetStream (https://github.com/joshmarshall/TweetStream),这是一个基于龙卷风的推特流媒体模块来监控流 API。

如果想更改跟踪的单词,我想知道如何重新启动提取过程。

我当前的解决方案(不完全是解决方案)给了​​我一些错误。

stream = tweetstream.TweetStream(configuration,ioloop=main_io_loop)

stream.fetch("/1.1/statuses/filter.json?track="+tornado.escape.url_escape(words), callback=callback)


def check_words():
    global words
    with open('words.txt') as file:
        newwords = file.read()
        if words != newwords:
            words = newwords
        try:
            print newwords
            stream.fetch("/1.1/statuses/filter.json?track="+tornado.escape.url_escape(words), callback=callback)
        except:
            pass
        file.close()

interval_ms = 1000*10
scheduler = tornado.ioloop.PeriodicCallback(check_words,interval_ms,io_loop = main_io_loop)
scheduler.start()
main_io_loop.start()

这是我得到的错误

ERROR:root:Uncaught exception, closing connection.
Traceback (most recent call last):
  File "/home/user/PycharmProjects/observrenv/local/lib/python2.7/site-packages/tornado/iostream.py", line 305, in wrapper
    callback(*args)
  File "/home/user/PycharmProjects/observrenv/src/tweetstream/tweetstream.py", line 155, in on_connect
    self._twitter_stream.read_until("\r\n\r\n", self.on_headers)
  File "/home/user/PycharmProjects/observrenv/local/lib/python2.7/site-packages/tornado/iostream.py", line 151, in read_until
    self._set_read_callback(callback)
  File "/home/user/PycharmProjects/observrenv/local/lib/python2.7/site-packages/tornado/iostream.py", line 369, in _set_read_callback
    assert not self._read_callback, "Already reading"
AssertionError: Already reading
ERROR:root:Exception in callback <tornado.stack_context._StackContextWrapper object at 0x2415cb0>
Traceback (most recent call last):
  File "/home/user/PycharmProjects/observrenv/local/lib/python2.7/site-packages/tornado/ioloop.py", line 421, in _run_callback
    callback()
  File "/home/user/PycharmProjects/observrenv/local/lib/python2.7/site-packages/tornado/iostream.py", line 305, in wrapper
    callback(*args)
  File "/home/user/PycharmProjects/observrenv/src/tweetstream/tweetstream.py", line 155, in on_connect
    self._twitter_stream.read_until("\r\n\r\n", self.on_headers)
  File "/home/user/PycharmProjects/observrenv/local/lib/python2.7/site-packages/tornado/iostream.py", line 151, in read_until
    self._set_read_callback(callback)
  File "/home/user/PycharmProjects/observrenv/local/lib/python2.7/site-packages/tornado/iostream.py", line 369, in _set_read_callback
    assert not self._read_callback, "Already reading"
AssertionError: Already reading

通过在调用 check_words 时再次启动 ioloop,我取得了更好的结果(不是最好的)。

stream = tweetstream.TweetStream(configuration,ioloop=main_io_loop)

stream.fetch("/1.1/statuses/filter.json?track="+tornado.escape.url_escape(words), callback=callback)


def check_words():
    global words, stream
    with open('words.txt') as file:
        newwords = file.read()
    if words != newwords:
        words = newwords
        print newwords
        try:
            stream = tweetstream.TweetStream(configuration,ioloop=main_io_loop)
            stream.fetch("/1.1/statuses/filter.json?track="+tornado.escape.url_escape(words), callback=callback)
            interval_ms = 1000*10
            scheduler = tornado.ioloop.PeriodicCallback(check_words,interval_ms,io_loop = main_io_loop)
            scheduler.start()
            main_io_loop.start()
        except:
            pass
        file.close()


interval_ms = 1000*10
scheduler = tornado.ioloop.PeriodicCallback(check_words,interval_ms,io_loop = main_io_loop)
scheduler.start()
main_io_loop.start()
4

2 回答 2

1

正如一位推特员工所说建议做我已经在做的事情(但以更温和的方式)。如果您的查询条件发生变化,只需偶尔重新连接一次。否则,只需保持连接打开。监视 twitter 可能发送给您的错误或您可能被禁止的错误也很重要。

于 2012-10-06T02:57:19.630 回答
0

看起来,您缺少 Streaming API 的主要思想。与它的连接永久打开。

stream = tweetstream.TweetStream(configuration,ioloop=main_io_loop)

#What you are doing in callback? 
stream.fetch("/1.1/statuses/filter.json?track="+tornado.escape.url_escape(words), callback=callback)


def check_words():
    #I guess, don't do it at all. 
    #global words
    #with open('words.txt') as file:
    #    newwords = file.read()
    #    if words != newwords:
    #        words = newwords
    #    try:
    #        #Don't open new stream here
    #        print newwords
    #    except:
    #        pass
    #    file.close()
    pass

interval_ms = 1000*10
scheduler = tornado.ioloop.PeriodicCallback(check_words,interval_ms,io_loop = main_io_loop)
scheduler.start()
main_io_loop.start()

通过分析你的代码,我认为你只需要在回调中使用新单词就可以了。

于 2012-10-05T09:55:46.017 回答