python - Writing Tweepy TwitterStreamer output to .csv from Python 2.7

Question

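I have a Python script that connects to the Twitter stream and fetches messages matching the keywords in a list. The list is long and the output isn't what I want. I'd like to clean up the file and write the results to a text file.

Here is my current code, which writes every message on a single line: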

import sys
....

if __name__ == '__main__':
    with open("keywords.txt", "r") as f:
        keywords = f.readlines()

    l = StdOutListener()
    auth = OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)

    stream = Stream(auth, l)
    stream.filter(track=keywords)

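The code above doesn't pull anything in: nothing is written to the text file when I run the following at the command prompt: python hashtagworking.py > output.txt. There are about 300 items in stream.filter, so I want to read them from a txt file instead of typing them out literally. Also, each message's result ends up on a single line, because that is how the code was written; I'd like to rewrite it so that each object in a message is output to a csv file.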

I think this is what I'm looking for, but I want to make sure: similar question

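I'd also like to get content from other nested objects (e.g. entities: {...}). Specifically, I want the hashtags from the entities object, and more generally any object inside it. I've tried data.text.hashtag, data.entities.hashtag and data.entities.media.hashtag, all to no avail.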

score 1 · Accepted Answer

For your keywords problem, assuming you have put them all in a txt file (one token per line):

with open("tokens.txt", "r") as f:
    # Strip the trailing newline from each line and skip blank lines.
    tokens = [line.strip() for line in f if line.strip()]

....
stream.filter(track=tokens)

For your other problem (output as .csv), could you write up an example of what you want the file to look like?

import sys
from tweepy.streaming import StreamListener

class StdOutListener(StreamListener):
    """A listener that handles tweets as they are received from the stream.

    This basic listener prints each received tweet to stdout and also
    appends it to data.csv.
    """

    def on_status(self, data):
        try:
            print '%s , %s , %s , %s' % (data.text, data.author.screen_name,
                                         data.created_at, data.source)
            # Append one comma-separated line per tweet.
            # Note: in Python 2, str() raises UnicodeEncodeError on non-ASCII text.
            with open("data.csv", 'a+') as f:
                f.write("{text},{name},{created},{source}\n"
                        .format(text=str(data.text),
                                name=str(data.author.screen_name),
                                created=str(data.created_at),
                                source=str(data.source)))
            return True
        except Exception, e:
            print >> sys.stderr, 'Encountered Exception:', e

    def on_error(self, status):
        return True

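Note that the listener above is not an acceptable long-term solution: it opens and closes the file every time a tweet comes through the filtered stream (i.e. it floods I/O). What you can do instead is implement a buffer, and dump it to the file each time it fills up.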
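A minimal sketch of such a buffer, under stated assumptions: the class name BufferedLineWriter, its methods, and the default of 100 lines are hypothetical and not part of Tweepy. It only shows the idea of collecting lines in memory and writing them out in batches.

```python
class BufferedLineWriter(object):
    """Collects lines in memory and appends them to a file in batches.

    Hypothetical helper for illustration; not part of Tweepy.
    """

    def __init__(self, path, max_lines=100):
        self.path = path
        self.max_lines = max_lines
        self.lines = []

    def add(self, line):
        # Keep the line in memory; only touch the disk when the buffer is full.
        self.lines.append(line)
        if len(self.lines) >= self.max_lines:
            self.flush()

    def flush(self):
        # Append everything buffered so far in one open/write/close cycle.
        if not self.lines:
            return
        with open(self.path, "a") as f:
            f.write("\n".join(self.lines) + "\n")
        self.lines = []
```

In the listener you would create one writer up front, call add() from on_status instead of opening the file, and call flush() when the stream shuts down. Otherwise the last partial batch never reaches the file.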

Also note that I am writing the csv file by hand; if you want to dig deeper into csv handling, take a look at http://docs.python.org/2/library/csv.html
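As a rough sketch of what the csv module buys you, the example below appends rows with proper quoting, so a comma inside the tweet text does not split the column. It also touches on the hashtag part of the question. Both helper names (append_rows, hashtag_texts) are hypothetical, and hashtag_texts assumes that entities is a plain dict laid out like the Twitter API JSON, i.e. something you index with ["hashtags"] rather than reach through attributes. The open() call is written for Python 3; on Python 2.7 the csv documentation asks for binary mode ("ab") instead of newline="".

```python
import csv


def hashtag_texts(entities):
    """Return the hashtag strings from a tweet's entities dict.

    Assumes the Twitter API JSON layout:
    {"hashtags": [{"text": "python", ...}, ...], ...}
    """
    return [tag["text"] for tag in entities.get("hashtags", [])]


def append_rows(path, rows):
    """Append rows (lists of strings) to a CSV file, quoting where needed."""
    # newline="" is the Python 3 form; use mode "ab" on Python 2.7.
    with open(path, "a", newline="") as f:
        csv.writer(f).writerows(rows)


if __name__ == "__main__":
    # Made-up sample values standing in for one tweet.
    entities = {"hashtags": [{"text": "python"}, {"text": "csv"}]}
    row = ["hello, world", "someuser", "2014-01-01 12:00:00", "web",
           " ".join(hashtag_texts(entities))]
    append_rows("data.csv", [row])
```

Inside on_status the row would presumably be built from data.text, data.author.screen_name, str(data.created_at), data.source and hashtag_texts(data.entities).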
