regex - How to get streaming data from Twitter with pycurl and parse it using NLTK and regular expressions

Question

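I'm new to Python, and my boss has given me the following task:

Get streaming data from Twitter by connecting with pycurl, with the output in JSON format
Parse it using NLTK and regular expressions
Save it to a database (MySQL) or to a flat file (txt)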

Note: this is the URL I want to fetch ('http://search.twitter.com/search.json?geocode=-0.789275%2C113.921327%2C1.0km&q=+near%3Aindonesia+within%3A1km&result_type=recent&rpp=10')

Does anyone know how to get streaming data from Twitter using the steps above?

Any help would be greatly appreciated :)

score 2 · Accepted Answer

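I would have a look at Pattern: it is a very nice web mining library, and it also comes with a Twitter mining API. The documentation is pretty good too.

Otherwise, check https://dev.twitter.com/docs/twitter-libraries for Twitter libraries; getting the stream with one of them should also be fairly simple.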
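If you would rather follow the three steps from the question by hand, a rough sketch could look like the one below. It is not the Pattern API and not an official Twitter client. It assumes the response is a JSON object with a `results` list whose items carry `from_user` and `text` fields (the shape the old search endpoint returned, which may no longer be available). The helper names `fetch_json`, `parse_tweets`, `save_tweets`, the token pattern, and the sample payload are all made up for illustration. Plain `re` is used for tokenizing; NLTK's `RegexpTokenizer` could be given the same pattern instead.

```python
import io
import json
import re

# Hypothetical token pattern: hashtags, @mentions, or plain words.
TOKEN_RE = re.compile(r"[#@]?\w+")


def fetch_json(url):
    """Download `url` with pycurl and return the decoded JSON document."""
    import pycurl  # imported lazily so the helpers below work without pycurl

    buffer = io.BytesIO()
    curl = pycurl.Curl()
    curl.setopt(pycurl.URL, url)
    curl.setopt(pycurl.WRITEDATA, buffer)  # response body is written here
    try:
        curl.perform()
    finally:
        curl.close()
    return json.loads(buffer.getvalue().decode("utf-8"))


def parse_tweets(payload):
    """Turn a search payload into a list of simple records.

    Assumes the payload looks like {"results": [{"from_user": ..., "text": ...}]}.
    """
    records = []
    for item in payload.get("results", []):
        text = item.get("text", "")
        tokens = TOKEN_RE.findall(text)
        records.append({
            "user": item.get("from_user", ""),
            "text": text,
            "tokens": tokens,
            "hashtags": [t for t in tokens if t.startswith("#")],
        })
    return records


def save_tweets(records, path):
    """Write one JSON-encoded record per line to a text file."""
    with open(path, "w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record, ensure_ascii=False) + "\n")


# Made-up sample payload, so the parsing step can be tried without a network call.
SAMPLE = {
    "results": [
        {"from_user": "alice", "text": "Macet di #Jakarta pagi ini @bob"},
    ]
}
```

To try it offline, call `save_tweets(parse_tweets(SAMPLE), "tweets.txt")`. With a working endpoint you would pass the result of `fetch_json(url)` to `parse_tweets` instead. Writing to MySQL rather than a text file would only change `save_tweets`.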

