I'm a bit confused about the search API. Say I query "foobar" with the following code:

from twython import Twython
api = Twython(...)
r = api.search(q="foobar")

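This gives me 15 statuses and a "next_results" entry in r["search_metadata"]. Is there a way to feed that metadata back into the Twython API and get the following status updates, or should I manually extract the until_id from "next_results" and run a brand-new query?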

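petrux, "next_results" is returned in the search metadata along with "max_id" and "since_id". Use them to page back through the timeline, issuing one query per page, until you have the number of tweets you want.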
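
To make the paging loop concrete, here is a minimal sketch written as a generator. It assumes the search callable accepts `q`, `count` and `max_id` keyword arguments and returns a dict with "statuses" and "search_metadata" keys, as `api.search` does in the question and answer code. The function name `iter_search_statuses` is hypothetical, and it parses "next_results" with the standard library's `parse_qs` instead of splitting the string by hand.

```python
from typing import Callable, Dict, Iterator, Optional
from urllib.parse import parse_qs


def iter_search_statuses(
    search: Callable[..., Dict],
    query: str,
    limit: int = 500,
    page_size: int = 100,
) -> Iterator[Dict]:
    """Yield up to `limit` statuses, following search_metadata['next_results'].

    Hypothetical helper: `search` is assumed to behave like Twython's
    api.search (keyword args in, decoded JSON dict out).
    """
    max_id: Optional[str] = None
    yielded = 0
    while yielded < limit:
        params = {"q": query, "count": page_size}
        if max_id is not None:
            params["max_id"] = max_id  # only sent after the first page
        results = search(**params)
        statuses = results.get("statuses", [])
        if not statuses:
            return  # empty page: nothing more to fetch
        for status in statuses:
            if yielded >= limit:
                return
            yield status
            yielded += 1
        # next_results is a query string like "?max_id=...&q=...";
        # it is missing on the last page.
        next_results = results.get("search_metadata", {}).get("next_results")
        if not next_results:
            return
        max_id = parse_qs(next_results.lstrip("?"))["max_id"][0]
```

Usage would be something like `texts = [s["text"] for s in iter_search_statuses(api.search, "foobar")]`. A generator keeps the stopping logic (limit reached, empty page, no "next_results") in one place and lets the caller decide how much to consume.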

Twitter's documentation explains how to do this: https://dev.twitter.com/docs/working-with-timelines
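
The timelines guide linked above computes max_id from the tweets themselves rather than from "next_results": take the lowest tweet ID received so far and subtract 1, so the next request starts just below it and does not return that tweet again. A minimal sketch of that calculation, assuming each status dict carries an integer "id" field; the helper name `next_max_id` is hypothetical.

```python
from typing import Dict, List, Optional


def next_max_id(statuses: List[Dict]) -> Optional[int]:
    """Return the max_id for the next (older) page, or None for an empty page.

    Hypothetical helper: lowest ID seen minus 1, so the boundary tweet
    is not fetched twice.
    """
    if not statuses:
        return None
    return min(status["id"] for status in statuses) - 1
```

This avoids depending on the exact format of the "next_results" string; Python integers are arbitrary precision, so the 64-bit tweet IDs are safe to do arithmetic on.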

Here is some sample code that might help. It reuses the `api` object from the question.

tweets = []
MAX_ATTEMPTS = 10
COUNT_OF_TWEETS_TO_BE_FETCHED = 500

# `api` is the Twython instance created in the question.
for i in range(0, MAX_ATTEMPTS):

    if len(tweets) >= COUNT_OF_TWEETS_TO_BE_FETCHED:
        break  # we got 500 tweets... !!

    #----------------------------------------------------------------#
    # STEP 1: Query Twitter
    # STEP 2: Save the returned tweets
    # STEP 3: Get the next max_id
    #----------------------------------------------------------------#

    # STEP 1: Query Twitter
    if i == 0:
        # First call: no max_id yet. 100 is the largest page size search allows.
        results = api.search(q="foobar", count=100)
    else:
        # After the first call we have max_id from the previous response.
        # Pass it (and the same count) in the query to get the next, older page.
        results = api.search(q="foobar", count=100, max_id=next_max_id)

    # STEP 2: Save the returned tweets
    for result in results['statuses']:
        tweets.append(result['text'])

    # STEP 3: Get the next max_id
    try:
        # next_results is a query string such as "?max_id=...&q=foobar&count=100";
        # pull out the max_id value to pass in the next call.
        next_results_url_params = results['search_metadata']['next_results']
        next_max_id = next_results_url_params.split('max_id=')[1].split('&')[0]
    except (KeyError, IndexError):
        # No next_results: there are no more pages
        break
answered 2014-02-08 09:33:54