I am trying to fetch all tweets from a specific user:
import json

from twarc import Twarc2


def get_all_tweets(user_id, DEBUG):
    # Your bearer token here
    t = Twarc2(bearer_token="blah")
    # Initialize a list to hold all the pages of tweets
    alltweets = []
    new_tweets = {}
    if DEBUG:
        # Debug: read from file
        f = open("tweets_debug.txt")
        new_tweets = json.load(f)
        alltweets.extend(new_tweets)
    else:
        # Make initial request for most recent tweets (3200 is the maximum allowed count)
        new_tweets = t.timeline(user=user_id)
        # Save most recent tweets
        alltweets.extend(new_tweets)
    if DEBUG:
        # Debug: write to file
        f = open("tweets_debug.txt", "w")
        f.write(json.dumps(alltweets, indent=2, sort_keys=False))
        f.close()
    # Save the id of the oldest tweet less one
    oldest = str(int(alltweets[-1]['meta']['oldest_id']) - 1)
    # Keep grabbing tweets until there are no tweets left to grab
    while len(dict(new_tweets)) > 0:
        print(f"getting tweets before {oldest}")
        # All subsequent requests use the until_id param to prevent duplicates
        new_tweets = t.timeline(user=user_id, until_id=oldest)
        # Save most recent tweets
        alltweets.extend(new_tweets)
        # Update the id of the oldest tweet less one
        oldest = str(int(alltweets[-1]['meta']['oldest_id']) - 1)
        print(f"...{len(alltweets)} tweets downloaded so far")
    # Flatten the pages into a single list of tweets
    res = []
    for tweetlist in alltweets:
        res.extend(tweetlist['data'])
    f = open("output.txt", "w")
    f.write(json.dumps(res, indent=2, sort_keys=False))
    f.close()
    return res
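However, len(dict(new_tweets)) does not work: it always returns 0. sum(1 for dummy in new_tweets) also returns 0.
I also tried json.load(new_tweets), and that does not work either.
alltweets.extend(new_tweets), on the other hand, works fine.
It seems that timeline() returns a generator (<generator object Twarc2._timeline at 0x000001D78B3D8B30>). Is there any way to compute its length so I can tell whether there are more tweets left to fetch?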
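To make the length question concrete, here is a minimal sketch using a hypothetical stand-in generator (numbers() is not part of twarc). As far as I understand, a generator has no len(), and the only way to count its items is to iterate over it, which uses it up. Converting it to a list once and calling len() on the list seems to be the usual workaround.

```python
def numbers():
    """Hypothetical stand-in for any generator, e.g. the one timeline() returns."""
    yield from (1, 2, 3)


gen = numbers()

# Generators do not support len(); this raises TypeError.
try:
    len(gen)
    has_len = True
except TypeError:
    has_len = False

# Counting works on a fresh generator, but it consumes every item...
count = sum(1 for _ in gen)
# ...so counting the same generator again finds nothing.
count_again = sum(1 for _ in gen)

# Materializing into a list keeps the items and gives a real length.
items = list(numbers())
print(has_len, count, count_again, len(items))
```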
Alternatively, is there any way to merge...
someList = []
someList.extend(new_tweets)
while len(someList) > 0:
    # blah blah
...into a single line with the while?
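One possible way to fold this into the while line is an assignment expression (the := operator, Python 3.8+). The sketch below is only an illustration: make_fetcher() and fetch() are hypothetical stand-ins for repeated timeline() calls, not twarc functions.

```python
def make_fetcher(batches):
    """Hypothetical stand-in for repeated timeline() calls: each call to
    fetch() returns a generator over the next batch, or an empty one when done."""
    remaining = iter(batches)

    def fetch():
        yield from next(remaining, [])

    return fetch


fetch = make_fetcher([["a", "b"], ["c"]])
collected = []

# list(...) materializes the generator, := binds the result to batch,
# and the loop stops as soon as an empty list comes back.
while (batch := list(fetch())):
    collected.extend(batch)

print(collected)
```

The parentheses around the assignment expression are optional here, but they make the intent easier to read.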
Edit: I tried print(list(new_tweets)) just before the while loop, and it returns []. It looks like the object really is empty.
Is that because alltweets.extend(new_tweets) somehow consumed the new_tweets generator...?
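A small experiment that seems to confirm this, again with a hypothetical stand-in generator (pages() is not a twarc function): list.extend() iterates its argument to the end, so nothing is left afterwards. Materializing the generator into a list first keeps the items available for both extend() and len().

```python
def pages():
    """Hypothetical stand-in for timeline(): yields one page dict at a time."""
    yield {"data": ["t1"]}
    yield {"data": ["t2"]}


# What the original code does: extend() drains the generator.
new_tweets = pages()
alltweets = []
alltweets.extend(new_tweets)
leftover = list(new_tweets)  # nothing left to iterate over

# Possible fix: materialize once, then reuse the list.
batch = list(pages())
alltweets_fixed = []
alltweets_fixed.extend(batch)
batch_size = len(batch)  # still available, because batch is a list

print(len(alltweets), leftover, batch_size)
```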