r - Work around rate limit for extracting large list of user information using twitteR package in R

Question

I am attempting to download all of the followers and their information (location, date of creation, etc.) from the Haaretz Twitter feed (@haaretzcom) using the twitteR package in R. The Twitter feed has over 90,000 followers. I was able to download the full list of followers without any problem using the code below.

require(twitteR)
require(ROAuth)
#Loading the Twitter OAuthorization
load("~/Dropbox/Twitter/my_oauth")

#Confirming the OAuth
registerTwitterOAuth(my_oauth)

# opening list to download
haaretz_followers<-getUser("haaretzcom")$getFollowerIDs(retryOnRateLimit=9999999)

However, when I try to extract their information using the lookupUsers function, I run into the rate limit. The trick of using retryOnRateLimit does not seem to work here:

 #Extracting user information for each of Haaretz followers
 haaretz_followers_info<-lookupUsers(haaretz_followers)

 haaretz_followers_full<-twListToDF(haaretz_followers_info)

 #Export data to csv
 write.table(haaretz_followers_full, file = "haaretz_twitter_followers.csv",  sep=",")

I believe I need to write a for loop and subsample over the list of followers (haaretz_followers) to avoid the rate limit. In this loop, I need to include some kind of rest/pause, as in the question "Keep downloading tweets within the limits using twitteR package". The twitteR package is a bit opaque on how to go about this, and I am a bit of a novice at writing for loops in R. Finally, I know that how you write your loops in R greatly affects the run time. Any help you could give would be much appreciated!

score 2 · Accepted Answer

Something like this might do the job:

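first <- TRUE  # write the header row only once
for (follower in haaretz_followers){
  Sys.sleep(5)  # pause between calls to stay under the rate limit
  # Look up only the current follower, not the whole list again
  haaretz_followers_info<-lookupUsers(follower)
  if (length(haaretz_followers_info) == 0) next  # nothing returned for this ID

  haaretz_followers_full<-twListToDF(haaretz_followers_info)

  #Append to the csv so earlier rows are not overwritten
  write.table(haaretz_followers_full, file = "haaretz_twitter_followers.csv",  sep=",",
              append = !first, col.names = first, row.names = FALSE)
  first <- FALSE
}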

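Here, you sleep for 5 seconds between each call. I don't know what your rate limit is; you may need a longer or shorter pause to comply with Twitter's policies.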

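You are correct that the way you structure loops in R affects performance, but in this case you are intentionally inserting a pause that will be orders of magnitude longer than any CPU time wasted by a poorly designed loop, so you don't really need to worry about it.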
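If one call per follower turns out to be too slow, the same idea can be applied to batches of IDs instead. What follows is only a rough sketch, not something tested against Twitter: lookup_in_batches and its lookup_fn argument are hypothetical names, the batch size of 100 assumes the user lookup endpoint accepts that many IDs per request, and the pause is a placeholder to tune to your actual rate limit. A stand-in lookup function is used so the sketch runs without credentials.

```r
# Sketch: process IDs in fixed-size batches with a pause between batches.
# `lookup_fn` is a hypothetical argument: any function that takes a vector of
# IDs and returns a data frame with one row per user found.
lookup_in_batches <- function(ids, lookup_fn, batch_size = 100, pause = 5) {
  # Split the IDs into consecutive chunks of at most `batch_size`
  batches <- split(ids, ceiling(seq_along(ids) / batch_size))
  results <- vector("list", length(batches))
  for (i in seq_along(batches)) {
    results[[i]] <- lookup_fn(batches[[i]])
    # Pause between batches, but not after the last one
    if (i < length(batches)) Sys.sleep(pause)
  }
  # Bind once at the end rather than growing a data frame inside the loop
  do.call(rbind, results)
}

# Stand-in lookup so the sketch runs without Twitter credentials
fake_lookup <- function(ids) data.frame(id = ids, stringsAsFactors = FALSE)
demo <- lookup_in_batches(as.character(1:250), fake_lookup,
                          batch_size = 100, pause = 0)
nrow(demo)  # 250
```

With twitteR you would presumably pass something like function(ids) twListToDF(lookupUsers(ids)) as lookup_fn and write the combined data frame to csv once at the end. You may need to guard against batches where no users come back.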
