r - Issues with the twitteR package in R with non-UTF-8 or ASCII characters

Question

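In a previous question, I asked about using the twitteR package in R to download a large number of Twitter followers (along with their location, creation date, follower count, etc.) from the Haaretz Twitter feed (@haaretzcom) (see "Work around rate limit for extracting large list of user information using twitteR package in R"). The feed has over 90,000 followers, and I can download the full list of followers without any problem using the code below.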

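    require(twitteR)
    require(ROAuth)

    # Loading the Twitter OAuthorization
    load("~/Dropbox/Twitter/my_oauth")

    # Confirming the OAuth
    registerTwitterOAuth(my_oauth)

    # Download the full list of follower IDs
    haaretz_followers <- getUser("haaretzcom")$getFollowerIDs(retryOnRateLimit = 9999999)

    # Look up the followers in batches of 100, pausing between requests,
    # instead of re-requesting the whole list once per follower
    batches <- split(haaretz_followers, ceiling(seq_along(haaretz_followers) / 100))
    haaretz_followers_info <- list()
    for (batch in batches) {
      Sys.sleep(5)
      haaretz_followers_info <- c(haaretz_followers_info, lookupUsers(batch))
    }

    # Convert the list of user objects to a data frame
    haaretz_followers_full <- twListToDF(haaretz_followers_info)

    # Export data to csv
    write.table(haaretz_followers_full, file = "haaretz_twitter_followers.csv", sep = ",")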

The code works for many of the users. However, whenever it hits certain users, I get the following error:

Error in twFromJSON(out) :
RMate stopped at line 51
Error: Malformed response from server, was not JSON.
RMate stopped at line 51
The most likely cause of this error is Twitter returning a character which
can't be properly parsed by R. Generally the only remedy is to wait long
enough for the offending character to disappear from searches (e.g. if
using searchTwitter()).
Calls: twListToDF ... lookupUsers -> lapply -> FUN -> <Anonymous> -> twFromJSON
Execution halted

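I run into this problem even if I load the RJSONIO package after the twitteR package. From some research, it seems that the twitteR and RJSONIO packages have trouble parsing non-UTF-8 or ASCII characters (Arabic, etc.): http://lists.hexdump.org/pipermail/twitter-users-hexdump.org/2013-May/000335.html. Is there a way to simply ignore the non-UTF-8 or ASCII characters in the code I have and still extract all of the follower information? Any help would be much appreciated.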

score 1 · Accepted Answer

There is a package update (1.1.7) that fixes this problem. See: https://github.com/geoffjentry/twitteR/blob/master/NEWS
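To check whether an installation already has the fix, something like the following might help. It is only a rough sketch. `needs_update` and `lookup_in_batches` are hypothetical helper names, not part of twitteR, and the only thing taken from the answer is the 1.1.7 version number. The second helper is a possible stopgap for anyone stuck on an older version: it wraps any lookup function (for example `lookupUsers`) in `tryCatch` and skips a batch that raises an error, so one unparseable response does not stop the whole run. Users in a skipped batch are lost, so this is not a real fix.

```r
# Assumption: 1.1.7 is the first release containing the fix, as stated in the answer.
fixed_version <- package_version("1.1.7")

# Hypothetical helper: TRUE when the given version string predates the fixed release.
needs_update <- function(installed, required = fixed_version) {
  package_version(installed) < required
}

# Hypothetical helper: apply lookup_fn to the IDs in batches and skip any
# batch whose lookup raises an error, instead of aborting the whole run.
lookup_in_batches <- function(ids, lookup_fn, batch_size = 100) {
  batches <- split(ids, ceiling(seq_along(ids) / batch_size))
  results <- list()
  for (batch in batches) {
    res <- tryCatch(
      lookup_fn(batch),
      error = function(e) {
        message("Skipping a batch: ", conditionMessage(e))
        NULL
      }
    )
    results <- c(results, res)
  }
  results
}

# Report on the installed twitteR version only if the package is present.
if (requireNamespace("twitteR", quietly = TRUE)) {
  installed <- as.character(utils::packageVersion("twitteR"))
  if (needs_update(installed)) {
    message("twitteR ", installed, " predates 1.1.7; updating the package is the better option.")
  }
}
```

With twitteR loaded and authenticated, the stopgap would be called as `lookup_in_batches(haaretz_followers, lookupUsers)`, and the result passed to `twListToDF()` as before.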
