I'm saving Twitter search results to a database (SQL Server), and I get an error when I pull the search results from twitteR.

If I execute:

library(twitteR)
puppy <- as.data.frame(searchTwitter("puppy", session=getCurlHandle(),num=100))

I get the following error:

Error in as.data.frame.default(x[[i]], optional = TRUE) : 
  cannot coerce class structure("status", package = "twitteR") into a data.frame

This matters because the object has to be a data.frame before RODBC's sqlSave can add it to a table. At least that's the error message I got:

Error in sqlSave(localSQLServer, puppy, tablename = "puppy_staging",  : 
  should be a data frame

So does anyone have suggestions on how to coerce the list to a data.frame, or how to load the list through RODBC?

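My final goal is a table that mirrors the structure of the values returned by searchTwitter. Here is an example of what I'm trying to retrieve and load: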

library(twitteR)
puppy <- searchTwitter("puppy", session=getCurlHandle(),num=2)
str(puppy)

List of 2
 $ :Formal class 'status' [package "twitteR"] with 10 slots
  .. ..@ text        : chr "beautifull and  kc reg Beagle Mix for rehomes: This little puppy is looking for a new loving family wh... http://bit.ly/9stN7V "| __truncated__
  .. ..@ favorited   : logi FALSE
  .. ..@ replyToSN   : chr(0) 
  .. ..@ created     : chr "Wed, 16 Jun 2010 19:04:03 +0000"
  .. ..@ truncated   : logi FALSE
  .. ..@ replyToSID  : num(0) 
  .. ..@ id          : num 1.63e+10
  .. ..@ replyToUID  : num(0) 
  .. ..@ statusSource: chr "&lt;a href=&quot;http://twitterfeed.com&quot; rel=&quot;nofollow&quot;&gt;twitterfeed&lt;/a&gt;"
  .. ..@ screenName  : chr "puppy_ads"
 $ :Formal class 'status' [package "twitteR"] with 10 slots
  .. ..@ text        : chr "the cutest puppy followed me on my walk, my grandma won't let me keep it. taking it to the pound sadface"
  .. ..@ favorited   : logi FALSE
  .. ..@ replyToSN   : chr(0) 
  .. ..@ created     : chr "Wed, 16 Jun 2010 19:04:01 +0000"
  .. ..@ truncated   : logi FALSE
  .. ..@ replyToSID  : num(0) 
  .. ..@ id          : num 1.63e+10
  .. ..@ replyToUID  : num(0) 
  .. ..@ statusSource: chr "&lt;a href=&quot;http://blackberry.com/twitter&quot; rel=&quot;nofollow&quot;&gt;Twitter for BlackBerry®&lt;/a&gt;"
  .. ..@ screenName  : chr "iamsweaters"

So I think the data.frame for puppy should have column names like:

- text
- favorited
- replyToSN
- created
- truncated
- replyToSID
- id
- replyToUID
- statusSource
- screenName

6 Answers

I use this code, which I found a while ago at http://blog.ouseful.info/2011/11/09/getting-started-with-twitter-analysis-in-r/:

#get data
tws<-searchTwitter('#keyword',n=10)

#make data frame
df <- do.call("rbind", lapply(tws, as.data.frame))

#write to csv file (or your RODBC code)
write.csv(df,file="twitterList.csv")
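
To see what the rbind/lapply line does without calling the Twitter API, here is a minimal sketch on made-up records. The records, the channel name, and the commented-out sqlSave call are assumptions for illustration; only the table name comes from the question.

```r
# Two made-up records standing in for tweets (hypothetical data, not real output).
records <- list(
  list(text = "first puppy tweet", screenName = "user_a", favorited = FALSE),
  list(text = "second puppy tweet", screenName = "user_b", favorited = TRUE)
)

# Turn each record into a one-row data.frame ...
rows <- lapply(records, as.data.frame, stringsAsFactors = FALSE)

# ... then stack the rows into a single data.frame.
df <- do.call("rbind", rows)

# With RODBC, the result could then be passed to sqlSave. "channel" is an
# assumed, already-open connection, so the call is left commented out:
# sqlSave(channel, df, tablename = "puppy_staging", rownames = FALSE)
```

Whether as.data.frame works on the real status objects depends on the installed twitteR version, so treat this only as an illustration of the pattern.
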
answered 2011-12-08T15:07:40.660

I know this is an old question, but I think this is a "modern" way to solve it. Just use the function twListToDF:

gvegayon <- getUser("gvegayon")
timeline <- userTimeline(gvegayon,n=400)
tl <- twListToDF(timeline)

Hope that helps.

answered 2015-05-24T20:27:03.380

Try this:

ldply(searchTwitter("#rstats", n=100), text)

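twitteR returns an S4 class, so you need to either use one of its helper functions or deal directly with its slots. You can see the slots by using unclass(), for instance: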

unclass(searchTwitter("#rstats", n=100)[[1]])

These slots can be accessed directly, as I do above, by using the related functions (from the twitteR help: ?statusSource):

 text Returns the text of the status
 favorited Returns the favorited information for the status
 replyToSN Returns the replyToSN slot for this status
 created Retrieves the creation time of this status
 truncated Returns the truncated information for this status
 replyToSID Returns the replyToSID slot for this status
 id Returns the id of this status
 replyToUID Returns the replyToUID slot for this status
 statusSource Returns the status source for this status

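As I mentioned, my understanding is that you will have to specify each of these fields yourself in the output. Here's an example using two of the fields: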

> head(ldply(searchTwitter("#rstats", n=100), 
        function(x) data.frame(text=text(x), favorited=favorited(x))))
                                                                                                                                          text
1                                                     @statalgo how does that actually work? does it share mem between #rstats and postgresql?
2                                   @jaredlander Have you looked at PL/R? You can call #rstats from PostgreSQL: http://www.joeconway.com/plr/.
3   @CMastication I was hoping for a cool way to keep data in a DB and run the normal #rstats off that. Maybe a translator from R to SQL code.
4                     The distribution of online data usage: AT&amp;T has recently announced it will no longer http://goo.gl/fb/eTywd #rstat
5 @jaredlander not that I know of. Closest is sqldf package which allows #rstats and sqlite to share mem so transferring from DB to df is fast
6 @CMastication Can #rstats run on data in a DB?Not loading it in2 a dataframe or running SQL cmds but treating the DB as if it wr a dataframe
  favorited
1     FALSE
2     FALSE
3     FALSE
4     FALSE
5     FALSE
6     FALSE

If you intend to do this frequently, you can turn it into a function.
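
As a rough sketch of what such a function could look like, the example below reads every slot of each S4 object and maps empty slots such as chr(0) to NA so that each status yields exactly one row. It runs on a mock class (mockStatus, with only three slots) rather than the real twitteR status class, and the names slot_or_na and statuses_to_df are made up for illustration.

```r
library(methods)

# Mock stand-in for the S4 "status" class (hypothetical, three slots only).
setClass("mockStatus", slots = c(text = "character",
                                 favorited = "logical",
                                 replyToSN = "character"))

# Read one slot, mapping zero-length values such as chr(0) to NA so that
# every object contributes exactly one value per column.
slot_or_na <- function(obj, name) {
  value <- slot(obj, name)
  if (length(value) == 0) NA else value[[1]]
}

# Convert a list of S4 objects into a data.frame with one column per slot.
statuses_to_df <- function(statuses, slots = slotNames(statuses[[1]])) {
  columns <- lapply(slots, function(name) {
    sapply(statuses, slot_or_na, name = name)
  })
  names(columns) <- slots
  as.data.frame(columns, stringsAsFactors = FALSE)
}

tweets <- list(
  new("mockStatus", text = "first puppy", favorited = FALSE,
      replyToSN = character(0)),
  new("mockStatus", text = "second puppy", favorited = TRUE,
      replyToSN = "someone")
)
tweet_df <- statuses_to_df(tweets)
```

Because it works from slotNames, the same helper should apply to any list of S4 objects that share a class, though I have only tried the idea on toy data like the above.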

answered 2010-06-16T18:39:34.680

For anyone who runs into the same problem: I got an error saying

Error in as.double(y) : cannot coerce type 'S4' to vector of type 'double' 

I simply changed the word text in

ldply(searchTwitter("#rstats", n=100), text)

to statusText, like this:

ldply(searchTwitter("#rstats", n=100), statusText)

Just a friendly tip :P

answered 2012-12-04T04:38:51.133

Here is a handy function that converts it to a data.frame.

TweetFrame<-function(searchTerm, maxTweets)
{
  tweetList<-searchTwitter(searchTerm,n=maxTweets)
  return(do.call("rbind",lapply(tweetList,as.data.frame)))
}

Use it as:

tweets <- TweetFrame(" ", n)
answered 2016-10-15T06:08:34.977

The twitteR package now includes a function, twListToDF, that does this for you.

puppy_table <- twListToDF(puppy)
answered 2018-07-13T19:12:37.787