
I want to use the tidytext package to create a column of n-grams, using the following code:

library(tidytext)

unnest_tokens(tbl = president_tweets,
              output =  bigrams,
              input = text,
              token = "ngrams", 
              n = 2) 

But when I run it, I get the following error message:

error: unnest_tokens expects all columns of input to be atomic vectors (not lists)

My text column consists of many tweets. Its rows look like the following, and it has class character.

president_tweets$text <- c("The United States Senate just passed the biggest in history Tax Cut and Reform Bill. Terrible Individual Mandate (ObamaCare)Repealed. Goes to the House tomorrow morning for final vote. If approved, there will be a News Conference at The White House at approximately 1:00 P.M.", 
    "Congratulations to Paul Ryan, Kevin McCarthy, Kevin Brady, Steve Scalise, Cathy McMorris Rodgers and all great House Republicans who voted in favor of cutting your taxes!", 
    "A  story in the @washingtonpost that I was close to rescinding the nomination of Justice Gorsuch prior to confirmation is FAKE NEWS. I never even wavered and am very proud of him and the job he is doing as a Justice of the U.S. Supreme Court. The unnamed sources dont exist!", 
    "Stocks and the economy have a long way to go after the Tax Cut Bill is totally understood and appreciated in scope and size. Immediate expensing will have a big impact. Biggest Tax Cuts and Reform EVER passed. Enjoy, and create many beautiful JOBS!", 
    "DOW RISES 5000 POINTS ON THE YEAR FOR THE FIRST TIME EVER - MAKE AMERICA GREAT AGAIN!", 
    "70 Record Closes for the Dow so far this year! We have NEVER had 70 Dow Records in a one year period. Wow!"
    )

----- Update: -----

It looks like the sentimetr or exploratory packages were causing a conflict. I reloaded my packages without them, and now it works again!
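One way to look into a conflict like this is to ask R which attached packages define a function with a given name. The sketch below uses only `utils::find()`; the helper name `where_defined` is made up for illustration, and the sketch assumes the conflict comes from one attached package masking another's function, which the thread does not confirm.

```r
# Hypothetical helper: list the attached environments that define a
# function called `name`, in search-path order (the first one wins).
where_defined <- function(name) {
  utils::find(name, mode = "function")
}

# In a session with tidytext attached, this would show whether more than
# one attached package provides unnest_tokens().
where_defined("unnest_tokens")

# "filter" is defined by the stats package in a default R session; attaching
# dplyr would add a second, masking entry.
where_defined("filter")
```

If more than one package shows up, calling the function with an explicit namespace, as in `tidytext::unnest_tokens(...)`, sidesteps the masking without unloading anything.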


1 Answer


Hmm, I am not able to reproduce your problem.

library(tidytext)
library(dplyr)

# tibble() replaces the deprecated data_frame(); dplyr re-exports it
president_tweets <- tibble(text = c("The United States Senate just passed the biggest in history Tax Cut and Reform Bill. Terrible Individual Mandate (ObamaCare)Repealed. Goes to the House tomorrow morning for final vote. If approved, there will be a News Conference at The White House at approximately 1:00 P.M.", 
                                    "Congratulations to Paul Ryan, Kevin McCarthy, Kevin Brady, Steve Scalise, Cathy McMorris Rodgers and all great House Republicans who voted in favor of cutting your taxes!", 
                                    "A  story in the @washingtonpost that I was close to rescinding the nomination of Justice Gorsuch prior to confirmation is FAKE NEWS. I never even wavered and am very proud of him and the job he is doing as a Justice of the U.S. Supreme Court. The unnamed sources dont exist!", 
                                    "Stocks and the economy have a long way to go after the Tax Cut Bill is totally understood and appreciated in scope and size. Immediate expensing will have a big impact. Biggest Tax Cuts and Reform EVER passed. Enjoy, and create many beautiful JOBS!", 
                                    "DOW RISES 5000 POINTS ON THE YEAR FOR THE FIRST TIME EVER - MAKE AMERICA GREAT AGAIN!", 
                                    "70 Record Closes for the Dow so far this year! We have NEVER had 70 Dow Records in a one year period. Wow!"))


unnest_tokens(tbl = president_tweets,
              output =  bigrams,
              input = text,
              token = "ngrams", 
              n = 2) 
#> # A tibble: 205 x 1
#>    bigrams      
#>    <chr>        
#>  1 the united   
#>  2 united states
#>  3 states senate
#>  4 senate just  
#>  5 just passed  
#>  6 passed the   
#>  7 the biggest  
#>  8 biggest in   
#>  9 in history   
#> 10 history tax  
#> # ... with 195 more rows

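The current CRAN version of tidytext does in fact not allow list-columns, but we have changed the column handling so that the development version on GitHub now supports list-columns. Are you sure you don't have any of these in your data frame/tibble? What are the data types of all of your columns? Are any of them of type list?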
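To answer those questions, a quick check along the following lines can help. This is a minimal sketch in base R; the data frame `tweets_demo` and its `hashtags` column are made up for illustration and are not taken from the question's data.

```r
# Hypothetical data frame with one character column and one list-column.
tweets_demo <- data.frame(text = c("first tweet", "second tweet"),
                          stringsAsFactors = FALSE)
tweets_demo$hashtags <- list(c("a", "b"), character(0))

# Class of every column; a list-column reports "list".
sapply(tweets_demo, class)

# Logical flag per column, then the names of the list-columns.
is_list_col <- vapply(tweets_demo, is.list, logical(1))
names(tweets_demo)[is_list_col]

# Keep only the atomic columns, which is what the error message asks for.
tweets_atomic <- tweets_demo[, !is_list_col, drop = FALSE]
```

Running the same `vapply()` call on `president_tweets` would show whether any column is a list; if one is, passing the reduced data frame to `unnest_tokens()` should avoid the error on a tidytext version that rejects list-columns.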

Answered 2017-12-20T18:48:27.807