r - Numbers of columns of arguments do not match

Question

I am using this example to run sentiment analysis on a collection of txt documents in R. The code is:

library(tm)
library(tidyverse)
library(tidytext)
library(glue)
library(stringr)
library(dplyr)
library(wordcloud)
require(reshape2)

files <- list.files(inputdir,pattern="*.txt")

GetNrcSentiment <- function(file){

    fileName <- glue(inputdir, file, sep = "")
    fileName <- trimws(fileName)
    fileText <- glue(read_file(fileName))
    fileText <- gsub("\\$", "", fileText) 

    tokens <- data_frame(text = fileText) %>% unnest_tokens(word, text)

    # get the sentiment from the first text: 
    sentiment <- tokens %>%
        inner_join(get_sentiments("nrc")) %>% # pull out only sentiment words
        count(sentiment) %>% # count the # of positive & negative words
        spread(sentiment, n, fill = 0) %>% # made data wide rather than narrow
        mutate(sentiment = positive - negative) %>% # positive - negative
        mutate(file = file) %>% # add the name of our file
        mutate(year = as.numeric(str_match(file, "\\d{4}"))) %>% # add the year
        mutate(city = str_match(file, "(.*?).2")[2]) 

    return(sentiment)
}

The .txt files are stored in inputdir and are named AB-City.0000, where AB is the country abbreviation, City is the city name, and 0000 is the year (ranging from 2000 to 2017).

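The function works as expected for a single file, i.e. GetNrcSentiment(files[1]) gives me a tibble with the appropriate counts. However, when I try to run it over the whole collection, i.e.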

nrc_sentiments  <- data_frame()

for(i in files){
    nrc_sentiments <- rbind(nrc_sentiments, GetNrcSentiment(i))
}

I get the following error message:

Joining, by = "word"
Error in rbind(deparse.level, ...) : 
  numbers of columns of arguments do not match

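The exact same code works for longer documents but fails on shorter texts. It seems that not every sentiment is found in the small documents, so the number of columns differs from document to document, which may be what causes this error, but I am not sure. I would appreciate any advice on how to fix the problem. If a sentiment is not found, I would like its entry to be zero (if that is indeed the cause of my problem).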

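As an aside, the bing sentiment function gets through about two dozen files and then gives a different error, which seems to point to the same problem (no negative sentiment found?):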

GetBingSentiment <- function(file){
    fileName <- glue(inputdir, file, sep = "")
    fileName <- trimws(fileName)

    fileText <- glue(read_file(fileName))
    fileText <- gsub("\\$", "", fileText)       
    tokens <- data_frame(text = fileText) %>% unnest_tokens(word, text)

    # get the sentiment from the first text: 
    sentiment <- tokens %>%
        inner_join(get_sentiments("bing")) %>% # pull out only sentiment words
        count(sentiment) %>% # count the # of positive & negative words
        spread(sentiment, n, fill = 0) %>% # made data wide rather than narrow
        mutate(sentiment = positive - negative) %>% 
        mutate(file = file) %>% # add the name of our file
        mutate(year = as.numeric(str_match(file, "\\d{4}"))) %>% # add the year
        mutate(city = str_match(file, "(.*?).2")[2])

    # return our sentiment dataframe
    return(sentiment)
}

Error in mutate_impl(.data, dots) : 
  Evaluation error: object 'negative' not found.

Edit: following David Klotz's suggestion, I changed the code to

for(i in files){ nrc_sentiments <- dplyr::bind_rows(nrc_sentiments, GetNrcSentiment(i)) }

As a result, nrc no longer throws an error when no words from a given sentiment are found and produces NA instead, but after 22 joins I get a different error:

Error in mutate_impl(.data, dots) : Evaluation error: object 'negative' not found.

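The same error occurs when running the bing function with dplyr. By the time the function reaches the 22nd document, both data frames contain columns for all sentiments. What could be causing the error, and how can I diagnose it?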

score 6 · Accepted Answer

dplyr's bind_rows function is more flexible than rbind, at least when columns are missing:

nrc_sentiments <- dplyr::bind_rows(nrc_sentiments, GetNrcSentiment(i))
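To illustrate the difference, here is a minimal sketch with two made-up per-file results (the tibbles a and b and their values are hypothetical, not taken from the question's data). bind_rows matches columns by name and fills the gaps with NA, which can then be turned into zeros, as the question asks for:

```r
library(dplyr)

# Two hypothetical per-file results with different sentiment columns
a <- tibble(file = "a.txt", positive = 3, negative = 1)
b <- tibble(file = "b.txt", positive = 2)  # no "negative" words found

# bind_rows matches columns by name; b gets NA in the missing column
combined <- bind_rows(a, b)

# Replace the NA introduced for the missing column with 0
combined <- combined %>%
    mutate(negative = coalesce(negative, 0))

print(combined)
```

Note that this only fixes the row-binding step. A mutate inside the function that refers to a column the file never produced will still fail before bind_rows is reached.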

score 1 · Accepted Answer


The input may be missing the "negative" column used in the expression.
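One possible way to guard against this, sketched under the assumption that the failing file simply contains no negative words: add a zero-count row for every expected sentiment before spreading, using tidyr's complete. The counts tibble below is a hypothetical stand-in for the output of count(sentiment) on such a file.

```r
library(dplyr)
library(tidyr)

# Hypothetical counts for a short file that contains only positive words
counts <- tibble(sentiment = "positive", n = 4L)

wide <- counts %>%
    # add a zero-count row for every expected sentiment that is absent
    complete(sentiment = c("negative", "positive"), fill = list(n = 0L)) %>%
    spread(sentiment, n, fill = 0) %>%  # both columns now always exist
    mutate(sentiment = positive - negative)

print(wide)
```

For the nrc lexicon the vector passed to complete would need to list all of its categories; unique(get_sentiments("nrc")$sentiment) is probably a reasonable way to obtain them, though I have not checked it against the question's data.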

answered 2018-06-12T22:30:16.417
