
I have a function that labels strings in a dataset as spam. I have used this function successfully by calling:

dtm_english.label <- getSpamLabel(comment$rawMessage, dictionary_english, 2) # 2 is the threshold level

But when I call

dtm_english.label <- ddply(comment, .(rawMessage), getSpamLabel, dictionary_english, 2, .progress = "text")

ddply runs to completion without producing any output, and then I get

Error in do.call("c", res) : variable names are limited to 10000 bytes

I can post the function if it is relevant.


1 Answer


I am not sure what you are attempting to do; next time, please describe exactly what you are trying to achieve. It looks to me like you are trying to apply a function to one column of your data.frame. ddply is meant for applying a function to subsets of the data; its documentation describes it as "Split data frame, apply function, and return results in a data frame".
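To make the splitting behaviour concrete, here is a minimal sketch on made-up data (only the column name rawMessage comes from the question). It shows that splitting by .(rawMessage) creates one piece per unique message, and that each piece reaches the function as a data frame rather than as a character vector:

```r
library(plyr)

# Toy data; the values are invented for illustration.
comment <- data.frame(
  rawMessage = c("buy now", "hello", "buy now"),
  stringsAsFactors = FALSE
)

# ddply splits `comment` by the unique values of rawMessage and passes
# each piece to the function as a data frame.
pieces <- ddply(comment, .(rawMessage), function(piece) {
  data.frame(class = class(piece)[1], n_rows = nrow(piece))
})
print(pieces)
```

A function that expects a character vector as its first argument, as getSpamLabel appears to, will therefore not receive what it expects when called this way.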

If you want to split the column into sections before applying the function, your data frame needs a grouping variable, for example a factor that tags each row with its group.

Pass that "group" factor, not the column you want to apply the function to, as the .variables argument of ddply, set .fun = summarize, and then give your function call as a named argument:

dtm_english <- ddply(comment, .(group), summarize, 
                     label=getSpamLabel(rawMessage, dictionary_english, 2), 
                     .progress = "text")

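This returns a new data frame with one row for each level of group, provided getSpamLabel returns a single value per group; if it returns several values for a group, you get one row per returned value.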
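As a rough, runnable sketch of the group-wise call: the real getSpamLabel was not posted, so the version below is a hypothetical stand-in that labels a message as spam when it contains at least `threshold` dictionary words. The data, the dictionary contents, and the spam_share column are likewise assumptions made for illustration.

```r
library(plyr)

# Hypothetical stand-in for the asker's getSpamLabel (the real one was not posted):
# TRUE when a message contains at least `threshold` words from `dictionary`.
getSpamLabel <- function(messages, dictionary, threshold) {
  hits <- sapply(messages, function(m) {
    words <- strsplit(tolower(m), "[^a-z]+")[[1]]
    sum(words %in% dictionary)
  })
  unname(hits >= threshold)
}

# Made-up dictionary and data; only the names follow the question.
dictionary_english <- c("free", "win", "prize")
comment <- data.frame(
  group = c("a", "a", "b"),
  rawMessage = c("Win a FREE prize", "see you tomorrow", "free lunch"),
  stringsAsFactors = FALSE
)

# Per-group summary: share of messages in each group labelled as spam.
spam_by_group <- ddply(comment, .(group), summarize,
                       spam_share = mean(getSpamLabel(rawMessage, dictionary_english, 2)))

# If one label per message is all that is needed, no splitting is required.
comment$label <- getSpamLabel(comment$rawMessage, dictionary_english, 2)
```

Reducing each group to a single number with mean() keeps the result at one row per group; the direct call on the last line is the same pattern the question reports as already working.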

answered 2013-09-03T16:57:32.793