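I have a dataset containing monthly observations of returns for US companies. I am trying to exclude from my sample all companies that have fewer than a certain number of non-NA observations.
I managed to do what I want using foreach, but my dataset is very large and this takes a long time. Here is a working example that shows how I accomplished what I want and hopefully makes my goal clear: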
#load required packages
library(data.table)
library(foreach)
#example data
myseries <- data.table(
X = sample(letters[1:6],30,replace=TRUE),
Y = sample(c(NA,1,2,3),30,replace=TRUE))
setkey(myseries,"X") #so X is the company identifier
#here I create another data table with each company identifier and its number
#of non NA observations
nobsmyseries <- myseries[, list(NOBSnona = sum(!is.na(Y))), by = X]
#then I select the companies which have fewer than 3 non NA observations
comps <- nobsmyseries[NOBSnona <3,]
#finally I exclude all companies which are in the table "comps",
#that is, companies which have fewer than 3 non NA observations,
#but I do it for each company in "comps", one by one,
#and this is what makes it slow.
#seq_len() skips the loop when "comps" is empty; 1:0 would iterate over 1 and 0
for (i in seq_len(nrow(comps))){
myseries <- myseries[X != comps$X[i],]
}
How can I do this more efficiently? Is there a data.table way to get the same result?
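To make the goal concrete, here is a minimal sketch of what a single-pass version might look like. It counts non-NA values per company once and then filters with one vectorized subset instead of looping. The seed and the names min_obs, keep, filtered, and filtered_sd are hypothetical and only there for illustration. I have not benchmarked it on the full dataset.

```r
library(data.table)

set.seed(42)  # assumption: seed added only so the example is reproducible
myseries <- data.table(
  X = sample(letters[1:6], 30, replace = TRUE),
  Y = sample(c(NA, 1, 2, 3), 30, replace = TRUE)
)

min_obs <- 3  # hypothetical name for the minimum number of non-NA observations

# Count non-NA observations per company, then keep the identifiers that qualify
keep <- myseries[, .(NOBSnona = sum(!is.na(Y))), by = X][NOBSnona >= min_obs, X]

# One vectorized filter instead of one pass over the table per excluded company
filtered <- myseries[X %in% keep]

# Alternative: return .SD only for the groups that meet the threshold
filtered_sd <- myseries[, if (sum(!is.na(Y)) >= min_obs) .SD, by = X]
```

Both variants should keep the same rows. The first preserves the original row order, while the second returns rows grouped by X.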