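I'm fairly comfortable with R, but I've reached a point where my data needs require me to learn iterative loops with multiple conditions. I've seen examples that use various forms of *apply(), as well as colSums() and rowSums(), for the kind of data transformations I need, but I'd like to make these tasks more efficient, perhaps with nested or iterative loops. Also, the existing suggestions don't account for the data loss that results from ignoring or removing "NA" entries, and I need to be able to retain that information.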

My general data format is as follows:

group <- c("A", "B", "C", "A", "C" [...])

individual <- c("1", "2", "3", "4", "5" [...])

choice1 <- c("1", "0", "1", "1", "NA")

choice2 <- c("1", "NA", "1", "0", "NA")

[...]

choice10 <- c("1", "0", "1", "1", "NA")

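I need to calculate the count for each of the three options (1 == yes; 0 == no; NA == opt-out) across choices and across groups, and then convert these to percentages. The biggest difficulty I've had with previous approaches (such as *apply() or summing across rows/columns) is that my "NA" values (opt-outs) are either ignored or prevent me from properly obtaining the percentages of choice values across groups. Any specific suggestions or demonstrations of how to either ignore or retain the "opt-out"/NA values within a loop structure would be greatly appreciated.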

The output would look something like the following:

yes.count_bychoice

no.count_bychoice

optout.count_bychoice

percentyes_bychoice_bygroup

percentno_bychoice_bygroup

percentout_bychoice_bygroup

2 Answers

First things first: build a data.frame, like this:

d <- data.frame(group=group, individual=individual, choice1=choice1 ...)
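One caveat before building the data.frame: the vectors in the question are character vectors, so "NA" there is a literal string and not R's missing value, and is.na() would not detect it. Below is a minimal sketch of one way to convert such a vector first. The helper name `to_choice` is hypothetical and not part of the original answer.

```r
# Hypothetical helper: turn character codes into numeric 0/1 with real NA
to_choice <- function(x) {
  x[x %in% "NA"] <- NA   # the string "NA" becomes a true missing value
  as.numeric(x)          # "0"/"1" become 0/1; missing values stay NA
}

choice1 <- c("1", "0", "1", "1", "NA")
clean1 <- to_choice(choice1)
is.na(clean1)            # only the last element is missing
```

Applying the same helper to every choice column before calling data.frame() would let the counting functions further down treat opt-outs as real NA values.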

I'll use this as an example:

d <- data.frame(group=sample(LETTERS[1:4],20,TRUE), individual=1:20,
                choice1=sample(c(0,1,NA),20,TRUE), choice2=sample(c(0,1,NA),20,TRUE))

This gives me:

> head(d)
  group individual choice1 choice2
1     D          1       1      NA
2     A          2      NA      NA
3     C          3       1       1
4     A          4       1      NA
5     B          5       0      NA
6     B          6       1       1

We will use the following function:

f <- function(x) c(yes=sum(x==1,na.rm=TRUE),no=sum(x==0,na.rm=TRUE),optout=sum(is.na(x)))

for counts, and

g <- function(x) f(x)/length(x)

for percentages (returned as proportions between 0 and 1).
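As a quick check of what these two functions return, here is a small worked example on a single hand-made vector. The vector `x` is made up for illustration; `f` and `g` are the functions defined above, repeated so the snippet runs on its own.

```r
f <- function(x) c(yes = sum(x == 1, na.rm = TRUE),
                   no = sum(x == 0, na.rm = TRUE),
                   optout = sum(is.na(x)))
g <- function(x) f(x) / length(x)

x <- c(1, 0, 1, 1, NA)   # illustrative data: three yes, one no, one opt-out
f(x)                     # yes = 3, no = 1, optout = 1
g(x)                     # yes = 0.6, no = 0.2, optout = 0.2
100 * g(x)               # the same shares on a 0-100 scale
```

Because `g` divides by length(x) and not by the number of non-missing values, the opt-outs stay in the denominator and the three shares sum to 1.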

For global counts, you can use:

counts <- apply(d[,-(1:2)], 2, FUN=f)

Result:

> counts
       choice1 choice2
yes         11       8
no           4       2
optout       5      10

Switch the function to get percentages:

> apply(d[,-(1:2)], 2, FUN=g)
       choice1 choice2
yes       0.55     0.4
no        0.20     0.1
optout    0.25     0.5
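Since the question asks specifically about loops, the same per-choice counts could also be produced with an explicit for loop. This is only a sketch: the seed, the `choice_cols` vector, and the `counts_loop` matrix are assumptions added for illustration, and the apply() call above remains the more compact form.

```r
set.seed(1)  # assumption: a fixed seed so the sketch is reproducible
d <- data.frame(group = sample(LETTERS[1:4], 20, TRUE), individual = 1:20,
                choice1 = sample(c(0, 1, NA), 20, TRUE),
                choice2 = sample(c(0, 1, NA), 20, TRUE))
f <- function(x) c(yes = sum(x == 1, na.rm = TRUE),
                   no = sum(x == 0, na.rm = TRUE),
                   optout = sum(is.na(x)))

# Every column except the two identifier columns is treated as a choice
choice_cols <- setdiff(names(d), c("group", "individual"))

# Pre-allocate the result: one row per response type, one column per choice
counts_loop <- matrix(0, nrow = 3, ncol = length(choice_cols),
                      dimnames = list(c("yes", "no", "optout"), choice_cols))
for (col in choice_cols) {
  counts_loop[, col] <- f(d[[col]])  # NA values are counted as opt-outs, not dropped
}
counts_loop
```

Each column of `counts_loop` sums to nrow(d), which is a simple way to confirm that no opt-outs were lost.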

To get per-group counts for each choice, you can use:

counts_grp <- aggregate(d[,-(1:2)], by=list(group=d$group), FUN=f)

Result:

> counts_grp
  group choice1.yes choice1.no choice1.optout choice2.yes choice2.no choice2.optout
1     A           1          0              3           2          0              2
2     B           3          2              0           3          1              1
3     C           4          0              2           3          0              3
4     D           3          2              0           0          1              4

For percentages, you can simply switch the function:

> aggregate(d[,-(1:2)], by=list(group=d$group), FUN=g)
  group choice1.yes choice1.no choice1.optout choice2.yes choice2.no choice2.optout
1     A   0.2500000  0.0000000      0.7500000         0.5        0.0            0.5
2     B   0.6000000  0.4000000      0.0000000         0.6        0.2            0.2
3     C   0.6666667  0.0000000      0.3333333         0.5        0.0            0.5
4     D   0.6000000  0.4000000      0.0000000         0.0        0.2            0.8
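One thing to be aware of when working with these results: because the function returns a vector, aggregate() stores each choice as a matrix column inside the data.frame, so names such as choice1.yes appear when printing but are not real column names. A minimal sketch of flattening the result follows; the seed and the names `pct_grp` and `flat` are assumptions for illustration.

```r
set.seed(1)  # assumption: a fixed seed so the sketch is reproducible
d <- data.frame(group = sample(LETTERS[1:4], 20, TRUE), individual = 1:20,
                choice1 = sample(c(0, 1, NA), 20, TRUE),
                choice2 = sample(c(0, 1, NA), 20, TRUE))
f <- function(x) c(yes = sum(x == 1, na.rm = TRUE),
                   no = sum(x == 0, na.rm = TRUE),
                   optout = sum(is.na(x)))
g <- function(x) f(x) / length(x)

pct_grp <- aggregate(d[, -(1:2)], by = list(group = d$group), FUN = g)
names(pct_grp)        # "group" "choice1" "choice2": each choice is a matrix column

# Expand the matrix columns into ordinary columns
flat <- do.call(data.frame, pct_grp)
names(flat)           # "group", "choice1.yes", "choice1.no", "choice1.optout", ...
flat$choice1.optout   # now accessible by name
```

After flattening, the per-group shares can be selected, merged, or exported like any other columns.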
answered 2013-04-12T14:36:42.820
For something quick and dirty, you might want to look at aggregate and prop.table, like this:

#Some data:
df <- data.frame( group = c("A", "B", "C", "A", "C" ) , 
individual = c("1", "2", "3", "4", "5" ),
choice1 = c("1", "0", "1", "1", "NA"),
choice2 = c("1", "NA", "1", "0", "NA") ,
choice3 = c("1", "NA", "NA", "0", "NA") )

#Convert to ordered factor to keep the order of values as 0 < 1 < NA in all cases, no matter the order they appear in a column
#(the string "NA" is kept as its own level here)
df <- as.data.frame( lapply( df , factor , ordered = TRUE ) )

#Then aggregate by group and apply prop.table to each choice
# Caveat: cbind() turns the factors into their integer codes (0 -> 1, 1 -> 2, "NA" -> 3),
# and prop.table() divides each code by the group's total of codes.
# The values below are therefore one share per row in the group, not labelled by response,
# so treat them as a rough look and not as the proportion of 0s, 1s and NAs
aggregate( cbind( choice1 , choice2 , choice3 ) ~ group  , data = df , prop.table )

#group  choice1              choice2              choice3
#1     A 0.5, 0.5 0.6666667, 0.3333333 0.6666667, 0.3333333
#2     B        1                    1                    1
#3     C 0.4, 0.6             0.4, 0.6             0.5, 0.5
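If labelled per-response proportions are wanted, one alternative is to tabulate each choice against the group with table() and then take row proportions. This is a sketch of a different technique from the aggregate() call above, under the assumption that the data keep the literal string "NA" for opt-outs; the names `lv` and `props` are made up for illustration.

```r
df <- data.frame(group = c("A", "B", "C", "A", "C"),
                 choice1 = c("1", "0", "1", "1", "NA"),
                 choice2 = c("1", "NA", "1", "0", "NA"))

# Fix the levels so every table has the same three labelled columns,
# even when a response never occurs in a given choice
lv <- c("0", "1", "NA")

props <- lapply(df[c("choice1", "choice2")], function(x) {
  tab <- table(group = df$group, response = factor(x, levels = lv))
  prop.table(tab, margin = 1)   # proportions within each group (rows sum to 1)
})

props$choice1   # rows are groups; columns are the responses 0, 1 and NA
```

With real missing values instead of the string "NA", table() would need useNA = "always" (and the factor levels would change accordingly) to keep the opt-out column.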
answered 2013-04-12T14:30:44.980