r - Counting the occurrences of a value across a set of variables in R (per row)

Question

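Suppose I have a data frame with 10 numeric variables V1-V10 (columns) and many rows (cases).

What I would like R to do is: for each case, give me the number of occurrences of a certain value in a set of variables.

For example, the number of times the value 99 occurs in V2, V3 and V6 within a single row, which obviously has a minimum of 0 (none of the three has the value 99) and a maximum of 3 (all three have the value 99).

I am really looking for an equivalent of the SPSS function COUNT: "COUNT creates a numeric variable that, for each case, counts the occurrences of the same value (or list of values) across a list of variables."

I thought about table() and the plyr library's count(), but I cannot really figure it out. A vectorized computation is preferred. Many thanks!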

score 5 · Accepted Answer

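I think there should be a simpler way to do this, but the best way I can think of to get a table of counts is to loop (implicitly, using sapply) over the unique values in the data frame.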

#Some example data
df <- data.frame(a=c(1,1,2,2,3,9),b=c(1,2,3,2,3,1))
df
#  a b
#1 1 1
#2 1 2
#3 2 3
#4 2 2
#5 3 3
#6 9 1

levels=unique(do.call(c,df)) #all unique values in df
out <- sapply(levels,function(x)rowSums(df==x)) #count occurrences of x in each row
colnames(out) <- levels
out
#     1 2 3 9
#[1,] 2 0 0 0
#[2,] 1 1 0 0
#[3,] 0 1 1 0
#[4,] 0 2 0 0
#[5,] 0 0 2 0
#[6,] 1 0 0 1
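
For the specific case in the question (counting the value 99 in V2, V3 and V6 only), the same rowSums idea can be applied to a subset of columns. The following is a minimal sketch with made-up data; the data values and the column name count.99 are assumptions for illustration, not part of the original question.

```r
# Hypothetical example data; only V2, V3 and V6 are inspected
df <- data.frame(
  V1 = c(1, 2, 3),
  V2 = c(99, 5, 99),
  V3 = c(99, 99, NA),
  V6 = c(99, 7, 3)
)

vars <- c("V2", "V3", "V6")

# df[vars] == 99 is a logical matrix; rowSums counts the TRUEs per row.
# na.rm = TRUE makes NA cells count as "not 99" instead of yielding NA.
df$count.99 <- rowSums(df[vars] == 99, na.rm = TRUE)
df$count.99
```

This stays fully vectorized, since the comparison and the row sums operate on the whole block of columns at once.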

score 5 · Accepted Answer

If you need to count any particular word/letter in each row.

#Let df be a data frame with four variables (V1-V4); the letter "L" must be quoted
df <- data.frame(V1=c(1,1,2,1,"L"), V2=c(1,"L",2,2,"L"),
                 V3=c(1,2,2,1,"L"), V4=c("L","L",1,2,"L"))

To count the number of "L" values in each row, simply use

#This is how to compute a new variable counting occurrences of "L" in V1-V4.
df$count.L <- apply(df, 1, function(x) length(which(x=="L")))

The result will look like this

> df
  V1 V2 V3 V4 count.L
1  1  1  1  L       1
2  1  L  2  L       2
3  2  2  2  1       0
4  1  2  1  2       0
5  L  L  L  L       4

score 4 · Accepted Answer

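Try

apply(df,MARGIN=1,table)

where df is your data.frame. This returns a list of the same length as the number of rows in the data.frame. Each item of the list corresponds to one row of the data.frame (in the same order) and is a table whose contents are the numbers of occurrences and whose names are the corresponding values.

For example: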

df=data.frame(V1=c(10,20,10,20),V2=c(20,30,20,30),V3=c(20,10,20,10))
#create a data.frame containing some data
df #show the data.frame
  V1 V2 V3
1 10 20 20
2 20 30 10
3 10 20 20
4 20 30 10
apply(df,MARGIN=1,table) #apply the function table on each row (MARGIN=1)
[[1]]

10 20 
 1  2 

[[2]]

10 20 30 
 1  1  1 

[[3]]

10 20 
 1  2 

[[4]]

10 20 30 
 1  1  1 

#desired result
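
One caveat: apply() only returns a list when the per-row tables differ in length; if every row happens to contain the same number of distinct values, the result is simplified to a matrix. A possible way to get one consistent shape is to tabulate each row against a fixed set of levels. This is a sketch, and the names vals and counts are just illustrative.

```r
df <- data.frame(V1 = c(10, 20, 10, 20),
                 V2 = c(20, 30, 20, 30),
                 V3 = c(20, 10, 20, 10))

# All distinct values in the data frame, used as a common set of levels
vals <- sort(unique(unlist(df)))

# Tabulating a factor with fixed levels gives every row a table of the
# same length (zero counts included), so apply() returns a matrix;
# t() puts the rows of df back into rows
counts <- t(apply(df, 1, function(x) table(factor(x, levels = vals))))
counts
```

Each row of counts then corresponds to one row of df and each column to one distinct value.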

score 4 · Accepted Answer

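Here is another straightforward solution that comes closest to what the COUNT command in SPSS does: creating a new variable that, for each case (i.e. row), counts the occurrences of a given value or list of values across a list of variables.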

#Let df be a data frame with four variables (V1-V4)
df <- data.frame(V1=c(1,1,2,1,NA), V2=c(1,NA,2,2,NA),
                 V3=c(1,2,2,1,NA), V4=c(NA,NA,1,2,NA))

#This is how to compute a new variable counting occurrences of value "1" in V1-V4.
df$count.1 <- apply(df, 1, function(x) length(which(x==1)))

The updated data frame contains the new variable count.1, exactly as the SPSS COUNT command would produce it.

 > df
      V1 V2 V3 V4 count.1
    1  1  1  1 NA       3
    2  1 NA  2 NA       1
    3  2  2  2  1       1
    4  1  2  1  2       2
    5 NA NA NA NA       0

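You can do the same to count how many times the value "2" occurs per row in V1-V4. Note that you need to select the columns (variables) in df to which the function is applied, since df now also contains count.1.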

df$count.2 <- apply(df[1:4], 1, function(x) length(which(x==2)))

You can also apply similar logic to count the number of missing values in V1-V4.

df$count.na <- apply(df[1:4], 1, function(x) sum(is.na(x)))

The final result should be exactly what you wanted:

 > df
      V1 V2 V3 V4 count.1 count.2 count.na
    1  1  1  1 NA       3       0        1
    2  1 NA  2 NA       1       1        2
    3  2  2  2  1       1       3        0
    4  1  2  1  2       2       2        0
    5 NA NA NA NA       0       0        4

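This solution can easily be generalized to a range of values. Suppose we want to count how many times the value 1 or 2 occurs in V1-V4 for each row: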

df$count.1or2 <- apply(df[1:4], 1, function(x) sum(x %in% c(1,2)))
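
If this is needed repeatedly, the pattern could be wrapped in a small function. The sketch below is one possible way to do it without apply(); count_values and its argument names are hypothetical, not an existing R or SPSS function.

```r
# Hypothetical helper: for each row, count how many of the columns in
# `vars` hold one of `values`. NA cells are only counted if NA is in `values`.
count_values <- function(data, vars, values) {
  # One logical vector per column: TRUE where the cell matches
  hits <- lapply(data[vars], function(col) col %in% values)
  # Add the logical vectors element-wise, starting from zero counts
  Reduce(`+`, hits, integer(nrow(data)))
}

df <- data.frame(V1 = c(1, 1, 2, 1, NA), V2 = c(1, NA, 2, 2, NA),
                 V3 = c(1, 2, 2, 1, NA), V4 = c(NA, NA, 1, 2, NA))
vars <- paste0("V", 1:4)

df$count.1or2 <- count_values(df, vars, c(1, 2))
df$count.na   <- count_values(df, vars, NA)
df
```

Because %in% never returns NA, missing cells simply do not match, which mirrors the behaviour of the apply() versions above.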

score -1 · Accepted Answer

In my effort to find something similar to SPSS's COUNT in R, I came up with the following:

library(dplyr)

df <- data.frame(a=c(1,1,NA,2,3,9),b=c(1,2,3,2,NA,1)) #Dummy data with NAs

df <- df %>%
  dplyr::mutate(count = rowSums( #rowSums adds up the values across each row
    dplyr::select(., #select columns from the piped data frame (.)
                  dplyr::one_of( #one_of picks the columns named in the character vector
                    c('a','b'))), na.rm = TRUE)) #na.rm = TRUE ignores NAs when summing

This is what the output looks like

> df
   a  b count
1  1  1     2
2  1  2     3
3 NA  3     3
4  2  2     4
5  3 NA     3
6  9  1    10
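
Note that rowSums() over the selected columns adds up their values rather than counting matches, as the output shows. A counting variant in the same dplyr style might look like the sketch below; it assumes dplyr 1.0 or later (for across() and all_of()) and, purely for illustration, that the value to count is 1.

```r
library(dplyr)

df <- data.frame(a = c(1, 1, NA, 2, 3, 9), b = c(1, 2, 3, 2, NA, 1))

df <- df %>%
  mutate(count.1 = rowSums(
    # across() returns the selected columns; == 1 turns them into a
    # logical matrix, so rowSums counts the matches per row
    across(all_of(c("a", "b"))) == 1,
    na.rm = TRUE  # treat NA cells as non-matches
  ))
df$count.1
```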

Hope this helps :-)
