
I ran into a problem with a matrix calculation and would appreciate an explanation. Thanks a lot!

I have a data frame `genderLocation` and a matrix `test` whose rows correspond to each other by index:

genderLocation[,1:6]

          scanner_gender cmall_gender wechat_gender scanner_location cmall_location wechat_location
    156043              3            2             2             Guangzhou           Shenzhen            Shenzhen
    156044              2           NA            NA             Shenzhen           <NA>                
    156045              2           NA             2             Shenzhen           <NA>            Hongkong
    156046              2           NA             2             Shenzhen           <NA>            Shenzhen

test

        [,1] [,2] [,3] [,4] [,5] [,6]
    [1,]  0.8  0.7  0.6  0.6  0.7  0.7
    [2,]  0.8  1.0  1.0  0.6  0.7  0.7
    [3,]  0.8  1.0  0.6  0.6  0.7  0.7
    [4,]  0.8  1.0  0.6  0.6  0.7  0.7

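Now I want to group by the values in each row of `genderLocation` and compute the mean of the corresponding numbers in `test`. Taking row 156043 as an example, the result should be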

       2    3 Guangzhou Shenzhen
    0.65 0.80      0.60     0.70

I don't know how to do this with the `apply` family (since `for` loops are not recommended in R). It seems it should be something like

    > apply(test,1,function(tst,genderLoc) print(tapply(tst,as.character(genderLoc),mean)),genderLocation)

but I can't make sense of the result. If I restrict it to the first 2 rows, it seems somewhat more understandable:

    > apply(test[1:2,],1,function(tst,genderLoc) print(tapply(tst,as.character(genderLoc),mean)),genderLocation[1:2,])
    c("2", NA) c("3", "2") c("Guangzhou", "Shenzhen") c("Shenzhen", "") c("Shenzhen", NA)
          0.65        0.80                       0.60              0.70              0.70
    c("2", NA) c("3", "2") c("Guangzhou", "Shenzhen") c("Shenzhen", "") c("Shenzhen", NA)
           1.0         0.8                        0.6               0.7               0.7
                                   [,1] [,2]
    c("2", NA)                 0.65  1.0
    c("3", "2")                0.80  0.8
    c("Guangzhou", "Shenzhen") 0.60  0.6
    c("Shenzhen", "")          0.70  0.7
    c("Shenzhen", NA)          0.70  0.7
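
The labels such as `c("2", NA)` come from the extra argument: `apply()` passes the *whole* `genderLocation` to the function as `genderLoc` on every call, not the row that matches `tst`. Because a data frame is a list of columns, `as.character()` turns each column into one deparsed string (for example `c(2, NA)`), so `tapply()` groups the six values of each `test` row by six column strings that are the same for every row. In the two-row call, `cmall_gender` and `wechat_gender` deparse to the same string, which is why their values are averaged together. A minimal demonstration, assuming character rather than factor columns:

```r
# A small data frame with one numeric and one character column
demo <- data.frame(g = c(2, NA), loc = c("Shenzhen", ""), stringsAsFactors = FALSE)

# A data frame is a list of columns, so as.character() deparses each
# column into one string rather than returning the values of a single row
labels <- as.character(demo)
print(labels)

# tapply() then uses these column strings as the groups for a length-2 vector
tapply(c(0.7, 0.6), labels, mean)
```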

##### For reference
    test=matrix(c(0.8,0.8,0.8,0.8, 0.7,1,1,1, 0.6,1,0.6,0.6, 0.6,0.6,0.6,0.6, 0.7,0.7,0.7,0.7, 0.7,0.7,0.7,0.7),nrow=4,ncol=6,byrow=F)
    genderLocation<- data.frame(scanner_gender=c(3,2,2,2),cmall_gender=c(2,NA,NA,NA),wechat_gender=c(2,NA,2,2),
                                 scanner_location=c("Guangzhou","Shenzhen","Shenzhen","Shenzhen"),
                                 cmall_location=c("Shenzhen",NA,NA,NA),
                                 wechat_location=c("Shenzhen","","Hongkong","Shenzhen"))
    genderLocation1<-cbind(genderLocation,test)  # combined because some apply-family functions accept only one input
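
To pair each row of `test` with the matching row of `genderLocation`, one option within the apply family is to loop over row indices with `lapply()` and call `tapply()` on one row at a time. A minimal base-R sketch, assuming the labels are the first six columns and that empty strings should be skipped like `NA`; `row_means` is a hypothetical helper name:

```r
# Same data as above, but with character (not factor) columns
test <- matrix(c(0.8,0.8,0.8,0.8, 0.7,1,1,1, 0.6,1,0.6,0.6,
                 0.6,0.6,0.6,0.6, 0.7,0.7,0.7,0.7, 0.7,0.7,0.7,0.7),
               nrow = 4, ncol = 6)
genderLocation <- data.frame(scanner_gender = c(3,2,2,2),
                             cmall_gender = c(2,NA,NA,NA),
                             wechat_gender = c(2,NA,2,2),
                             scanner_location = c("Guangzhou","Shenzhen","Shenzhen","Shenzhen"),
                             cmall_location = c("Shenzhen",NA,NA,NA),
                             wechat_location = c("Shenzhen","","Hongkong","Shenzhen"),
                             stringsAsFactors = FALSE)
rownames(genderLocation) <- c(156043, 156044, 156045, 156046)

# Hypothetical helper: mean of row i of test, grouped by the labels in row i
row_means <- function(i) {
  # Convert column by column; unlist() can turn factor columns into integer codes
  labels <- vapply(genderLocation[i, 1:6], as.character, character(1))
  keep <- !is.na(labels) & labels != ""   # skip missing and empty labels
  tapply(test[i, keep], labels[keep], mean)
}

res <- lapply(seq_len(nrow(test)), row_means)
names(res) <- rownames(genderLocation)
res[["156043"]]
```

`res[["156043"]]` then holds the means 0.65, 0.80, 0.60 and 0.70 for the labels `2`, `3`, `Guangzhou` and `Shenzhen`, matching the expected result for that row.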

1 Answer


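The following works for your example data, but I'm not sure how robust it is for all of your data `df`: it may run into problems if some of your rows don't share any common values with the other rows. However, if you are happy to keep the output as a list (i.e. skip the `Reduce` step), this should not be an issue. With that in mind...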

--your data--

    test <- matrix(c(0.8,0.8,0.8,0.8,0.7,1,1,1,0.6,1,0.6,0.6,0.6,0.6,0.6,0.6,rep(0.7,8)), nrow=4)

    df <- data.frame(scanner_gender=c(3,2,2,2),
                     cmall_gender=c(2,NA,NA,NA),
                     wechat_gender=c(2,NA,2,2),
                     scanner_location=c("Guanzhou","Shenzhen","Shenzhen","Shenzhen"),
                     cmall_location=c("Shenzhen",NA,NA,NA),
                     wechat_location=c("Shenzhen",NA,"Hongkong","Shenzhen"),
                     stringsAsFactors=FALSE)
    rownames(df) <- c(156043,156044,156045,156046)

--solution--

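I combine `map` from `purrr` with other tidyverse verbs to 1) create a 2-column data frame with the `df` row entries in the first column (`A`) and the `test` row entries in the second column (`B`), 2) `filter` out the rows where `A` is `NA`, 3) `summarise` the `mean` by group, and 4) `spread` the result into a one-row data frame, using `A` (the key) as the column names.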

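    library(tidyverse)  # purrr::map, dplyr verbs, tidyr::spread

    L <- map(1:nrow(df), ~data.frame(A=unlist(df[.x,]), B=unlist(test[.x,]),
                                     stringsAsFactors=FALSE) %>%  # pair row .x of df with row .x of test
              filter(!is.na(A)) %>%         # drop missing labels
              group_by(A) %>%
              summarise(B=mean(B)) %>%      # mean of the test values per label
              spread(A,B) )                 # one-row data frame, labels as columns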

Then I use `Reduce` with `full_join` to collapse this list into a single data frame:

    newdf <- Reduce("full_join", L)

--output--

        `2`   `3` Guanzhou Shenzhen Hongkong
    1  0.65   0.8      0.6     0.70       NA
    2  0.80    NA       NA     0.60       NA
    3  0.70    NA       NA     0.60      0.7
    4  0.70    NA       NA     0.65       NA
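
`Reduce("full_join", L)` joins on whatever columns two data frames happen to share, so two rows with identical values in those columns would be merged into one, and two frames with no column in common give it nothing to join on. If the goal is only to stack the one-row results and fill absent labels with `NA`, aligning by column name avoids joining altogether. A minimal base-R sketch, assuming the per-row results are named numeric vectors (the values are copied from the output above, and `stack_by_name` is a hypothetical helper); with dplyr loaded, `bind_rows(L)` performs a similar name-aligned stacking of the data frames in `L`:

```r
# Per-row results as named numeric vectors (values from the output above)
rows <- list(
  c(`2` = 0.65, `3` = 0.8, Guanzhou = 0.6, Shenzhen = 0.70),
  c(`2` = 0.80, Shenzhen = 0.60),
  c(`2` = 0.70, Hongkong = 0.7, Shenzhen = 0.60),
  c(`2` = 0.70, Shenzhen = 0.65)
)

# Hypothetical helper: align vectors by name, filling absent names with NA
stack_by_name <- function(vs) {
  all_cols <- unique(unlist(lapply(vs, names)))
  # Indexing a named vector by a name it lacks returns NA
  m <- do.call(rbind, lapply(vs, function(v) unname(v[all_cols])))
  colnames(m) <- all_cols
  m
}

wide <- stack_by_name(rows)
wide
```

Because no values are compared, rows are never merged, whatever labels they share.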
answered 2017-07-26T13:49:49.560