I have a list of data showing who attended which conference, like this:

Event                     Participant  
ConferenceA               John   
ConferenceA               Joe  
ConferenceA               Mary    
ConferenceB               John  
ConferenceB               Ted  
ConferenceC               Jessica  

I would like to create a binary indicator attendance matrix in the following format:

Event        John  Joe  Mary  Ted  Jessica  
ConferenceA  1     1    1     0    0  
ConferenceB  1     0    0     1    0  
ConferenceC  0     0    0     0    1  

Is there a way to do this in R?

3 Answers

Assuming your data.frame is called "mydf", simply use table:

> table(mydf)
             Participant
Event         Jessica Joe John Mary Ted
  ConferenceA       0   1    1    1   0
  ConferenceB       0   0    1    0   1
  ConferenceC       1   0    0    0   0

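If there is a chance that someone attended a conference more than once, table will return values greater than 1 for that person. In that case you can simply recode all values greater than 1 to 1, as follows.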

temp <- table(mydf)
temp[temp > 1] <- 1
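
To see the recoding at work, here is a minimal sketch on a copy of the data with one repeated row. The name dup_df and the extra ConferenceB/Ted row are made up for illustration and are not part of the question's data. It also shows what should be an equivalent alternative that compares the counts against 0 instead of assigning in place.

```r
# Hypothetical data: same as mydf, plus a repeated ConferenceB/Ted row
dup_df <- data.frame(
  Event = c("ConferenceA", "ConferenceA", "ConferenceA",
            "ConferenceB", "ConferenceB", "ConferenceB", "ConferenceC"),
  Participant = c("John", "Joe", "Mary", "John", "Ted", "Ted", "Jessica")
)

# Raw counts: Ted at ConferenceB is counted twice here
counts <- table(dup_df)

# Approach from above: cap every count at 1
binary <- counts
binary[binary > 1] <- 1

# Alternative: the comparison gives TRUE/FALSE, adding 0L turns it back into 0/1
binary2 <- (counts > 0) + 0L
```

Both binary and binary2 hold only 0s and 1s; which form to use is mostly a matter of taste.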

Note that table(mydf) returns an object of class table. If you would rather get a data.frame back, use as.data.frame.matrix:

> as.data.frame.matrix(table(mydf))
            Jessica Joe John Mary Ted
ConferenceA       0   1    1    1   0
ConferenceB       0   0    1    0   1
ConferenceC       1   0    0    0   0
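
The result above keeps the events as row names and sorts the participants alphabetically, whereas the layout in the question has an Event column and lists participants in order of first appearance. A rough sketch of one way to get closer to that layout is below; the names mydf2 and wide are only illustrative, and converting Participant to a factor with explicit levels is my assumption about how to control the column order.

```r
# Illustrative copy of the example data
mydf2 <- data.frame(
  Event = c("ConferenceA", "ConferenceA", "ConferenceA",
            "ConferenceB", "ConferenceB", "ConferenceC"),
  Participant = c("John", "Joe", "Mary", "John", "Ted", "Jessica")
)

# Order the participant columns by first appearance rather than alphabetically
mydf2$Participant <- factor(mydf2$Participant,
                            levels = unique(mydf2$Participant))

wide <- as.data.frame.matrix(table(mydf2))

# Move the row names into a regular Event column
wide <- cbind(Event = rownames(wide), wide)
rownames(wide) <- NULL
wide
```

The columns of wide should then come out as Event, John, Joe, Mary, Ted, Jessica.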

In the above, "mydf" is defined as:

mydf <- structure(list(Event = c("ConferenceA", "ConferenceA", 
  "ConferenceA", "ConferenceB", "ConferenceB", "ConferenceC"), 
  Participant = c("John", "Joe", "Mary", "John", "Ted", "Jessica")), 
  .Names = c("Event", "Participant"), class = "data.frame", 
  row.names = c(NA, -6L))

Please share your data in a similar, reproducible way in the future.

answered 2013-07-02T17:02:16.663

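@Ananda's answer is far better, but I thought I would offer another approach using qdap. It only really shines in the case where "someone would attend a conference more than once".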

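As Ananda pointed out, I have included an example where "someone would attend a conference more than once". In that case, using the adjmat function and extracting the Boolean matrix may be helpful.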

The data, with one double attendee:

# Read the example data; Ted appears twice for ConferenceB
dat <- read.table(text="Event                     Participant
ConferenceA               John
ConferenceA               Joe
ConferenceA               Mary
ConferenceB               John
ConferenceB               Ted
ConferenceB               Ted
ConferenceC               Jessica", header=TRUE)

A table of counts:

library(qdap)
wfm(dat[, 1], dat[, 2], lower.case = FALSE)

## > wfm(dat[, 1], dat[, 2], lower.case = FALSE)
##             Jessica Joe John Mary Ted
## conferenceA       0   1    1    1   0
## conferenceB       0   0    1    0   2
## conferenceC       1   0    0    0   0

Using mtabulate:

with(dat, mtabulate(split(Participant, Event)))

##             Jessica Joe John Mary Ted
## ConferenceA       0   1    1    1   0
## ConferenceB       0   0    1    0   2
## ConferenceC       1   0    0    0   0

A Boolean matrix:

adjmat(wfm(dat[, 1], dat[, 2], lower.case = FALSE))$boolean

## > adjmat(wfm(dat[, 1], dat[, 2], lower.case = FALSE))$boolean
##             Jessica Joe John Mary Ted
## conferenceA       0   1    1    1   0
## conferenceB       0   0    1    0   1
## conferenceC       1   0    0    0   0

answered 2013-07-02T17:45:02.160

Another base R way, using the function xtabs:

xtabs(~mydf$Event+mydf$Participant)

             mydf$Participant
mydf$Event    Jessica Joe John Mary Ted
  ConferenceA       0   1    1    1   0
  ConferenceB       0   0    1    0   1
  ConferenceC       1   0    0    0   0

#using data
mydf <- structure(list(Event = c("ConferenceA", "ConferenceA", 
                                 "ConferenceA", "ConferenceB", "ConferenceB", "ConferenceC"), 
                       Participant = c("John", "Joe", "Mary", "John", "Ted", "Jessica")), 
                  .Names = c("Event", "Participant"), class = "data.frame", 
                  row.names = c(NA, -6L))
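
As a variation, xtabs also accepts a data argument, so the column names can be given bare in the formula and the dimension labels read Event and Participant instead of mydf$Event and mydf$Participant. This is only a sketch: the name tab is illustrative, and the capping step is borrowed from the table-based answer in case someone attends the same event twice.

```r
# Same example data as above
mydf <- data.frame(
  Event = c("ConferenceA", "ConferenceA", "ConferenceA",
            "ConferenceB", "ConferenceB", "ConferenceC"),
  Participant = c("John", "Joe", "Mary", "John", "Ted", "Jessica")
)

# Formula interface with data=, so no mydf$ prefix is needed
tab <- xtabs(~ Event + Participant, data = mydf)

# Cap counts at 1 in case of repeat attendance
tab[tab > 1] <- 1
tab
```
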
answered 2021-02-25T15:13:59.977