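I have a matrix of letter/number combinations, and I need to remove every column in which the same letter appears in both rows. A simplified example: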

> chars <- c("A1","A2","B1","B2")
> charsmat <- combn(chars, 2)
> charsmat
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,] "A1" "A1" "A1" "A2" "A2" "B1"
[2,] "A2" "B1" "B2" "B1" "B2" "B2"

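When the two rows of a single column contain the same letter (columns 1 and 6 in this example), I need to remove that column. I feel like I have the pieces: use gsub() or str_extract() to isolate the letters, then test for a match between the rows. But I don't know how to put it together. Thanks in advance for any help.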

2 Answers

First, create a new matrix that holds only the letter part of each entry:

> (charsmat.alpha <- substr(charsmat, 1, 1))
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,] "A"  "A"  "A"  "A"  "A"  "B" 
[2,] "A"  "B"  "B"  "B"  "B"  "B"

Then take the subset of columns of charsmat for which the two rows of charsmat.alpha differ:

> charsmat[,(charsmat.alpha[1,] != charsmat.alpha[2,])]
     [,1] [,2] [,3] [,4]
[1,] "A1" "A1" "A2" "A2"
[2,] "B1" "B2" "B1" "B2"
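
The two steps above can also be folded into a single helper. This is only a minimal sketch: the name drop_same_letter is hypothetical and not part of the original answer, and it assumes, as in the example, that the letter is always the first character of each entry. It passes drop = FALSE so the result is still a matrix when only one column survives.

```r
chars <- c("A1", "A2", "B1", "B2")
charsmat <- combn(chars, 2)

# Hypothetical helper (name chosen for illustration only)
drop_same_letter <- function(mat) {
    # First character of every entry; substr() keeps the matrix dimensions
    first <- substr(mat, 1, 1)
    # TRUE for columns whose two rows start with different letters
    keep <- first[1, ] != first[2, ]
    # drop = FALSE keeps the matrix shape even for a single column
    mat[, keep, drop = FALSE]
}

result <- drop_same_letter(charsmat)
```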
answered 2012-07-17T19:21:54.800

Here is a more general solution. It removes any column in which any letter of the row 1 entry matches any letter of the row 2 entry:

## Your data
chars <- c("A1","A2","B1","B2")
charsmat <- combn(chars, 2)

vetMatrix <- function(mat) {
    ## Remove non-alpha characters from matrix entries
    mm <- gsub("[^[:alpha:]]", "", mat)    
    ## Construct character class regex patterns from first row
    patterns <- paste0("[", mm[1,], "]")
    xs <- mm[2,]    
    ## Extract columns in which no character in first row is found in second
    mat[,!mapply("grepl", patterns, xs), drop=FALSE]
}

## Try it with your matrix ...
vetMatrix(charsmat)
#      [,1] [,2] [,3] [,4]
# [1,] "A1" "A1" "A2" "A2"
# [2,] "B1" "B2" "B1" "B2"

## ... and with a different matrix
mat <- matrix(c("AB1", "B1", "AA11", "BB22", "this", "that"), ncol=3) 
mat
#      [,1]  [,2]   [,3]  
# [1,] "AB1" "AA11" "this"
# [2,] "B1"  "BB22" "that"
vetMatrix(mat)
#     [,1]  
# [1,] "AA11"
# [2,] "BB22"
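
The same check can be written without building regex patterns, by comparing the sets of letters directly. This is a sketch of an alternative technique, not the code from this answer; vet_matrix_sets and letters_of are hypothetical names, and it assumes a two-row character matrix like the ones above.

```r
# Hypothetical alternative to vetMatrix using set intersection
vet_matrix_sets <- function(mat) {
    # Distinct letters of one entry, ignoring digits and punctuation
    letters_of <- function(x) {
        unique(strsplit(gsub("[^[:alpha:]]", "", x), "")[[1]])
    }
    # TRUE for columns whose two entries share at least one letter
    shared <- vapply(seq_len(ncol(mat)), function(j) {
        length(intersect(letters_of(mat[1, j]), letters_of(mat[2, j]))) > 0
    }, logical(1))
    # Keep only the columns with no shared letter
    mat[, !shared, drop = FALSE]
}

mat <- matrix(c("AB1", "B1", "AA11", "BB22", "this", "that"), ncol = 3)
res <- vet_matrix_sets(mat)
```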
answered 2012-07-17T19:33:54.817