r - How to compute on a binary matrix in R

Question

Here is my problem; I cannot solve all of it.

Suppose we have the following code:

## A data frame named a    
a <- data.frame(A = c(0,0,1,1,1), B = c(1,0,1,0,0), C = c(0,0,1,1,0), D = c(0,0,1,1,0), E = c(0,1,1,0,1))
## 1st function calculates all the combinations of colnames of a; the output is a character vector named items2
items2 <- c()
countI <- 1 
while(countI <= ncol(a)){
        for(i in countI){
                countJ <- countI + 1
                while(countJ <= ncol(a)){
                        for(j in countJ){
                                items2 <- c(items2, paste(colnames(a[i]), colnames(a[j]), collapse = '', sep = ""))
                        }
                        countJ <- countJ + 1
                }
                countI <- countI + 1
        }
}

This is the code I am trying to fix (the output is a numeric vector named count_1):

## 2nd function
colnames(a) <- NULL ## just for facilitating the calculation
count_1 <- numeric(ncol(a)*2)
countI <- 1
while(countI <= ncol(a)){
        for(i in countI){
                countJ <- countI + 1
                while(countJ <= ncol(a)){
                        for(j in countJ){
                                s <- a[, i]
                                p <- a[, j]
                                count_1[i*2] <- as.integer(s[i] == p[j] & s[i] == 1)
                        }
                        countJ <- countJ + 1
                }
                countI <- countI + 1
        }
}

But when I run this code in the RStudio console, it returns an unexpected result:

 count_1
 [1] 0 0 0 0 0 1 0 1 0 0

However, I expect the following result:

count_1
[1] 1 2 2 2 1 1 1 1 2 1

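You can find a picture with a detailed explanation on Dropbox at the following URL: https://www.dropbox.com/s/5ylt8h8wx3zrvy7/IMAG1074.jpg?dl=0

I will try to explain more. I posted the first function (code) only to show you what I am looking for; it is just an example. What I am trying to get from the second function is, for each pair of columns, the number of rows in which both columns equal 1. Start with counter = 0; whenever a row has 1 in both columns of the pair (for example A and B), do counter = counter + 1. We continue through all the other column pairs (AC, AD, AE, BC, BD, BE, CD, CE, then DE); the number of combinations is n!/(2!(n-2)!). For example, if I have the following data frame: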

a =

A B C D E
0 1 0 0 0
0 0 0 0 1
1 1 1 1 1
1 0 0 1 0
1 0 1 0 1

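Then, combining the first two columns, the number of rows in which 1 occurs in both is computed as follows (note that I set colnames(a) <- NULL only to make the work easier and clearer):

0 1 0 0 0
0 0 0 0 1
1 1 1 1 1
1 0 0 1 0
1 0 1 0 1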

### Example 1: #####################################################

So from here I set (for columns A and B, i.e. AB):

s <- a[, i]
## s is equal to
## [1] 0 0 1 1 1
p <- a[, j]
## p is equal to
## [1] 1 0 1 0 0

Then I look in both vectors for positions where the value 1 occurs in both, i.e. a[, i] == 1 && a[, j] == 1 && a[, i] == a[, j]; for this example, the resulting numeric vector is [1] 1

### Example 2: #####################################################

From here I set (for columns A and D, i.e. AD):

s <- a[, i]
## s is equal to
## [1] 0 0 1 1 1
p <- a[, j]
## p is equal to
## [1] 0 0 1 1 0

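Then I look in both vectors for positions where the value 1 occurs in both, i.e. a[, i] == 1 && a[, j] == 1 && a[, i] == a[, j]; for this example, the resulting numeric vector is [1] 2

And so on; I end up with a numeric vector named count_1 equal to: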

[1] 1 2 2 2 1 1 1 1 2 1

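Each index of count_1 corresponds to one combination of columns (shown here with the column names, which the data frame no longer has):

AB AC AD AE BC BD BE CD CE DE
 1  2  2  2  1  1  1  1  2  1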

score 0 · Accepted Answer

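It is not at all clear what you are trying to do.

As for the first code block, that is some ugly R code involving a whole bunch of unnecessary while/for loops.

You can get the same result for items2 in a single line.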

items2 <- sort(toupper(unlist(sapply(1:4, function(i)
    sapply(5:(i+1), function(j)
        paste(letters[i], letters[j], sep = ""))))));
items2;
# [1] "AB" "AC" "AD" "AE" "BC" "BD" "BE" "CD" "CE" "DE"
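Note that the one-liner above hardcodes five columns and builds the labels from letters, so it only matches when the columns happen to be named A to E. A minimal sketch that derives the pairs from the actual column names, assuming base R's combn() (from the utils package, attached by default) is acceptable here:

```r
# Sample data frame from the question
a <- data.frame(A = c(0,0,1,1,1), B = c(1,0,1,0,0), C = c(0,0,1,1,0),
                D = c(0,0,1,1,0), E = c(0,1,1,0,1))

# combn() returns a 2 x choose(n, 2) matrix; each column is one pair of
# column names, enumerated in lexicographic order
pairs <- combn(colnames(a), 2)

# Paste the first and second member of each pair into one label
items2 <- paste0(pairs[1, ], pairs[2, ])
items2
# [1] "AB" "AC" "AD" "AE" "BC" "BD" "BE" "CD" "CE" "DE"
```

Because combn() already enumerates pairs in order, no sort() is needed afterwards.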

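As for the second code block, please explain what you are trying to compute. Those while/for loops are most likely just as unnecessary as in the first case.

Update

Note that this is based on a as defined at the beginning of your post. Your expected output is based on a different a, which you changed further down in your post.

No for/while loops are needed; both "functions" can be written as one-liners.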
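To make the discrepancy concrete: the matrix printed later in the question has C = 0 0 1 0 1, whereas the data frame defined at the top has C = 0 0 1 1 0. The check below is only a sketch; pair_counts is a hypothetical helper (not from the question) and a2 is transcribed by hand from the printed matrix. It suggests that this single column accounts for the difference:

```r
# Hypothetical helper: for every pair of columns, count the rows where both are 1
pair_counts <- function(df) {
  idx <- combn(ncol(df), 2)  # column-index pairs in the order AB, AC, ..., DE
  apply(idx, 2, function(p) sum(df[, p[1]] == 1 & df[, p[2]] == 1))
}

# a as defined at the top of the question
a <- data.frame(A = c(0,0,1,1,1), B = c(1,0,1,0,0), C = c(0,0,1,1,0),
                D = c(0,0,1,1,0), E = c(0,1,1,0,1))

# a as printed later in the question (assumption: only column C differs)
a2 <- a
a2$C <- c(0,0,1,0,1)

pair_counts(a)
# [1] 1 2 2 2 1 1 1 2 1 1
pair_counts(a2)
# [1] 1 2 2 2 1 1 1 1 2 1
```

The second result equals the expected output given in the question.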

# Your sample dataframe a
a <- data.frame(A = c(0,0,1,1,1), B = c(1,0,1,0,0), C = c(0,0,1,1,0), D = c(0,0,1,1,0), E = c(0,1,1,0,1))

# Function 1
items2 <- toupper(unlist(sapply(1:(ncol(a) - 1), function(i) sapply(ncol(a):(i+1), function(j)
        paste(letters[i], letters[j], sep = "")))));
# Function 2
count_1 <- unlist(sapply(1:(ncol(a) - 1), function(i) sapply(ncol(a):(i+1), function(j)
        sum(a[, i] + a[, j] == 2))));

# Add names and sort
names(count_1) <- items2;
count_1 <- count_1[order(names(count_1))];
# Output
count_1;
#AB AC AD AE BC BD BE CD CE DE
# 1  2  2  2  1  1  1  2  1  1
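For reference, the loop in the question compares the single elements s[i] and p[j] rather than whole columns, and it overwrites count_1[i*2] on every inner iteration, which is why it returned mostly zeros. An alternative sketch of the same count, assuming every entry of a is exactly 0 or 1, uses the cross-product of the matrix with itself; the names m, co and idx are illustrative only:

```r
# Sample data frame from the question
a <- data.frame(A = c(0,0,1,1,1), B = c(1,0,1,0,0), C = c(0,0,1,1,0),
                D = c(0,0,1,1,0), E = c(0,1,1,0,1))

m  <- as.matrix(a)
# crossprod(m) is t(m) %*% m; for a 0/1 matrix, entry [i, j] is the number
# of rows in which columns i and j are both 1
co <- crossprod(m)

# One row per pair (i, j) with i < j, in the order AB, AC, ..., DE
idx <- t(combn(ncol(a), 2))

# Indexing with a two-column matrix picks co[i, j] for each row of idx
count_1 <- co[idx]
names(count_1) <- paste0(colnames(a)[idx[, 1]], colnames(a)[idx[, 2]])
count_1
# AB AC AD AE BC BD BE CD CE DE
#  1  2  2  2  1  1  1  2  1  1
```

The diagonal of co holds the number of 1s in each single column, which can serve as a quick sanity check.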
