r - Aggregate every 10 columns in a binary matrix

Question

I'm new to R. I would like to transform a binary matrix like this one.
Example:

"   1874 1875 1876 1877 1878 .... 2009  
F     1     0     0     0     0   ...  0
E     1     1     0     0     0   ...  0
D     1     1     0     0     0   ...  0
C     1     1     0     0     0   ...  0
B     1     1     0     0     0   ...  0
A     1     1     0     0     0   ...  0"

Since the column names are years, I would like to aggregate them by decade and get something like:

"1840-1849 1850-1859 1860-1869 .... 2000-2009
F     1     0     0     0     0   ...  0
E     1     1     0     0     0   ...  0
D     1     1     0     0     0   ...  0
C     1     1     0     0     0   ...  0
B     1     1     0     0     0   ...  0
A     1     1     0     0     0   ...  0"

I'm used to Python and don't know how to do this transformation without writing loops! Thanks, Isabelle

score 2 · Accepted Answer

It isn't clear what aggregation you want, but using the following dummy data

set.seed(42)
df <- data.frame(matrix(sample(0:1, 6*25, replace = TRUE), ncol = 25))
names(df) <- 1874 + 0:24

the following counts the events in each 10-year period.

Get the years as a numeric variable

years <- as.numeric(names(df))

Next we need an indicator for the start of each decade

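# round down to the first decade and up past the last one;
# signif(x, 3) rounds to the nearest 10 and can drop the first or last decade
ind <- seq(from = floor(min(years) / 10) * 10, to = floor(max(years) / 10) * 10 + 10, by = 10)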

We then apply (lapply()) over the indices 1:(length(ind)-1) of ind, select the columns of df that fall in the current decade, and count the 1s using rowSums.

tmp <- lapply(seq_along(ind[-1]),
              function(i, inds, data) {
                # drop = FALSE keeps a data frame even when a decade has a single column
                rowSums(data[, names(data) %in% inds[i]:(inds[i+1]-1), drop = FALSE])
              }, inds = ind, data = df)

Next, we cbind the resulting vectors into a data frame and fix up the column names:

out <- do.call(cbind.data.frame, tmp)
names(out) <- paste(head(ind, -1), tail(ind, -1) - 1, sep = "-")
out

This gives:

> out
  1870-1879 1880-1889 1890-1899
1         4         5         6
2         4         6         6
3         2         5         5
4         5         5         7
5         3         3         7
6         5         5         4

If you just want a binary matrix in which a 1 indicates that at least one event occurred in that decade, then you can use:

tmp2 <- lapply(seq_along(ind[-1]),
               function(i, inds, data) {
                 # drop = FALSE keeps a data frame even when a decade has a single column
                 as.numeric(rowSums(data[, names(data) %in% inds[i]:(inds[i+1]-1), drop = FALSE]) > 0)
               }, inds = ind, data = df)
out2 <- do.call(cbind.data.frame, tmp2)
names(out2) <- paste(head(ind, -1), tail(ind, -1) - 1, sep = "-")
out2

Which gives:

> out2
  1870-1879 1880-1889 1890-1899
1         1         1         1
2         1         1         1
3         1         1         1
4         1         1         1
5         1         1         1
6         1         1         1
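
For a variant with no explicit apply loop, one possibility (a sketch, not part of the answer above) is base R's rowsum(), which sums the rows of a matrix by group. The toy data and the names m, decade, counts and binary are made up for illustration.

```r
# Hypothetical toy data: 3 rows, one column per year 1874-1885
m <- matrix(0, nrow = 3, ncol = 12)
m[1, c(1, 2, 7)] <- 1    # events in 1874, 1875 and 1880
m[2, 12] <- 1            # event in 1885
df <- data.frame(m)
names(df) <- 1874 + 0:11

# Decade start for each column, e.g. 1874 -> 1870
decade <- floor(as.numeric(names(df)) / 10) * 10

# rowsum() groups rows, so transpose to put years in rows, then transpose back
counts <- t(rowsum(t(as.matrix(df)), group = decade))
colnames(counts) <- paste(colnames(counts), as.numeric(colnames(counts)) + 9, sep = "-")

# 1 if the decade had at least one event, 0 otherwise
binary <- (counts > 0) * 1
```

Here counts holds the per-decade totals and binary the at-least-one-event indicator, both as matrices rather than data frames.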

If you want a different aggregation, modify the function applied in the lapply call to use something other than rowSums.
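
As a rough sketch of that idea, the aggregating function can be passed in as an argument. The helper name aggregate_decades, its fun parameter and the toy data are assumptions made for this example; they do not appear in the answer.

```r
# Hypothetical helper: aggregate year columns by decade with a row-wise function
aggregate_decades <- function(data, fun = rowSums) {
  years  <- as.numeric(names(data))
  decade <- floor(years / 10) * 10          # decade start of each column
  starts <- sort(unique(decade))
  out <- lapply(starts, function(d) {
    # drop = FALSE keeps a data frame even for a one-column decade
    fun(data[, decade == d, drop = FALSE])
  })
  out <- do.call(cbind.data.frame, out)
  names(out) <- paste(starts, starts + 9, sep = "-")
  out
}

# Hypothetical toy data: 3 rows, one column per year 1874-1885
m <- matrix(0, nrow = 3, ncol = 12)
m[1, c(1, 2, 7)] <- 1    # events in 1874, 1875 and 1880
m[2, 12] <- 1            # event in 1885
df <- data.frame(m)
names(df) <- 1874 + 0:11

sums  <- aggregate_decades(df)                                  # counts per decade
maxes <- aggregate_decades(df, function(x) apply(x, 1, max))    # row-wise maximum
```

Any function that takes a data frame of the decade's columns and returns one value per row (rowMeans, for instance) should fit the fun argument.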

score 1 · Accepted Answer

This is another option, using integer division (%/%) to assign each column to its decade before aggregating.

# setup, borrowed from @GavinSimpson
set.seed(42)
df <- data.frame(matrix(sample(0:1, 6*25, replace = TRUE), ncol = 25))
names(df) <- 1874 + 0:24

result <- do.call(cbind, 
    by(t(df), as.numeric(names(df)) %/% 10 * 10, colSums))

# add -xxx9 to column names, for each decade
dimnames(result)[[2]] <- paste(colnames(result), as.numeric(colnames(result)) + 9, sep='-')

#    1870-1879 1880-1889 1890-1899
# V1         4         5         6
# V2         4         6         6
# V3         2         5         5
# V4         5         5         7
# V5         3         3         7
# V6         5         5         4

If you wanted to aggregate with something other than sum, replace the call to colSums with something like function(cols) lapply(cols, f), where f is the aggregating function, e.g., max.
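
A minimal sketch of that substitution with f = max follows. Note that sapply() is swapped in for lapply() here so that each decade returns a plain numeric vector and cbind produces an ordinary numeric matrix; the toy data is made up for illustration.

```r
# Hypothetical toy data: 3 rows, one column per year 1874-1885
m <- matrix(0, nrow = 3, ncol = 12)
m[1, c(1, 2, 7)] <- 1    # events in 1874, 1875 and 1880
m[2, 12] <- 1            # event in 1885
df <- data.frame(m)
names(df) <- 1874 + 0:11

# Decade start for each column via integer division, e.g. 1874 -> 1870
decade <- as.numeric(names(df)) %/% 10 * 10

# For each decade, take the maximum of every original row (a column of t(df))
result <- do.call(cbind,
    by(t(df), decade, function(cols) sapply(cols, max)))

# add -xxx9 to column names, for each decade
colnames(result) <- paste(colnames(result), as.numeric(colnames(result)) + 9, sep = "-")
```

With 0/1 data, the maximum is 1 exactly when the decade contains at least one event, so this gives the binary version of the aggregated matrix.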
