
I'm new to R. I'd like to transform a binary matrix like this one:
Example:

"   1874 1875 1876 1877 1878 .... 2009  
F     1     0     0     0     0   ...  0
E     1     1     0     0     0   ...  0
D     1     1     0     0     0   ...  0
C     1     1     0     0     0   ...  0
B     1     1     0     0     0   ...  0
A     1     1     0     0     0   ...  0"

Since the column names are years, I'd like to aggregate them by decade and get something like this:

"1840-1849 1850-1859 1860-1869 .... 2000-2009
F     1     0     0     0     0   ...  0
E     1     1     0     0     0   ...  0
D     1     1     0     0     0   ...  0
C     1     1     0     0     0   ...  0
B     1     1     0     0     0   ...  0
A     1     1     0     0     0   ...  0"

I'm used to Python and don't know how to do this transformation without writing a loop! Thanks, Isabelle


2 Answers


It isn't clear what aggregation you want, but using the following dummy data

set.seed(42)
df <- data.frame(matrix(sample(0:1, 6*25, replace = TRUE), ncol = 25))
names(df) <- 1874 + 0:24

the following counts the events in each 10-year period.

Get the years as a numeric variable:

years <- as.numeric(names(df))

Next we need an indicator of the start of each decade:

ind <- seq(from = signif(years[1], 3), to = signif(tail(years, 1), 3), by = 10)
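Note that signif(x, 3) rounds to the nearest ten rather than down, so it only happens to give the right boundaries for this data (1874 becomes 1870, 1898 becomes 1900). If the first year were 1876, signif would return 1880 and silently drop 1876 to 1879; if the last year were 1894, the sequence would stop at 1890 and the final decade would be lost. A minimal sketch that always rounds the start down and adds one decade past the last start (decade_breaks is a hypothetical helper name, not part of the original answer):

```r
# Hypothetical helper: decade boundaries that round down instead of to the nearest ten
decade_breaks <- function(years) {
  first <- 10 * floor(min(years) / 10)        # start of the first decade
  last  <- 10 * floor(max(years) / 10) + 10   # one past the start of the last decade
  seq(from = first, to = last, by = 10)
}

decade_breaks(1874:1898)   # 1870 1880 1890 1900, same as the signif() version
decade_breaks(1876:1894)   # 1870 1880 1890 1900; signif() would give only 1880 1890
```

It can be used as a drop-in replacement: ind <- decade_breaks(years).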

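We then apply a function over the indices 1:(length(ind)-1) of ind (that is, seq_along(ind[-1])). For each index it selects the columns of df that belong to the current decade and counts the 1s with rowSums: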

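tmp <- lapply(seq_along(ind[-1]),
              function(i, inds, data) {
                # columns whose year lies in [inds[i], inds[i+1] - 1];
                # drop = FALSE keeps a one-column decade as a data frame
                rowSums(data[, names(data) %in% inds[i]:(inds[i+1]-1), drop = FALSE])
              }, inds = ind, data = df)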

Next, we cbind the resulting vectors into a data frame and fix up the column names:

out <- do.call(cbind.data.frame, tmp)
names(out) <- paste(head(ind, -1), tail(ind, -1) - 1, sep = "-")
out

This gives:

> out
  1870-1879 1880-1889 1890-1899
1         4         5         6
2         4         6         6
3         2         5         5
4         5         5         7
5         3         3         7
6         5         5         4
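Since the question asks for a way to do this without a loop, note that base R's rowsum() already sums the rows of a matrix within groups. Transposing so that each year is a row, summing within each decade, and transposing back should give the same counts. This is a sketch with the same dummy data, not part of the original answer; because sample() changed its default algorithm in R 3.6.0, the exact numbers on a current R may differ from those printed above.

```r
set.seed(42)
df <- data.frame(matrix(sample(0:1, 6*25, replace = TRUE), ncol = 25))
names(df) <- 1874 + 0:24

# Decade start for each column, e.g. 1874 -> 1870, 1898 -> 1890
decade <- 10 * (as.numeric(names(df)) %/% 10)

# t(df) has one row per year; rowsum() adds up the rows within each decade
counts <- t(rowsum(t(df), group = decade))
colnames(counts) <- paste(colnames(counts), as.numeric(colnames(counts)) + 9, sep = "-")
counts
```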

If you just want a binary matrix, where 1 indicates that at least one event occurred in that decade, then you can use:

tmp2 <- lapply(seq_along(ind[-1]),
               function(i, inds, data) {
                 # 1 if any year in the decade has an event, 0 otherwise
                 as.numeric(rowSums(data[, names(data) %in% inds[i]:(inds[i+1]-1), drop = FALSE]) > 0)
               }, inds = ind, data = df)
out2 <- do.call(cbind.data.frame, tmp2)
names(out2) <- paste(head(ind, -1), tail(ind, -1) - 1, sep = "-")
out2

This gives:

> out2
  1870-1879 1880-1889 1890-1899
1         1         1         1
2         1         1         1
3         1         1         1
4         1         1         1
5         1         1         1
6         1         1         1

If you want a different aggregation, modify the function applied in the lapply call to use something other than rowSums.
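One way to make the aggregation a parameter is to split the columns by decade and apply an arbitrary row-wise function to each group. A minimal sketch, assuming the same dummy data; aggregate_decades is a hypothetical helper, and it relies on split.default splitting a data frame by columns (because a data frame is a list of columns):

```r
set.seed(42)
df <- data.frame(matrix(sample(0:1, 6*25, replace = TRUE), ncol = 25))
names(df) <- 1874 + 0:24

# Hypothetical helper: apply f to each row within each decade's columns
aggregate_decades <- function(data, f) {
  decade <- 10 * (as.numeric(names(data)) %/% 10)
  groups <- split.default(data, decade)              # one data frame per decade
  res <- sapply(groups, function(d) apply(d, 1, f))  # rows x decades matrix
  colnames(res) <- paste(names(groups), as.numeric(names(groups)) + 9, sep = "-")
  res
}

aggregate_decades(df, sum)   # event counts per decade, like out
aggregate_decades(df, max)   # 1 if any event in the decade, like out2
aggregate_decades(df, mean)  # share of years in the decade with an event
```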

answered 2013-03-28T14:19:18.003

This is another option, using modular arithmetic to aggregate the columns.

# setup, borrowed from @GavinSimpson
set.seed(42)
df <- data.frame(matrix(sample(0:1, 6*25, replace = TRUE), ncol = 25))
names(df) <- 1874 + 0:24

result <- do.call(cbind, 
    by(t(df), as.numeric(names(df)) %/% 10 * 10, colSums))

# add -xxx9 to column names, for each decade
dimnames(result)[[2]] <- paste(colnames(result), as.numeric(colnames(result)) + 9, sep='-')

#    1870-1879 1880-1889 1890-1899
# V1         4         5         6
# V2         4         6         6
# V3         2         5         5
# V4         5         5         7
# V5         3         3         7
# V6         5         5         4
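The grouping key in the by() call above is the "modular arithmetic" part: integer division by 10 drops the last digit of the year, and multiplying by 10 turns it back into the start of the decade. Because %/% binds more tightly than * in R, x %/% 10 * 10 parses as (x %/% 10) * 10. A small worked example with a few illustrative years:

```r
years <- c(1874, 1879, 1880, 1898)
years %/% 10        # 187 187 188 189: integer division drops the last digit
years %/% 10 * 10   # 1870 1870 1880 1890: start of each year's decade
```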

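If you wanted to aggregate with something other than a sum, replace the call to colSums with something like function(cols) sapply(cols, f), where f is the aggregating function, e.g., max. (Using sapply rather than lapply returns a plain numeric vector, which cbind can combine into an ordinary matrix.)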
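For example, with f = max the same by() pipeline should produce the binary "at least one event per decade" matrix. A sketch using the same dummy data (result_max is just an illustrative name):

```r
set.seed(42)
df <- data.frame(matrix(sample(0:1, 6*25, replace = TRUE), ncol = 25))
names(df) <- 1874 + 0:24

# by() passes each decade's rows of t(df) to the function as a data frame,
# so sapply() applies max() to each of its columns (one per original row)
result_max <- do.call(cbind,
    by(t(df), as.numeric(names(df)) %/% 10 * 10,
       function(cols) sapply(cols, max)))
dimnames(result_max)[[2]] <- paste(colnames(result_max),
                                   as.numeric(colnames(result_max)) + 9, sep = '-')
result_max
```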

answered 2013-03-28T15:50:39.357