
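I have a data frame of counts by region over time. One row of the data frame contains the total count for each column. I want to convert the data frame from counts to proportions by dividing each cell by the total for its column. Some columns contain missing observations. I have done this below with nested for-loops, but I suspect there is a simpler way, perhaps using lapply. I also had trouble extracting the row of count totals.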

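I am posting this partly because it is time I learned to use the apply family of functions, which I suspect would be useful here, and partly because I had a lot of trouble creating the vector of count totals and suspect that using [[ would help. Thank you for any suggestions on writing the code below more efficiently.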

my.data = read.table(text = "
state    y1970  y1980  y1990  y2000
Alaska       4      6     NA      7
Iowa        10     20     30     40
Nevada     100    100    100    100
Ohio        50     60     NA     80
total      172    195    215    238
Wyoming      8      9     10     11
", sep = "", header = TRUE)

desired.result = read.table(text = "
state         y1970       y1980       y1990       y2000
Alaska   0.02325581  0.03076923          NA  0.02941176  
Iowa     0.05813953  0.10256410  0.13953488  0.16806723  
Nevada   0.58139535  0.51282051  0.46511628  0.42016807  
Ohio     0.29069767  0.30769231          NA  0.33613445  
total    1.00000000  1.00000000  1.00000000  1.00000000  
Wyoming  0.04651163  0.04615385  0.04651163  0.04621849  
", sep = "", header = TRUE)

state  <- as.vector(unlist(my.data[, 1]))

my.totals <- as.vector(unlist(my.data[ my.data$state=='total', 2:5]))

proportions <- matrix(NA, nrow=nrow(my.data), ncol=ncol(my.data))
proportions <- as.data.frame(proportions)

for(i in 1:nrow(my.data)) {
 for(j in 1:ncol(my.data)) {

  if(j==1) proportions[i,1] <- state[i] 
  if(j> 1) proportions[i,j] <- my.data[i,j] / my.totals[j-1]

 }
}

colnames(proportions) <- names(my.data)
proportions


#     state      y1970      y1980      y1990      y2000
# 1  Alaska 0.02325581 0.03076923         NA 0.02941176
# 2    Iowa 0.05813953 0.10256410 0.13953488 0.16806723
# 3  Nevada 0.58139535 0.51282051 0.46511628 0.42016807
# 4    Ohio 0.29069767 0.30769231         NA 0.33613445
# 5   total 1.00000000 1.00000000 1.00000000 1.00000000
# 6 Wyoming 0.04651163 0.04615385 0.04651163 0.04621849

2 Answers


Probably something like this:

df[, -1] <- lapply( df[ , -1], function(x) x/sum(x, na.rm=TRUE) )

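If it were a matrix, you could use prop.table(mat) (with margin = 2 for column-wise proportions). In this case, however, you need to restrict the calculation to the numeric columns by excluding the first column.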
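As a minimal sketch of the matrix route (the matrix `mat` and its values are made up for illustration and are not taken from the question), `prop.table` with `margin = 2` divides each cell by its column sum, whereas calling it without a margin divides by the grand total:

```r
# Hypothetical 2 x 2 matrix of counts, for illustration only
mat <- matrix(c(1, 3, 2, 6), nrow = 2,
              dimnames = list(c("a", "b"), c("y1970", "y1980")))

# Column-wise proportions: each column sums to 1
col.props <- prop.table(mat, margin = 2)
col.props

# Without a margin, every cell is divided by the grand total instead
all.props <- prop.table(mat)
all.props
```

Note that this divides by the sum of each column, so it only matches the question's goal if the "total" row is left out of the matrix first.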

Also, I think you need to exclude the "total" row (row 5), since otherwise it is counted in each column's sum:

 my.data[-5, -1] <- lapply( my.data[ -5 , -1], function(x){ x/sum(x, na.rm=TRUE)} )
 my.data[ -5 , ]
    state      y1970      y1980      y1990      y2000
1  Alaska 0.02325581 0.03076923         NA 0.02941176
2    Iowa 0.05813953 0.10256410 0.21428571 0.16806723
3  Nevada 0.58139535 0.51282051 0.71428571 0.42016807
4    Ohio 0.29069767 0.30769231         NA 0.33613445
6 Wyoming 0.04651163 0.04615385 0.07142857 0.04621849

-------------

Alternative approach, dividing each column by its entry in the "total" row (row 5):

> my.data[,-1] <-lapply( my.data[  , -1], function(x){ x/x[5] } )
> my.data
    state      y1970      y1980      y1990      y2000
1  Alaska 0.02325581 0.03076923         NA 0.02941176
2    Iowa 0.05813953 0.10256410 0.13953488 0.16806723
3  Nevada 0.58139535 0.51282051 0.46511628 0.42016807
4    Ohio 0.29069767 0.30769231         NA 0.33613445
5   total 1.00000000 1.00000000 1.00000000 1.00000000
6 Wyoming 0.04651163 0.04615385 0.04651163 0.04621849
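The hard-coded row index 5 only works while the totals sit in the fifth row. A hedged variant, assuming the totals row is always labelled "total" in the state column, looks the row up by name instead (the names `is.total` and `props` are introduced here for illustration):

```r
my.data <- read.table(text = "
state    y1970  y1980  y1990  y2000
Alaska       4      6     NA      7
Iowa        10     20     30     40
Nevada     100    100    100    100
Ohio        50     60     NA     80
total      172    195    215    238
Wyoming      8      9     10     11
", header = TRUE)

# Locate the totals row by its label rather than by position
is.total <- my.data$state == "total"

# Divide every numeric column by its own entry in the totals row;
# cells that are NA in the input stay NA
props <- my.data
props[-1] <- lapply(my.data[-1], function(x) x / x[is.total])
props
```

Working on a copy (`props`) leaves the original counts in `my.data` untouched.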

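Using a very simple matrix, this shows that prop.table returns missing values when applied over both margins at once, followed by what it returns when applied to rows and to columns separately: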

> prop.table( matrix( c( 1,2,NA, 3),2) )
     [,1] [,2]
[1,]   NA   NA
[2,]   NA   NA
> prop.table( matrix( c( 1,2,NA, 3),2), 1 )
     [,1] [,2]
[1,]   NA   NA
[2,]  0.4  0.6
> prop.table( matrix( c( 1,2,NA, 3),2), 2 )
          [,1] [,2]
[1,] 0.3333333   NA
[2,] 0.6666667   NA
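A single NA turns the whole sum into NA, which is why entire rows or columns go missing above. One possible workaround, offered as a sketch rather than as part of the original answer, is to compute the column sums with `na.rm = TRUE` and divide with `sweep` (the names `m`, `col.sums` and `props` are illustrative):

```r
m <- matrix(c(1, 2, NA, 3), 2)

# Column sums that skip missing cells
col.sums <- colSums(m, na.rm = TRUE)

# Divide each column by its sum; cells that were NA stay NA
props <- sweep(m, 2, col.sums, "/")
props
```

This treats missing cells as absent from the denominator, which is the same choice the `sum(x, na.rm = TRUE)` version makes.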
answered 2012-11-21T23:16:10.050

Alternatively, you could:

library(tidyverse)

my.data = read.table(text = "
state    y1970  y1980  y1990  y2000
Alaska       4      6     NA      7
Iowa        10     20     30     40
Nevada     100    100    100    100
Ohio        50     60     NA     80
total      172    195    215    238
Wyoming      8      9     10     11
", sep = "", header = TRUE)

my.data %>% 
  # Convert table into long format
  pivot_longer(cols = -state, names_to = "year") %>% 
  # (Optional) Convert year to numeric:
  mutate(year = as.numeric(gsub("^y", "", year))) %>%  
  # Convert data frame to a table
  xtabs(formula = value ~ state + year) %>% 
  # Calculate proportions: 
  prop.table
#>          year
#> state            1970        1980        1990        2000
#>   Alaska  0.002555911 0.003833866 0.000000000 0.004472843
#>   Iowa    0.006389776 0.012779553 0.019169329 0.025559105
#>   Nevada  0.063897764 0.063897764 0.063897764 0.063897764
#>   Ohio    0.031948882 0.038338658 0.000000000 0.051118211
#>   total   0.109904153 0.124600639 0.137380192 0.152076677
#>   Wyoming 0.005111821 0.005750799 0.006389776 0.007028754
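In the output above each cell is a share of the grand total, because prop.table is called without a margin, and the missing counts show up as 0. To get the per-year proportions the question asks for, one possible sketch, assuming dplyr 1.0.0 or later for `across()`, divides each year column by its entry in the "total" row (the name `props` is illustrative):

```r
library(dplyr)

my.data <- read.table(text = "
state    y1970  y1980  y1990  y2000
Alaska       4      6     NA      7
Iowa        10     20     30     40
Nevada     100    100    100    100
Ohio        50     60     NA     80
total      172    195    215    238
Wyoming      8      9     10     11
", header = TRUE)

props <- my.data %>%
  # For every column except state, divide by that column's "total" entry
  mutate(across(-state, ~ .x / .x[state == "total"]))
props
```

Missing counts remain NA here instead of being turned into 0.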
answered 2021-10-13T20:39:12.497