OK, so I have a table whose first few rows look something like this:

Department  Diagnosis Code
Dept. 1     Code1
Dept. 2     Code2
Dept. 3     Code3
Dept. 3     Code3
Dept. 3     Code4
Dept. 4     Code4
Dept. 4     Code4
Dept. 4     Code5
Dept. 4     Code5
Dept. 4     Code5

What I want is to build a table that looks like this:

Department  Code1%  Code2%  Code3%  Code4%  Code5%
Dept. 1     xx%     xx%     xx%     xx%     xx%

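Here each percentage is a code's share within its department, i.e. the number of times Code 1 occurs in Department 1 divided by the total number of code instances recorded in Department 1. So if Code 1 occurs 50 times in Department 1, and Department 1 recorded 120 code instances across all codes, the percentage should be 50/120 (about 41.7%).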

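I have been trying to do this with some combination of group_by(), mutate() and summarise(), but I can't figure out how to combine them and write the code so that I get the output I want.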

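I have seen plenty of example code that does something similar when the second column is some kind of numeric frequency, but I haven't found the equivalent for a second column made up of strings that represent discrete categories.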

**Edit:** Also, the codes are alphanumeric. For example, one code might look like E77.09, another like C30, and another like D24.3.

3 Answers

df <- data.frame(
  stringsAsFactors = FALSE,
  Department = c("Dept. 1", "Dept. 2",
                 "Dept. 3", "Dept. 3", "Dept. 3", "Dept. 4",
                 "Dept. 4", "Dept. 4", "Dept. 4", "Dept. 4"),
  Diagnosis.Code = c("Code1", "Code2",
                     "Code3", "Code3", "Code4", "Code4", "Code4", "Code5",
                     "Code5", "Code5")
)

library(dplyr, warn.conflicts = FALSE)
library(janitor)


df %>% tabyl(Department, Diagnosis.Code) %>%
  adorn_percentages() %>%
  adorn_pct_formatting(2)

#>  Department   Code1   Code2  Code3  Code4  Code5
#>     Dept. 1 100.00%   0.00%  0.00%  0.00%  0.00%
#>     Dept. 2   0.00% 100.00%  0.00%  0.00%  0.00%
#>     Dept. 3   0.00%   0.00% 66.67% 33.33%  0.00%
#>     Dept. 4   0.00%   0.00%  0.00% 40.00% 60.00%

Created on 2021-07-21 by the reprex package (v2.0.0)

answered 2021-07-21T15:26:32.283

1. The simplest way is base R.

tbl <- table(df1[[1]], df1[[2]])
100*tbl/rowSums(tbl)
#              Code1     Code2     Code3     Code4     Code5
#  Dept. 1 100.00000   0.00000   0.00000   0.00000   0.00000
#  Dept. 2   0.00000 100.00000   0.00000   0.00000   0.00000
#  Dept. 3   0.00000   0.00000  66.66667  33.33333   0.00000
#  Dept. 4   0.00000   0.00000   0.00000  40.00000  60.00000

Here is another one.

xtb <- xtabs(~ Department + Code, df1)
100*xtb/rowSums(xtb)
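
Dividing by rowSums() can also be written with base R's prop.table(), which some readers find more explicit about which margin is being normalised. The following is a minimal sketch, not part of the original answer; it rebuilds the same df1 sample data inline so it runs on its own, and the name pct is only illustrative.

```r
# Same sample data as in the "Data" section of this answer
df1 <- data.frame(
  Department = c("Dept. 1", "Dept. 2", "Dept. 3", "Dept. 3", "Dept. 3",
                 "Dept. 4", "Dept. 4", "Dept. 4", "Dept. 4", "Dept. 4"),
  Code = c("Code1", "Code2", "Code3", "Code3", "Code4",
           "Code4", "Code4", "Code5", "Code5", "Code5"),
  stringsAsFactors = FALSE
)

# Cross-tabulate departments (rows) against codes (columns)
tbl <- table(df1$Department, df1$Code)

# margin = 1 divides every cell by its row total, so each row sums to 1
pct <- 100 * prop.table(tbl, margin = 1)
round(pct, 2)

# Sanity check: this should agree with the rowSums() version above
all.equal(as.vector(pct), as.vector(100 * tbl / rowSums(tbl)))
```

Using margin = 2 instead would give each department's share within a code, which is a different question from the one asked here.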

2. Here is a solution using dplyr and tidyr.

library(dplyr)
library(tidyr)

df1 %>%
  group_by(Department) %>%
  mutate(d = n()) %>%
  group_by(Department, Code) %>%
  summarise(Perc = n()/first(d), .groups = "drop") %>%
  pivot_wider(
    id_cols = Department,
    names_from = Code,
    values_from = Perc
  )
# A tibble: 4 x 6
#  Department Code1 Code2  Code3  Code4 Code5
#  <chr>      <dbl> <dbl>  <dbl>  <dbl> <dbl>
#1 Dept. 1        1    NA NA     NA      NA  
#2 Dept. 2       NA     1 NA     NA      NA  
#3 Dept. 3       NA    NA  0.667  0.333  NA  
#4 Dept. 4       NA    NA NA      0.4     0.6

To get percentages, with zeros where there are NA's, only a small change is needed: multiply by 100 and set values_fill = 0.

df1 %>%
  group_by(Department) %>%
  mutate(d = n()) %>%
  group_by(Department, Code) %>%
  summarise(Perc = 100 * n()/first(d), .groups = "drop") %>%
  pivot_wider(
    id_cols = Department,
    names_from = Code,
    values_from = Perc,
    values_fill = 0
  )
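
The two group_by() steps can probably be shortened with dplyr's count(), which tallies each Department/Code pair into a column named n. This is a sketch of an alternative rather than the original answer's code; it assumes dplyr and tidyr are installed, and it rebuilds df1 inline so it is self-contained. The name res is only illustrative.

```r
library(dplyr)
library(tidyr)

# Same sample data as in the "Data" section below
df1 <- data.frame(
  Department = c("Dept. 1", "Dept. 2", "Dept. 3", "Dept. 3", "Dept. 3",
                 "Dept. 4", "Dept. 4", "Dept. 4", "Dept. 4", "Dept. 4"),
  Code = c("Code1", "Code2", "Code3", "Code3", "Code4",
           "Code4", "Code4", "Code5", "Code5", "Code5"),
  stringsAsFactors = FALSE
)

res <- df1 %>%
  count(Department, Code) %>%           # one row per pair, tally in column n
  group_by(Department) %>%
  mutate(Perc = 100 * n / sum(n)) %>%   # share of each code within its department
  ungroup() %>%
  select(-n) %>%
  pivot_wider(
    names_from = Code,
    values_from = Perc,
    values_fill = 0                     # codes never seen in a department become 0
  )

res
```

Because the percentages are computed per department before reshaping, each row of the wide table should sum to 100.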

Data

df1 <-
structure(list(Department = c("Dept. 1", "Dept. 2", "Dept. 3", 
"Dept. 3", "Dept. 3", "Dept. 4", "Dept. 4", "Dept. 4", "Dept. 4", 
"Dept. 4"), Code = c("Code1", "Code2", "Code3", "Code3", "Code4", 
"Code4", "Code4", "Code5", "Code5", "Code5")), row.names = c(NA, 
-10L), class = "data.frame")
answered 2021-07-21T15:29:42.173

If you want a simple base R solution:

# dat is assumed to be your data frame: department in column 1, code in column 2
tab = table(dat[,1], dat[,2])
tab / rowSums(tab) * 100
              Code1     Code2     Code3     Code4     Code5
  Dept. 1 100.00000   0.00000   0.00000   0.00000   0.00000
  Dept. 2   0.00000 100.00000   0.00000   0.00000   0.00000
  Dept. 3   0.00000   0.00000  66.66667  33.33333   0.00000
  Dept. 4   0.00000   0.00000   0.00000  40.00000  60.00000
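
Since the question mentions alphanumeric codes such as E77.09, C30 and D24.3, here is a small self-contained sketch suggesting that the same approach does not depend on what the code strings look like. The data frame dat below is made-up illustration data, not the asker's real data, and the conversion with as.data.frame.matrix() is one possible way to get an ordinary data frame back.

```r
# Hypothetical data with alphanumeric codes (illustration only)
dat <- data.frame(
  Department = c("Dept. 1", "Dept. 1", "Dept. 1", "Dept. 2"),
  Code = c("E77.09", "E77.09", "C30", "D24.3"),
  stringsAsFactors = FALSE
)

# table() treats the codes as plain category labels
tab <- table(dat$Department, dat$Code)
pct <- tab / rowSums(tab) * 100

# Convert the table to a regular data frame, one column per code
pct_df <- as.data.frame.matrix(pct)

# Move the department out of the row names into its own column
pct_df <- cbind(Department = rownames(pct_df), pct_df, row.names = NULL)
pct_df
```

The wide data frame can then be rounded or formatted like any other, for example with round() on the numeric columns.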
answered 2021-07-21T15:22:41.887