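I'm new to data.table and am trying to find out how many rows in my table have the same values in two columns. The resulting table has multiple rows with the same combination of keys. Can someone help me figure out what I'm doing wrong?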

labs_raw_df <- data.table(labs_raw)
setkey(labs_raw_df, NAT, LAB_TST_AN_LAB_TST_CD)
lab_pt_count <- labs_raw_df[,
                            list(n=.N),
                            by=list(NAT, LAB_TST_AN_LAB_TST_CD)]

Both columns are character.

1 Answer

Writing an answer since this is too long for a comment.

I assume that you use data.table 1.8.6.

Let's create some dummy data:

set.seed(42)
labs_raw_df <- data.frame(NAT=sample(c("A","B","C"),20,TRUE),
                          LAB_TST_AN_LAB_TST_CD=sample(c("A","B","C"),20,TRUE),
                          value=sample(0:1,20,TRUE))

Now your code (with some minor corrections of naming):

library(data.table)
labs_raw_dt <- data.table(labs_raw_df)
setkey(labs_raw_dt, NAT, LAB_TST_AN_LAB_TST_CD)
lab_pt_count <- labs_raw_dt[,
                            list(n=.N),
                            by=list(NAT, LAB_TST_AN_LAB_TST_CD)]
print(lab_pt_count)

   NAT LAB_TST_AN_LAB_TST_CD n
1:   A                     A 1
2:   A                     C 3
3:   B                     A 2
4:   B                     B 3
5:   B                     C 2
6:   C                     A 2
7:   C                     B 2
8:   C                     C 5

This is the expected result. Can you elaborate on how that doesn't meet your expectation?
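
If it helps to narrow things down, here is a minimal sketch of how one might check whether the grouped result really contains repeated key combinations. It rebuilds the dummy data from above; the helper names `combo` and `has_dup_keys` and the `"|"` separator are only illustrative choices, not anything from the question.

```r
library(data.table)

# Same dummy data as above (exact values depend on the R version's sampler)
set.seed(42)
labs_raw_dt <- data.table(NAT = sample(c("A", "B", "C"), 20, TRUE),
                          LAB_TST_AN_LAB_TST_CD = sample(c("A", "B", "C"), 20, TRUE),
                          value = sample(0:1, 20, TRUE))

lab_pt_count <- labs_raw_dt[,
                            list(n = .N),
                            by = list(NAT, LAB_TST_AN_LAB_TST_CD)]

# One string per result row, built from the two grouping columns
# (assumes the values themselves do not contain the "|" separator)
combo <- paste(lab_pt_count$NAT, lab_pt_count$LAB_TST_AN_LAB_TST_CD, sep = "|")

# TRUE only if some combination shows up in more than one row
has_dup_keys <- any(duplicated(combo))
print(has_dup_keys)

# Sanity check: the group counts should add up to the number of input rows
print(sum(lab_pt_count$n) == nrow(labs_raw_dt))
```

On the dummy data the first check should come out FALSE. If it is TRUE on your real data, values that look identical when printed probably differ in some way that is not visible in the output.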

Of course we can simplify a bit:

lab_pt_count <- labs_raw_dt[,
                            .N,
                            by=key(labs_raw_dt)]
print(lab_pt_count)

   NAT LAB_TST_AN_LAB_TST_CD N
1:   A                     A 1
2:   A                     C 3
3:   B                     A 2
4:   B                     B 3
5:   B                     C 2
6:   C                     A 2
7:   C                     B 2
8:   C                     C 5

But the result is the same.
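
To check that claim rather than compare the printouts by eye, something along these lines should work. It is only a sketch on the dummy data; `long_form`, `short_form`, `same_groups` and `same_counts` are names made up for this example.

```r
library(data.table)

# Same dummy data as above, keyed on the two grouping columns
set.seed(42)
labs_raw_dt <- data.table(NAT = sample(c("A", "B", "C"), 20, TRUE),
                          LAB_TST_AN_LAB_TST_CD = sample(c("A", "B", "C"), 20, TRUE),
                          value = sample(0:1, 20, TRUE))
setkey(labs_raw_dt, NAT, LAB_TST_AN_LAB_TST_CD)

# Explicit version: named count column, grouping columns spelled out
long_form <- labs_raw_dt[,
                         list(n = .N),
                         by = list(NAT, LAB_TST_AN_LAB_TST_CD)]

# Short version: bare .N, grouping by the key columns
short_form <- labs_raw_dt[,
                          .N,
                          by = key(labs_raw_dt)]

# The grouping columns and the counts should match; only the count column name differs
same_groups <- identical(long_form$NAT, short_form$NAT) &&
  identical(long_form$LAB_TST_AN_LAB_TST_CD, short_form$LAB_TST_AN_LAB_TST_CD)
same_counts <- identical(long_form$n, short_form$N)
print(c(same_groups, same_counts))
```

The only difference between the two forms is the name of the count column (`n` versus `N`).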

Answered 2013-01-06T10:27:49.810