r - counting frequency of incorrect value in r

Question

Here is my example dataset:

 set.seed(123)
 myd <- data.frame (sub = paste ("S", 1:10, sep = ""), P1 = sample(c(1,-1,2,0), 10, replace = TRUE),
                    P2 = sample(c(1,-1,2,0), 10, replace = TRUE),
                    I1 = sample(c(1,-1,2,0), 10, replace = TRUE),
                    I2 = sample(c(1,-1,2,0), 10, replace = TRUE),
                    I3 = sample(c(1,-1,2,0), 10, replace = TRUE),
                    I4 = sample(c(1,-1,2,0), 10, replace = TRUE),
                    I5 = sample(c(1,-1,2,0), 10, replace = TRUE),
                    I6 = sample(c(1,-1,2,0), 10, replace = TRUE)
                    )
 myd 

  sub P1 P2 I1 I2 I3 I4 I5 I6
1   S1 -1  0  0  0  1  1  2  0
2   S2  0 -1  2  0 -1 -1  1  2
3   S3 -1  2  2  2 -1  0 -1  2
4   S4  0  2  0  0 -1  1 -1  1
5   S5  0  1  2  1  1  2  0 -1
6   S6  1  0  2 -1  1  1 -1  1
7   S7  2  1  2  0  1  1  0 -1
8   S8  0  1  2  1 -1  0  0  2
9   S9  2 -1 -1 -1 -1  0  0 -1
10 S10 -1  0  1  1  0 -1 -1  1

Translation table of incorrect values, conditional on the values of P1 and P2 (-1 denotes a missing value):

  Condition   P1   P2   Incorrect value
  I            1    1   None
  II           1    0   2
  III          0    1   2
  IV           2    0   2 or 0
  V            0    2   2 or 0
  VI           2    2   1 or 0
  VII          1    2   0
  VIII         2    1   0

  # if either P1 or P2 is -1, all values become NA
  IX          -1    0   NA
  X            0   -1   NA
  XI          -1   -1   NA
  XII         -1    2   NA
  XIII         2   -1   NA
  XIV         -1    1   NA
  XV           1   -1   NA

The following is short code for the translation table in data.frame format, except for conditions IV, V and VI, where I did not know how to enter two values:

 ttable <- data.frame (P1 = c(1,1,0,2,0,2,1,2,-1, 0,-1,-1,2,-1,1),
                       P2 = c(1,0,1,0,2,2,2,1,0,-1,-1,2,-1,1,-1),
                       errort = c("None", 2,2,2, 2,1,0,0,NA, NA, NA, NA, NA, NA,NA))
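One possible way to enter conditions that have two incorrect values is a long-format table with one row per (P1, P2, incorrect value) combination. This is only a sketch: the names `ttable_long` and `incorrect_for` are hypothetical and do not appear elsewhere in this post.

```r
# Long-format translation table: one row per (P1, P2, incorrect value).
# Condition I (P1 = 1, P2 = 1) has no incorrect value, so it contributes no rows.
ttable_long <- data.frame(
  P1        = c(1, 0, 2, 2, 0, 0, 2, 2, 1, 2),
  P2        = c(0, 1, 0, 0, 2, 2, 2, 2, 2, 1),
  incorrect = c(2, 2, 2, 0, 2, 0, 1, 0, 0, 0)
)

# Hypothetical helper: incorrect values for a single (P1, P2) pair.
# Returns NA when either P value is missing (-1), and an empty vector
# when the pair has no incorrect value listed.
incorrect_for <- function(p1, p2) {
  if (p1 < 0 || p2 < 0) return(NA)
  ttable_long$incorrect[ttable_long$P1 == p1 & ttable_long$P2 == p2]
}

incorrect_for(0, 2)   # condition V: both 2 and 0 are incorrect
incorrect_for(1, 1)   # condition I: nothing is incorrect
incorrect_for(-1, 0)  # condition IX: missing P1
```

With this layout, conditions IV, V and VI simply take two rows each, so no cell has to hold two values.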

For each row (S1 to S10), I would like to look at the values in the P1 and P2 columns and check the values in columns I1 to I6 against the matching condition:

   sub   P1 P2 I1 I2 I3 I4 I5 I6
1   S1   -1  0  0  0  1  1  2  0

In this case one of P1 and P2 is -1, so all values will be NA.

Another case:

          sub   P1 P2  I1  I2  I3  I4   I5  I6
           S4   0  2   0   0  -1   1   -1   1

Here P1 = 0 and P2 = 2, so I1 = incorrect, I2 = incorrect, I3 = NA, I4 = correct, I5 = NA, I6 = correct.

This may be written as:

sub   P1  P2   I1     I2     I3   I4    I5   I6
 S4    0   2    0      0     -1    1    -1    1
              FALSE  FALSE   NA   TRUE  NA   TRUE

This matches condition (V): 0 and 2 are incorrect, 1 is correct, and -1 is missing.

Another case: here P1 = 0 and P2 = 1, which matches condition (III) in the translation table, so the incorrect value is 2.

sub   P1  P2   I1     I2     I3    I4     I5    I6
 S5    0   1    2      1      1     2      0    -1
              FALSE  TRUE   TRUE  FALSE  TRUE   NA

I need to calculate the frequency of FALSE. I tried a lot of if-else statements, but they do not give the desired output. The code feels messy, and I do not think it is efficient for the large dataset I will be using.

qcfun <- function (x) {
  x <- x[3:length(x)]
  obs1 <- table(c(x, 2, 0, 1, -1))
  obs <- obs1 - 1
  ov <- NULL
  if (x[1] == 1 & x[2] == 0) {
    ov <- round(as.numeric(obs[4] / sum(obs)), 2)
  } else if (x[1] == 0 & x[2] == 1) {
    ov <- round(as.numeric(obs[4] / sum(obs)), 2)
  } else if (x[1] == 1 & x[2] == 2) {
    ov <- round(as.numeric(obs[2] / sum(obs)), 2)
  } else if (x[1] == 2 & x[2] == 1) {
    ov <- round(as.numeric(obs[2] / sum(obs)), 2)
  } else if (x[1] == 1 & x[2] == 1) {
    ov <- 0
  } else {
    ov <- NA
  }
  return(ov)
}
out1 <- apply(myd, 1, qcfun)
table(out1)
tout1 <- table(out1)

Is there a quick / efficient way of doing this?

score 2 · Accepted Answer

You can use this vectorized function, which will be efficient for a large number of rows:

fixI <- function(p1, p2, i){
    negative <- (p1 < 0) | (p2 < 0) | (i < 0)
    result <- ifelse(negative, NA, TRUE)  # conditions IX to XV

    p <- p1 * 10 + p2

    result[!negative & p %in% c(10,1,20,2) & i==2] <- FALSE
    result[!negative & p %in% c(20,2,22,12,21) & i==0] <- FALSE
    result[!negative & p==22 & i==1] <- FALSE

    result
}
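The function relies on `p1 * 10 + p2` giving each non-missing (P1, P2) pair its own code, so that `%in%` can pick out exact conditions. A small check of that assumption, using a hypothetical `pairs` data frame that is not part of the function itself:

```r
# All (P1, P2) pairs built from the non-missing values 0, 1, 2
pairs <- expand.grid(P1 = 0:2, P2 = 0:2)
pairs$p <- pairs$P1 * 10 + pairs$P2

# Every pair gets a distinct code, so matching on p is unambiguous
stopifnot(anyDuplicated(pairs$p) == 0)

# Pairs for which a value of 2 is incorrect (conditions II, III, IV, V)
pairs[pairs$p %in% c(10, 1, 20, 2), ]
```

The encoding stays unambiguous only while the non-missing values of P2 are single digits; rows with a -1 are masked by `negative` before the codes are used.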

Apply it to the I columns of myd:

mat <- sapply(myd[,paste0("I",1:6)], fixI, p1=myd$P1, p2=myd$P2)

rownames(mat) <- myd$sub

The result:

       I1    I2   I3    I4    I5    I6
S1     NA    NA   NA    NA    NA    NA
S2     NA    NA   NA    NA    NA    NA
S3     NA    NA   NA    NA    NA    NA
S4  FALSE FALSE   NA  TRUE    NA  TRUE
S5  FALSE  TRUE TRUE FALSE  TRUE    NA
S6  FALSE    NA TRUE  TRUE    NA  TRUE
S7   TRUE FALSE TRUE  TRUE FALSE    NA
S8  FALSE  TRUE   NA  TRUE  TRUE FALSE
S9     NA    NA   NA    NA    NA    NA
S10    NA    NA   NA    NA    NA    NA

Now you can count the FALSE values like this:

By row:

apply(!mat, 1, sum, na.rm=TRUE)

 S1  S2  S3  S4  S5  S6  S7  S8  S9 S10 
  0   0   0   2   2   1   2   2   0   0

By column:

apply(!mat, 2, sum, na.rm=TRUE)

 I1 I2 I3 I4 I5 I6 
  4  2  0  1  1  1
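If a proportion is wanted rather than a count (the question's `qcfun` divides by the number of observations), `rowSums` and `rowMeans` are a possible shortcut. The sketch below rebuilds only rows S4 and S5 of the result matrix by hand so that it runs on its own; `n_false` and `p_false` are illustrative names.

```r
# Two rows copied from the result matrix above (S4 and S5)
mat <- rbind(
  S4 = c(FALSE, FALSE, NA, TRUE, NA, TRUE),
  S5 = c(FALSE, TRUE, TRUE, FALSE, TRUE, NA)
)
colnames(mat) <- paste0("I", 1:6)

# Same counts as apply(!mat, 1, sum, na.rm = TRUE)
n_false <- rowSums(!mat, na.rm = TRUE)

# Proportion of FALSE among the non-missing entries of each row
p_false <- round(rowMeans(!mat, na.rm = TRUE), 2)

n_false   # S4 = 2, S5 = 2
p_false   # S4 = 0.5, S5 = 0.4
```

`colSums(!mat, na.rm = TRUE)` and `colMeans(!mat, na.rm = TRUE)` give the per-column equivalents.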
