I have data in the following format:

DF <- structure(list(cat = structure(c(1L, 2L, 3L, 1L, 2L, 2L, 3L, 
3L, 3L, 3L, 1L, 2L), .Label = c("A", "B", "C"), class = "factor"), 
ID = structure(c(1L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 2L, 
3L, 4L), .Label = c("s1", "s10", "s11", "s12", "s2", "s3", 
"s4", "s5", "s6", "s7", "s8", "s9"), class = "factor"), val = c(150, 
750, 950, 104, 726, 797, 890, 912, 994, 1004, 199, 704), 
LWR = c(100, 700, 900, NA, NA, NA, NA, NA, NA, NA, NA, NA
), UPP = c(200, 800, 1000, NA, NA, NA, NA, NA, NA, NA, NA, 
NA)), .Names = c("cat", "ID", "val", "LWR", "UPP"), row.names = c(NA, 
-12L), class = "data.frame")

which looks like this:

    cat ID  val LWR  UPP
1    A  s1  150 100  200
2    B  s2  750 700  800
3    C  s3  950 900 1000
4    A  s4  104  NA   NA
5    B  s5  726  NA   NA
6    B  s6  797  NA   NA
7    C  s7  890  NA   NA
8    C  s8  912  NA   NA
9    C  s9  994  NA   NA
10   C s10 1004  NA   NA
11   A s11  199  NA   NA
12   B s12  704  NA   NA

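For each LWR and UPP value, I want to find the value in the val column, within the same cat, that is closest to it. This is probably easiest to understand by looking at the desired output: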

  cat id val LWR  UPP  LS NLWR  US NUPP
1   A s1 150 100  200  s4  104 s11  199
2   B s2 750 700  800 s12  704  s6  797
3   C s3 950 900 1000  s8  912  s9  994

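The new columns (LS and NLWR, US and NUPP) are just the ID and val of the extracted rows under new column names. I tried doing this with various forms of `which` and then reshaping the data, but had no luck. Is there a direct way to do this, or will it always take multiple steps?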

1 Answer

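# Split the data: reference rows (with LWR/UPP) and candidate rows (without)
DF1 <- na.omit(DF)
DF2 <- DF[is.na(DF$LWR),]

library(plyr)

ddply(DF1,.(cat),function(df) {
  # only search candidates from the same cat as the reference row
  cand <- DF2[DF2$cat == df$cat[1],]

  # position of the candidate val closest to each bound
  lwr <- which.min(abs(cand$val-df$LWR))
  upp <- which.min(abs(cand$val-df$UPP))

  df$LS <- cand[lwr,"ID"]
  df$NLWR <- cand[lwr,"val"]
  df$US <- cand[upp,"ID"]
  df$NUPP <- cand[upp,"val"]

  df
})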

#   cat ID val LWR  UPP  LS NLWR  US NUPP
# 1   A s1 150 100  200  s4  104 s11  199
# 2   B s2 750 700  800 s12  704  s6  797
# 3   C s3 950 900 1000  s7  890 s10 1004

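Note that 890 is closer to 900 than 912 is, and likewise for NUPP: 1004 is closer to 1000 than 994 is. If the values have to lie between LWR and UPP, it should be easy to adjust.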
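
One possible adjustment is sketched below in base R, under the assumption that "between LWR and UPP" means the closed interval and that every cat has at least one candidate inside its bounds. The helper name `nearest_in_bounds` and the names `refs`, `cands` and `res` are made up for this illustration; the data frame is the one from the question, rebuilt by hand with character columns instead of factors.

```r
# Data from the question, rebuilt by hand (character columns, not factors)
DF <- data.frame(
  cat = c("A", "B", "C", "A", "B", "B", "C", "C", "C", "C", "A", "B"),
  ID  = paste0("s", 1:12),
  val = c(150, 750, 950, 104, 726, 797, 890, 912, 994, 1004, 199, 704),
  LWR = c(100, 700, 900, rep(NA, 9)),
  UPP = c(200, 800, 1000, rep(NA, 9)),
  stringsAsFactors = FALSE
)

refs  <- DF[!is.na(DF$LWR), ]  # rows that carry LWR/UPP bounds
cands <- DF[is.na(DF$LWR), ]   # rows to search for the nearest values

# Hypothetical helper: takes one reference row, returns it with four new columns
nearest_in_bounds <- function(ref) {
  # keep candidates from the same cat whose val lies inside [LWR, UPP]
  pool <- cands[cands$cat == ref$cat &
                cands$val >= ref$LWR & cands$val <= ref$UPP, ]
  stopifnot(nrow(pool) > 0)    # assumption: at least one candidate in bounds

  lwr <- pool[which.min(abs(pool$val - ref$LWR)), ]
  upp <- pool[which.min(abs(pool$val - ref$UPP)), ]

  data.frame(ref, LS = lwr$ID, NLWR = lwr$val, US = upp$ID, NUPP = upp$val,
             stringsAsFactors = FALSE)
}

# Apply the helper to each reference row and stack the results
res <- do.call(rbind, lapply(seq_len(nrow(refs)),
                             function(i) nearest_in_bounds(refs[i, ])))
res
```

With the bounds filter in place, cat C picks s8 (912) and s9 (994) instead of s7 (890) and s10 (1004), which matches the desired output in the question. On ties, `which.min` returns the first minimum, so the earliest matching row wins.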

Answered 2013-05-26T17:50:08.500