r - Setting columns to NA based on a test against other columns in a data frame

Question

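I have a large data frame with 48 columns, and I want to run a function on each row that sets to NA the columns that pass a test given in the function. The test involves looking up a number in another data frame. adply seems well suited to this, but I'm having trouble getting it to give me the result I want.

Let me explain:

Here is a sample of the data frame I want to operate on:

> df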
  pt depth Cell1_avgvel Cell1_avgdir Cell2_avgvel Cell2_avgdir
1  1   0.1           NA           NA           NA           NA
2  2   0.2           NA           NA        1.344        324.0
3  3   0.3           NA           NA        0.445        167.0
4  4   0.4        1.455        354.2        0.322        321.2

And here is the small data frame the test is derived from:

> tcell
  depth  name
1   0.2 Cell1
2   0.4 Cell2
3   0.6 Cell3
4   0.8 Cell4

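The whole idea is to assign NA to data points from cells that lie deeper than the actual depth listed in the large data frame (i.e. in row 3 the depth is 0.3, but there are two data points for Cell2, which sits at 0.4 m depth, so these are erroneous and I want to set them to NA).

I want to write a function that takes one row at a time and: 1) gets the instrument depth, 2) gets the list of column names, 3) gets the indices of the cells that are deeper than the instrument depth, 4) gets the names of those cells (i.e. Cell1, Cell2, Cell4, etc.), 5) uses a regular expression to find the columns in the list of column names that belong to those cells (i.e. Cell1_avgdir, Cell1_avgvel, etc.), and 6) uses those indices to set those column values to NA.

Here is what I have so far: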

depthNA = function(x) {
  depth = x$depth
  nms = names(df)
  ind = as.character(which(depth < tcell$depth))
  c = tcell$name[ind]
  patt = paste(c,collapse="|")
  c_ind = grep(patt,nms)
  x[,c_ind] <- NA
}

adply(df,1,depthNA)

Unfortunately, this doesn't do what I expected, and I'm now trying to figure out why.

It gives me this:

  pt depth Cell1_avgvel Cell1_avgdir Cell2_avgvel Cell2_avgdir V1
1  1   0.1           NA           NA           NA           NA NA
2  2   0.2           NA           NA        1.344        324.0 NA
3  3   0.3           NA           NA        0.445        167.0 NA
4  4   0.4        1.455        354.2        0.322        321.2 NA

When what I want is:

  pt depth Cell1_avgvel Cell1_avgdir Cell2_avgvel Cell2_avgdir
1  1   0.1           NA           NA           NA           NA
2  2   0.2           NA           NA           NA           NA
3  3   0.3           NA           NA           NA           NA
4  4   0.4        1.455        354.2        0.322        321.2

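Hopefully I've explained my problem well enough. Thanks to anyone who can: 1) fix what I've started, or 2) show me a better way of doing this that I don't know about.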

-SH

score 1 · Accepted Answer

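Here is an answer that follows the outline of your idea but does not match your output. See my comment above on whether your output is correct. The answer relies on reshape2 to make the join easier.

First, I read in your data: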

df <- read.table(text = "  pt depth Cell1_avgvel Cell1_avgdir Cell2_avgvel Cell2_avgdir
1  1   0.1           NA           NA           NA           NA
2  2   0.2           NA           NA        1.344        324.0
3  3   0.3           NA           NA        0.445        167.0
4  4   0.4        1.455        354.2        0.322        321.2", header = TRUE)

tcell <- read.table(text = " depth  name
1   0.2 Cell1
2   0.4 Cell2
3   0.6 Cell3
4   0.8 Cell4", header = TRUE)

Then, to solve your problem:

library(reshape2)

#Melt into long format
df.m <- melt(df, id.vars = 1:2)
#Split the column into two new columns based on _
df.m[, c("Cell", "OtherCol")] <- with(df.m, colsplit(variable, "_", c("Cell", "OtherCol")))
#Merge together with tcell
df.m <- merge(df.m, tcell, by.x = "Cell", by.y = "name")
#Add a new column which sets the offending values to NA
df.m <- transform(df.m, newvalue = ifelse(value > depth.y, NA, value))
#Cast back into wide format
dcast(pt + depth.x ~ variable, value.var = "newvalue", data = df.m)

  pt depth.x Cell1_avgvel Cell1_avgdir Cell2_avgvel Cell2_avgdir
1  1     0.1           NA           NA           NA           NA
2  2     0.2           NA           NA           NA           NA
3  3     0.3           NA           NA           NA           NA
4  4     0.4           NA           NA        0.322           NA
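
Note that the `ifelse()` above compares each measured `value` with the cell depth (`depth.y`), which is why the result differs from the output requested in the question. If the intended test is the instrument depth against the cell depth, a minimal base R sketch could look like the following. The function name `mask_deeper_cells` is hypothetical, and the sketch assumes that every cell's columns are named `<cell name>_...`, as in the sample data. It also avoids the two likely problems in the original `depthNA`: the function ended on an assignment, so it returned `NA` instead of the modified row (hence the extra `V1` column), and `tcell$name` was indexed with character strings, which yields `NA` on an unnamed vector.

```r
# Sample data copied from the question (row names omitted).
df <- read.table(text = "pt depth Cell1_avgvel Cell1_avgdir Cell2_avgvel Cell2_avgdir
1 0.1 NA NA NA NA
2 0.2 NA NA 1.344 324.0
3 0.3 NA NA 0.445 167.0
4 0.4 1.455 354.2 0.322 321.2", header = TRUE)

tcell <- read.table(text = "depth name
0.2 Cell1
0.4 Cell2
0.6 Cell3
0.8 Cell4", header = TRUE, stringsAsFactors = FALSE)

# Hypothetical helper: for each cell, set its columns to NA in every row
# whose instrument depth is shallower than that cell's depth.
mask_deeper_cells <- function(data, cells) {
  for (i in seq_len(nrow(cells))) {
    # Columns belonging to this cell, e.g. "Cell1_avgvel", "Cell1_avgdir".
    cols <- grep(paste0("^", cells$name[i], "_"), names(data))
    if (length(cols) == 0) next  # e.g. Cell3 and Cell4 have no columns here

    # Rows where the cell lies deeper than the instrument; rows with a
    # missing depth are left untouched.
    too_shallow <- !is.na(data$depth) & data$depth < cells$depth[i]
    data[too_shallow, cols] <- NA
  }
  data  # return the modified data frame explicitly
}

result <- mask_deeper_cells(df, tcell)
print(result)
```

On the sample data this leaves row 4 unchanged and sets all cell columns in rows 1 to 3 to NA, which is the output asked for in the question. Looping over the handful of cells rather than over every row keeps the work vectorised across rows, which may matter for a large data frame.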
