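I want to subset a data frame, keeping a row only if every value in it is greater than the value in the corresponding row of a different data frame. I also need to skip some rows at the top. These earlier questions are related, but they did not help me:

Subsetting a data frame based on contents of another data frame

Subset data using information from a different data frame [r]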

> A
     name1 name2
cond   trt  ctrl
hour     0     3
A        1     1
B       10     1
C        1     1
D        1     1
E       10    10
> B
     name1 name2
cond   trt  ctrl
hour     0     3
A        1     1
B        1    10
C        1     1
D        1     1
E        1     1

This is what I want: only the rows where every value in A is greater than the corresponding value in B:

     name1 name2
cond   trt  ctrl
hour     0     3
E       10    10

I tried these three lines:

subset(A, TRUE, select=(A[3:7,] > B[3:7,]))
subset(A, A > B)
A[A[3:7,] > B[3:7,]]

Thank you very much. Here is the code to generate the data:

A <- structure(list(name1 = c("trt", "0", "1", "10", "1", "1", "10"
), name2 = c("ctrl", "3", "1", "1", "1", "1", "10")), .Names = c("name1", 
"name2"), row.names = c("cond", "hour", "A", "B", "C", "D", "E"
), class = "data.frame")
B <- structure(list(name1 = c("trt", "0", "1", "1", "1", "1", "1"), 
    name2 = c("ctrl", "3", "1", "10", "1", "1", "1")), .Names = c("name1", 
"name2"), row.names = c("cond", "hour", "A", "B", "C", "D", "E"
), class = "data.frame")
############# Follow-up question posted 2/28/13

Error when subsetting based on adjusted values of a different data frame in R

4 Answers

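N <- nrow(A)
# For each data row (rows 1-2 are header-like), check that every value in A exceeds B.
# Note: the columns are character, so ">" compares strings here.
cond <- sapply(3:N, function(i) all(A[i, ] > B[i, ]))
rbind(A[1:2, ], subset(A[3:N, ], cond))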
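
Because every column in A and B is character, the comparison above is lexical (for example, "9" > "10" is TRUE as strings). A minimal sketch of the same idea with an explicit numeric conversion, assuming the first two rows are the only non-numeric ones; the names `skip`, `body`, `a_num`, `b_num`, and `keep` are my own and not from the question:

```r
# Same data as in the question, built compactly (all columns are character).
rn <- c("cond", "hour", "A", "B", "C", "D", "E")
A <- data.frame(name1 = c("trt", "0", "1", "10", "1", "1", "10"),
                name2 = c("ctrl", "3", "1", "1", "1", "1", "10"),
                row.names = rn, stringsAsFactors = FALSE)
B <- data.frame(name1 = c("trt", "0", "1", "1", "1", "1", "1"),
                name2 = c("ctrl", "3", "1", "10", "1", "1", "1"),
                row.names = rn, stringsAsFactors = FALSE)

skip <- 2                                  # assumed number of header-like rows
body <- seq_len(nrow(A))[-seq_len(skip)]   # indices of the data rows

# Convert the data rows to numeric matrices so ">" compares numbers.
a_num <- sapply(A[body, , drop = FALSE], as.numeric)
b_num <- sapply(B[body, , drop = FALSE], as.numeric)

# TRUE for rows where every value in A exceeds the matching value in B.
keep <- apply(a_num > b_num, 1, all)

# Header rows first, then the data rows that passed the test.
result <- rbind(A[seq_len(skip), ], A[body[keep], ])
print(result)
```

Using `all()` instead of a hard-coded column count means the same code should work unchanged if more columns are added.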
answered 2013-02-26T19:32:21.473

I think SQL is better suited to this kind of filtering between tables. It is clean and readable (the rule logic stays explicit).

library(sqldf)
sqldf('SELECT DISTINCT A.*
        FROM A,B
        WHERE A.name1   > B.name1
        AND    A.name2  > B.name2')
  name1 name2
1   trt  ctrl
2    10    10
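
One thing to watch in the query above: `FROM A,B` has no join condition, so every row of A is compared with every row of B, and the comparison is on strings, so the non-numeric `trt`/`ctrl` row can pass the filter. A minimal base R sketch that pairs rows by row name before comparing, assuming the first two rows are header-like; the `id` column and the `.a`/`.b` suffixes are names I introduced for illustration:

```r
# Same data as in the question (all columns are character).
rn <- c("cond", "hour", "A", "B", "C", "D", "E")
A <- data.frame(name1 = c("trt", "0", "1", "10", "1", "1", "10"),
                name2 = c("ctrl", "3", "1", "1", "1", "1", "10"),
                row.names = rn, stringsAsFactors = FALSE)
B <- data.frame(name1 = c("trt", "0", "1", "1", "1", "1", "1"),
                name2 = c("ctrl", "3", "1", "10", "1", "1", "1"),
                row.names = rn, stringsAsFactors = FALSE)

# Drop the two header-like rows and expose the row names as a key column.
A2 <- data.frame(id = rownames(A), A, stringsAsFactors = FALSE)[-(1:2), ]
B2 <- data.frame(id = rownames(B), B, stringsAsFactors = FALSE)[-(1:2), ]

# Inner join on id, so each row of A sits next to the matching row of B.
m <- merge(A2, B2, by = "id", suffixes = c(".a", ".b"))

# Compare as numbers, not strings.
keep <- as.numeric(m$name1.a) > as.numeric(m$name1.b) &
        as.numeric(m$name2.a) > as.numeric(m$name2.b)

# Look the surviving ids up in A and put the header rows back on top.
ids <- m$id[keep]
result <- A[c("cond", "hour", ids), ]
print(result)
```

The same pairing could be expressed in SQL by adding an id column to both tables and joining on it.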
answered 2013-02-26T19:50:49.750

The obligatory data.table solution:

library(data.table)

# just to preserve the order, non-alphabetically
idsA <- factor(rownames(A), levels=rownames(A))
idsB <- factor(rownames(B), levels=rownames(B))

# convert to data.table with id
ADT <- data.table(id=idsA, A, key="id")
BDT <- data.table(id=idsB, B, key="id")

# filter as needed
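# note: current data.table releases name the columns coming from BDT
# i.name1 and i.name2 rather than name1.1 and name2.1
ADT[BDT][name1 > name1.1 & name2 > name2.1, list(id, name1, name2)]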
answered 2013-02-26T19:56:23.257

If I rename your matrices amat and bmat, then

amat[which(sapply(1:nrow(amat), function(x) prod(amat[x,] > bmat[x,])) == 1),]
[1] 10 10

And you can paste the 'hours' row back on if desired.
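
For completeness, a small sketch of how this might look end to end, assuming `amat` and `bmat` are numeric matrices holding only the data rows A to E; the way they are built here, and the names `keep`, `sel`, and `with_hour`, are my own assumptions:

```r
# Numeric matrices with just the data rows (values taken from the question).
rows <- c("A", "B", "C", "D", "E")
cols <- c("name1", "name2")
amat <- matrix(c(1, 10, 1, 1, 10,   1, 1, 1, 1, 10), ncol = 2,
               dimnames = list(rows, cols))
bmat <- matrix(c(1, 1, 1, 1, 1,     1, 10, 1, 1, 1), ncol = 2,
               dimnames = list(rows, cols))

# prod() of a logical vector is 1 only when every element is TRUE.
keep <- which(sapply(seq_len(nrow(amat)),
                     function(x) prod(amat[x, ] > bmat[x, ])) == 1)

# drop = FALSE keeps the result a matrix even when one row survives.
sel <- amat[keep, , drop = FALSE]

# Put the numeric 'hour' row back on top.
with_hour <- rbind(hour = c(name1 = 0, name2 = 3), sel)
print(with_hour)
```

The character `cond` row cannot live in a numeric matrix, so it would have to be kept separately or the result converted back to a data frame.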

answered 2013-02-26T19:41:33.670