r - Matching one column to another in numerical order in R

Question

I have a data file:

https://dl.dropbox.com/u/22681355/example.csv

Reading the file:

example<-read.csv("example.csv")
example<-example[,-1]

example[,1] contains a list of numbers in ascending numerical order. example[,2] contains another set of numbers.

First, I want to identify the numbers in example[,2] that are not listed in example[,1]:

diff<-setdiff(example[,2],example[,1])

Now that I know these values, I want to insert them into example[,1], while the existing values in example[,1] and example[,2] stay unchanged.

A short example would be:

Example[,1]   Example[,2]
1             1000
1             50
1             3
1             90
1             25
3             4
5             2
5             7
etc           etc

After running setdiff(), I get the numbers that appear in the second column but not in the first.
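As a quick check of this step, the sketch below runs setdiff() on the eight rows of the short example only. The names col1, col2 and missing_vals are illustrative and not part of the original code; using a name other than diff also avoids confusion with base R's diff() function.

```r
# The two columns from the short example (first eight rows only)
col1 <- c(1, 1, 1, 1, 1, 3, 5, 5)
col2 <- c(1000, 50, 3, 90, 25, 4, 2, 7)

# Values present in col2 but absent from col1,
# returned in order of first appearance in col2
missing_vals <- setdiff(col2, col1)
print(missing_vals)

# setdiff() is asymmetric: swapping the arguments asks a different question
# (which values of col1 never occur in col2?)
print(setdiff(col1, col2))
```

Note that 3 is not reported, because it already occurs in the first column.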

Now I want to place them in example[,1] to produce the following output:

Example[,1]   Example[,2]
1             1000
1             50
1             3
1             90
1             25
2             NA
3             4
4             NA
5             2
5             7
etc           etc

So basically, put them in numerical order but leave everything else intact.

Part 1 was solved perfectly by Joris Meys!

I have two further questions:

///////////////////////////////////////// /////////////////////////////////

1:

Is it possible to do this when there is an additional third column that I don't want to do anything with?

For example:

Original:

 Example[,1]   Example[,2] Example[,3]
 1             1000        37
 1             50          18
 1             3           54
 1             90          72
 1             25          23
 3             4           15
 5             2           20
 5             7           9
 etc           etc

Desired output:

Example[,1]   Example[,2]  Example[,3]
1             1000         37
1             50           18
1             3            54
1             90           72
1             25           23
2             NA           NA
3             4            15
4             NA           NA
5             2            20
5             7            9
etc           etc

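2:

Instead of putting NA in example[,2] for the values that example[,1] was missing: suppose example[,1] does not contain the number '30'. I would then like to search example[,2] for the number '30', see what value example[,1] has in that row, and put that value into example[,2] instead of NA.

For example: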

Example[,1]   Example[,2]  Example[,3]
1             1000         37
1             50           18
1             3            54
1             90           72
1             25           23
2             NA           NA
3             4            15
4             NA           NA
5             2            20
5             7            9
etc           etc

Instead of the NAs, I would have:

Example[,1]   Example[,2]  Example[,3]
1             1000         37
1             50           18
1             3            54
1             90           72
1             25           23
2             5            20
3             4            15
4             3            15
5             2            20
5             7            9
etc           etc

score 3 · Accepted Answer

So, after your clarification of what you want, this means you have a matrix:

Example <- 
matrix(
  c(1,1,1,1,1,3,5,5,1000,50,3,90,25,4,2,7),
  ncol=2
)

Then you can do the following:

diffs <- setdiff(Example[,2],Example[,1])
tmps <- rbind(Example,
              matrix(
                 c(diffs,rep(NA,length(diffs))),
                 ncol=2
              )
        )
solution <- tmps[order(tmps[,1]),]

This gives you the following result:

> solution
      [,1] [,2]
 [1,]    1 1000
 [2,]    1   50
 [3,]    1    3
 [4,]    1   90
 [5,]    1   25
 [6,]    2   NA
 [7,]    3    4
 [8,]    4   NA
 [9,]    5    2
[10,]    5    7
[11,]    7   NA
...

See the help files ?matrix and ?order.
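One property worth checking is that the original rows come through untouched and in their original order. The sketch below rebuilds solution as above and compares its non-NA rows with Example. It relies on order() leaving tied keys in their original order, and it assumes the original data contains no NA values; the name kept is only illustrative.

```r
Example <- matrix(c(1, 1, 1, 1, 1, 3, 5, 5,
                    1000, 50, 3, 90, 25, 4, 2, 7),
                  ncol = 2)

# Same construction as above: append one (value, NA) row per missing value
diffs <- setdiff(Example[, 2], Example[, 1])
tmps <- rbind(Example,
              matrix(c(diffs, rep(NA, length(diffs))), ncol = 2))
solution <- tmps[order(tmps[, 1]), ]

# Rows whose second column is not NA should be exactly the original rows,
# in their original order, because order() does not reshuffle tied keys
kept <- solution[!is.na(solution[, 2]), ]
print(identical(kept, Example))
```

If the first column of the input were not already sorted, the original rows would still be kept intact, but they would be reordered by their first-column value.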

score 1 · Accepted Answer

If your matrix has more than two columns, the following approach also works. It is an extension of Joris Meys' solution.

Example <- matrix(c(1,1,1,1,1,3,5,5,
                    1000,50,3,90,25,4,2,7,37,18,54,72,23,15,20,9),ncol=3)


diffs <- setdiff(Example[,2], Example[,1])
new_mat <- rbind(Example,
                 matrix(c(diffs,
                          rep(NA, length(diffs) * (ncol(Example) - 1))), 
                        ncol = ncol(Example)))
solution <- new_mat[order(new_mat[,1]),]

Result:

      [,1] [,2] [,3]
 [1,]    1 1000   37
 [2,]    1   50   18
 [3,]    1    3   54
 [4,]    1   90   72
 [5,]    1   25   23
 [6,]    2   NA   NA
 [7,]    3    4   15
 [8,]    4   NA   NA
 [9,]    5    2   20
[10,]    5    7    9
[11,]    7   NA   NA
[12,]   25   NA   NA
[13,]   50   NA   NA
[14,]   90   NA   NA
[15,] 1000   NA   NA
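The question reads its data with read.csv(), which returns a data frame rather than a matrix. A rough sketch of the same idea for a data frame follows; the column names a, b and c and the name filler are assumptions made for illustration, since the real column names in the CSV are not shown.

```r
# Hypothetical stand-in for the CSV contents (column names are assumed)
example <- data.frame(
  a = c(1, 1, 1, 1, 1, 3, 5, 5),
  b = c(1000, 50, 3, 90, 25, 4, 2, 7),
  c = c(37, 18, 54, 72, 23, 15, 20, 9)
)

# Values in the second column that never occur in the first
diffs <- setdiff(example[[2]], example[[1]])

# One all-NA row per missing value, with the same column names,
# then fill in the first (key) column
filler <- as.data.frame(matrix(NA_real_,
                               nrow = length(diffs),
                               ncol = ncol(example)))
names(filler) <- names(example)
filler[[1]] <- diffs

# Append the filler rows and sort by the key column
combined <- rbind(example, filler)
solution <- combined[order(combined[[1]]), ]
rownames(solution) <- NULL   # renumber rows after sorting
print(solution)
```

This assumes all columns are numeric; for mixed column types the filler rows would need NA values of the matching type.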

Once you have created this matrix, it is easy to generate a new matrix without NAs:

solution2 <- solution
solution2[is.na(solution2)] <- Example[match(sort(diffs), Example[,2]), -2]

Result:

      [,1] [,2] [,3]
 [1,]    1 1000   37
 [2,]    1   50   18
 [3,]    1    3   54
 [4,]    1   90   72
 [5,]    1   25   23
 [6,]    2    5   20
 [7,]    3    4   15
 [8,]    4    3   15
 [9,]    5    2   20
[10,]    5    7    9
[11,]    7    5    9
[12,]   25    1   23
[13,]   50    1   18
[14,]   90    1   72
[15,] 1000    1   37
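The one-line replacement above works because R fills the NA cells column by column, and the inserted rows appear in the same order as sort(diffs). A slightly more explicit variant, sketched below, looks up each inserted row by its own key instead. The names na_rows and src are illustrative, and the sketch assumes the original data has no NA values, so that an NA in column 2 marks an inserted row.

```r
Example <- matrix(c(1, 1, 1, 1, 1, 3, 5, 5,
                    1000, 50, 3, 90, 25, 4, 2, 7,
                    37, 18, 54, 72, 23, 15, 20, 9),
                  ncol = 3)

# Rebuild the NA-padded, sorted matrix as above
diffs <- setdiff(Example[, 2], Example[, 1])
new_mat <- rbind(Example,
                 matrix(c(diffs,
                          rep(NA, length(diffs) * (ncol(Example) - 1))),
                        ncol = ncol(Example)))
solution <- new_mat[order(new_mat[, 1]), ]

solution2 <- solution
# Inserted rows are the ones with NA in column 2 (assumes no NA in the input)
na_rows <- which(is.na(solution2[, 2]))
# For each inserted key, find the first row of Example holding it in column 2
src <- match(solution2[na_rows, 1], Example[, 2])
# Copy column 1 and column 3 of that row into columns 2 and 3
solution2[na_rows, -1] <- Example[src, -2]
print(solution2)
```

If a value occurs more than once in the second column, match() uses its first occurrence, which is also what the one-line version does.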
