I have two big files with several columns and many lines. Both files contain a tag column; in one file the tags are unique, and in the other they are duplicated.

For example:

FILE1:

stat    stat    P-value tag
0.3049  7.464   1.875e-11   L2_None_chr1_-_109092036
0.2961  7.448   2.105e-11   L2_None_chr1_-_109092036
0.2934  7.347   3.389e-11   L2_None_chr1_-_109092036
0.2961  7.245   5.668e-11   L2_None_chr1_-_109092036
0.6682  7.284   4.664e-11   L2_None_chr1_-_109957962
0.6682  7.284   4.664e-11   L2_None_chr1_-_109957962
0.3933  7.363   3.127e-11   L2_None_chr1_-_159842839
0.3808  7.284   4.672e-11   L2_None_chr1_-_159842839
0.2993  7.17    8.278e-11   L2_None_chr1_-_169972458
0.3312  7.817   3.075e-12   L2_None_chr1_-_203626998
0.3312  7.817   3.075e-12   L2_None_chr1_-_203626998
0.614   7.616   9.742e-12   L2_None_chr1_-_569826
0.6411  7.58    1.037e-11   L2_None_chr1_-_569826
0.5755  7.275   4.871e-11   L2_None_chr1_-_569826
0.6893  7.26    5.255e-11   L2_None_chr1_-_6546011
0.3136  7.529   1.35e-11    L2_None_chr1_-_91180355
0.3262  7.449   2.023e-11   L2_None_chr1_-_91180355
0.298   7.151   9.129e-11   L2_None_chr1_-_91180355
0.2999  7.149   9.201e-11   L2_None_chr1_-_91182695
0.5383  7.189   7.534e-11   L2_None_chr1_-_91183491

FILE2:

L2_None_chr1_-_109092036    chr1    109092034
L2_None_chr1_-_109957962    chr1    109957879
L2_None_chr1_-_159842839    chr1    159842779
L2_None_chr1_-_169972458    chr1    169972444
L2_None_chr1_-_203626998    chr1    203626983
L2_None_chr1_-_569826   chr1    569802
L2_None_chr1_-_6546011  chr1    6545930
L2_None_chr1_-_91180355 chr1    91180310
L2_None_chr1_-_91182695 chr1    91182572
L2_None_chr1_-_91183491 chr1    91183389

What I want:

stat    P-value tag tag chr bp
7.464   1.875e-11   L2_None_chr1_-_109092036    L2_None_chr1_-_109092036    1   109092036
7.448   2.105e-11   L2_None_chr1_-_109092036    L2_None_chr1_-_109092036    1   109092036
7.347   3.389e-11   L2_None_chr1_-_109092036    L2_None_chr1_-_109092036    1   109092036
7.245   5.668e-11   L2_None_chr1_-_109092036    L2_None_chr1_-_109092036    1   109092036
7.284   4.664e-11   L2_None_chr1_-_109957962    L2_None_chr1_-_109957962    1   109957962
7.284   4.664e-11   L2_None_chr1_-_109957962    L2_None_chr1_-_109957962    1   109957962
7.363   3.127e-11   L2_None_chr1_-_159842839    L2_None_chr1_-_159842839    1   159842839
7.284   4.672e-11   L2_None_chr1_-_159842839    L2_None_chr1_-_159842839    1   159842839
7.17    8.278e-11   L2_None_chr1_-_169972458    L2_None_chr1_-_169972458    1   169972458
7.817   3.075e-12   L2_None_chr1_-_203626998    L2_None_chr1_-_203626998    1   203626998
7.817   3.075e-12   L2_None_chr1_-_203626998    L2_None_chr1_-_203626998    1   203626998
7.616   9.742e-12   L2_None_chr1_-_569826   L2_None_chr1_-_569826   1   569826
7.58    1.037e-11   L2_None_chr1_-_569826   L2_None_chr1_-_569826   1   569826
7.275   4.871e-11   L2_None_chr1_-_569826   L2_None_chr1_-_569826   1   569826
7.26    5.255e-11   L2_None_chr1_-_6546011  L2_None_chr1_-_6546011  1   6546011
7.529   1.35e-11    L2_None_chr1_-_91180355 L2_None_chr1_-_91180355 1   91180355
7.449   2.023e-11   L2_None_chr1_-_91180355 L2_None_chr1_-_91180355 1   91180355
7.151   9.129e-11   L2_None_chr1_-_91180355 L2_None_chr1_-_91180355 1   91180355
7.149   9.201e-11   L2_None_chr1_-_91182695 L2_None_chr1_-_91182695 1   91182695
7.189   7.534e-11   L2_None_chr1_-_91183491 L2_None_chr1_-_91183491 1   91183491

I tried the match function in R, but it did not get me all the way there.

2 Answers

This should do it:

 merge(dat,dat1,by.x='tag',by.y='tag')
                      tag   stat stat.1   P.value   V2        V3
1  L2_None_chr1_-_109092036 0.3049  7.464 1.875e-11 chr1 109092034
2  L2_None_chr1_-_109092036 0.2961  7.448 2.105e-11 chr1 109092034
3  L2_None_chr1_-_109092036 0.2934  7.347 3.389e-11 chr1 109092034
4  L2_None_chr1_-_109092036 0.2961  7.245 5.668e-11 chr1 109092034
5  L2_None_chr1_-_109957962 0.6682  7.284 4.664e-11 chr1 109957879
6  L2_None_chr1_-_109957962 0.6682  7.284 4.664e-11 chr1 109957879
7  L2_None_chr1_-_159842839 0.3933  7.363 3.127e-11 chr1 159842779
8  L2_None_chr1_-_159842839 0.3808  7.284 4.672e-11 chr1 159842779
9  L2_None_chr1_-_169972458 0.2993  7.170 8.278e-11 chr1 169972444
10 L2_None_chr1_-_203626998 0.3312  7.817 3.075e-12 chr1 203626983
11 L2_None_chr1_-_203626998 0.3312  7.817 3.075e-12 chr1 203626983
12    L2_None_chr1_-_569826 0.6140  7.616 9.742e-12 chr1    569802
13    L2_None_chr1_-_569826 0.6411  7.580 1.037e-11 chr1    569802
14    L2_None_chr1_-_569826 0.5755  7.275 4.871e-11 chr1    569802
15   L2_None_chr1_-_6546011 0.6893  7.260 5.255e-11 chr1   6545930
16  L2_None_chr1_-_91180355 0.3136  7.529 1.350e-11 chr1  91180310
17  L2_None_chr1_-_91180355 0.3262  7.449 2.023e-11 chr1  91180310
18  L2_None_chr1_-_91180355 0.2980  7.151 9.129e-11 chr1  91180310
19  L2_None_chr1_-_91182695 0.2999  7.149 9.201e-11 chr1  91182572
20  L2_None_chr1_-_91183491 0.5383  7.189 7.534e-11 chr1  91183389

answered 2013-08-06T16:34:43.097

You may be looking for the Linux join command. man join is a good place to start; your command would be something like this:

join -1 4 -2 1 <(sort -k4,4 FILE1) <(sort -k1,1 FILE2)

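-1 and -2 specify the field in each respective file that will be used for matching. The sort is not needed if the files are already sorted on the join field.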
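
A minimal sketch of this approach on three rows taken from the question. It assumes tab-separated files, a header line only in FILE1, and hypothetical file names (file1.tsv, file2.tsv). It writes sorted temporary files rather than using process substitution so that it also runs under plain sh, and it pins LC_ALL=C so that sort and join agree on the ordering.

```shell
# Sample data; the file names are hypothetical and the rows come from the question.
tmpdir=$(mktemp -d)
TAB=$(printf '\t')

printf 'stat\tstat\tP-value\ttag\n'                            >  "$tmpdir/file1.tsv"
printf '0.3049\t7.464\t1.875e-11\tL2_None_chr1_-_109092036\n'  >> "$tmpdir/file1.tsv"
printf '0.2961\t7.448\t2.105e-11\tL2_None_chr1_-_109092036\n'  >> "$tmpdir/file1.tsv"
printf '0.6893\t7.26\t5.255e-11\tL2_None_chr1_-_6546011\n'     >> "$tmpdir/file1.tsv"

printf 'L2_None_chr1_-_109092036\tchr1\t109092034\n'           >  "$tmpdir/file2.tsv"
printf 'L2_None_chr1_-_6546011\tchr1\t6545930\n'               >> "$tmpdir/file2.tsv"

# Drop the FILE1 header, then sort each file on its join field
# (field 4 in FILE1, field 1 in FILE2).
tail -n +2 "$tmpdir/file1.tsv" | LC_ALL=C sort -t "$TAB" -k4,4 > "$tmpdir/file1.sorted"
LC_ALL=C sort -t "$TAB" -k1,1 "$tmpdir/file2.tsv"              > "$tmpdir/file2.sorted"

# Each FILE1 row is paired with the single FILE2 row carrying the same tag.
# join prints the tag first, then the other FILE1 fields, then the other FILE2 fields.
result=$(LC_ALL=C join -t "$TAB" -1 4 -2 1 "$tmpdir/file1.sorted" "$tmpdir/file2.sorted")
printf '%s\n' "$result"

rm -rf "$tmpdir"
```

Rows that share a tag come out in sorted order rather than in their original FILE1 order, and tags that appear in only one file are dropped unless join's -a option is added.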

answered 2013-08-06T16:35:46.107