I can't get join to produce the result I expect.
I'm running GnuWin32 on 64-bit Windows 7, with join version 5.3.0.1936 and gawk version 3.1.6.2962.
The input consists of the following two tables:
Table_1
UID_C CID
C000002 31799
C000002 31800
C000386 14950
C000386 9807916
C000386 10255083
C008114 5318432
C008117 799
C008117 444150
C008117 46878464
Table_2
UID_C CID name
C000002 31799 bevonium
C000002 31800 bevonium
C002284 24832095 hypromellose
C008117 799 indoleglycerol phosphate
C008117 444150 indoleglycerol phosphate
C008117 46878464 indoleglycerol phosphate
I use the following command in a bat file:
C:\gnuwin32\bin\join -t"|" -1 1 -2 1 -a1 -a2 -e "NULL" -o "0,1.2,2.2,2.3" C:\directory\Table_1.txt C:\directory\Table_2.txt > C:\directory\Table_3.txt
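In the illustration here on Stack Overflow the tables are formatted with tabs for readability, but the actual files use the pipe character (|) as both the input and the output delimiter.
The command outputs the following table:
Table_3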
UID_C CID CID name
C000002 31800 31799 bevonium
C000002 31800 31800 bevonium
C000002 31799 31799 bevonium
C000002 31799 31800 bevonium
C000386 10255083 NULL NULL
C000386 9807916 NULL NULL
C000386 14950 NULL NULL
C002284 NULL 24832095 hypromellose
C008114 5318432 NULL NULL
C008117 46878464 799 indoleglycerol phosphate
C008117 46878464 444150 indoleglycerol phosphate
C008117 46878464 46878464 indoleglycerol phosphate
C008117 444150 799 indoleglycerol phosphate
C008117 444150 444150 indoleglycerol phosphate
C008117 444150 46878464 indoleglycerol phosphate
C008117 799 799 indoleglycerol phosphate
C008117 799 444150 indoleglycerol phosphate
C008117 799 46878464 indoleglycerol phosphate
The desired output is:
Table_4
UID_C CID name
C000002 31799 bevonium
C000002 31800 bevonium
C000386 14950 NULL
C000386 9807916 NULL
C000386 10255083 NULL
C002284 24832095 hypromellose
C008114 5318432 NULL
C008117 799 indoleglycerol phosphate
C008117 444150 indoleglycerol phosphate
C008117 46878464 indoleglycerol phosphate
How can I change the join command so that it produces the desired output?
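For context, join matches on a single field per file, so with UID_C alone as the key every Table_1 row is paired with every Table_2 row that shares that UID_C, which is the blow-up visible in Table_3. Below is a minimal sketch of matching on UID_C and CID together, done in awk rather than join. The function name `outer_join_on_pair` is made up for illustration, the sample rows are a subset of the tables above, and the wrapper assumes a POSIX shell (in a bat file the awk program would more likely live in its own file and be run with gawk -f).

```shell
#!/bin/sh
# Hypothetical helper: full outer join of two pipe-delimited files on the
# pair (field 1, field 2). Usage: outer_join_on_pair Table_1.txt Table_2.txt
outer_join_on_pair() {
    awk -F'|' -v OFS='|' '
        NR == FNR {                      # first file read: Table_2
            key = $1 FS $2
            name[key] = $3               # remember the name for this UID_C|CID pair
            t2_keys[++n] = key           # keep the Table_2 row order
            next
        }
        {                                # second file read: Table_1
            key = $1 FS $2
            print $1, $2, (key in name ? name[key] : "NULL")
            matched[key] = 1
        }
        END {                            # rows that exist only in Table_2
            for (i = 1; i <= n; i++)
                if (!(t2_keys[i] in matched))
                    print t2_keys[i], name[t2_keys[i]]
        }
    ' "$2" "$1"
}

# Sample data: a subset of Table_1 and Table_2, written to a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' 'UID_C|CID' 'C000002|31799' 'C000386|14950' \
    'C008117|799' 'C008117|444150' > "$tmp/Table_1.txt"
printf '%s\n' 'UID_C|CID|name' 'C000002|31799|bevonium' \
    'C002284|24832095|hypromellose' \
    'C008117|799|indoleglycerol phosphate' \
    'C008117|444150|indoleglycerol phosphate' > "$tmp/Table_2.txt"
outer_join_on_pair "$tmp/Table_1.txt" "$tmp/Table_2.txt"
rm -r "$tmp"
```

Rows that exist only in Table_2 are printed last, so the result would still need sorting to reach the row order of Table_4.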
Alternatively, how should I use awk to post-process Table_3 into Table_4?
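A rough sketch of what such post-processing could look like, assuming Table_3 is pipe-delimited with the four columns shown above: keep a row only when its two CID columns agree or one of them is the NULL filler, and merge the two into a single column. `table3_to_table4` is a hypothetical name, and the POSIX shell wrapper is only there to make the example runnable.

```shell
#!/bin/sh
# Hypothetical helper: collapse the two CID columns of Table_3 into one.
table3_to_table4() {
    awk -F'|' -v OFS='|' '
        # Both CIDs agree; compared as strings, so the header row matches too.
        ($2 "") == ($3 "") { print $1, $2, $4; next }
        # Row came from Table_1 only: keep the Table_1 CID.
        $3 == "NULL"       { print $1, $2, $4; next }
        # Row came from Table_2 only: keep the Table_2 CID.
        $2 == "NULL"       { print $1, $3, $4 }
        # Anything else is a mismatched pairing and is dropped.
    ' "$@"
}

# Sample rows taken from Table_3 above, fed on standard input.
printf '%s\n' \
    'UID_C|CID|CID|name' \
    'C000002|31800|31799|bevonium' \
    'C000002|31800|31800|bevonium' \
    'C000386|14950|NULL|NULL' \
    'C002284|NULL|24832095|hypromellose' \
    'C008117|799|799|indoleglycerol phosphate' \
    'C008117|799|444150|indoleglycerol phosphate' |
table3_to_table4
```

One caveat with this route: if a UID_C appears in both tables but with different CIDs, Table_3 contains only mismatched pairings for it and no NULL-padded lines, so the filter would drop those rows entirely.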
Thanks in advance for your suggestions.