join - awk/sed: join 3 tables on one nearly identical column ... surely easier than Python

Question

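I am currently restructuring a fairly large database, and I want to join 3 tables with semi-matching content. I have several sets of these tables, but they always come in groups of three. The situation is as follows:

-- note: all tables are in ASCII format, space-delimited --

T1_01 = Table 1 =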

1 + 'stuff1' + additional content 1  (where additional content only sometimes available)

2  ""

3  ""

....400

T1_02 = Table 2 =

1 + "different stuff" + additional content 2

2 ""

3 ""

... 400

T1_03 = Table 3 =

5 cols yet other stuff + 001 + additional content 3

5 cols yet other stuff + 003    ""

5 cols yet other stuff + 007    ""

...

5 cols yet other stuff + 399   some rows are skipped, varies which ones

5 cols yet other stuff + 400

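What I want is one combined table per group. The tables are conveniently named: T1_01, T1_02, T1_03 are tables 1, 2, 3 of group 1, followed by T2_01, T2_02, T2_03, and so on. I need to do this about 60 times in total, and the table output I am hoping for is: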

T1_0123=

1 + 'stuff1' + additional content 1 1 + "different stuff" + additional content 2 5 cols yet other stuff + 001 + additional content 3
2 + 'stuff1' + additional content 1 2 + "different stuff" + additional content 2 "something to fill in the empty spaces, like a set of -99.9 values"
3 + 'stuff1' + additional content 1 3 + "different stuff" + additional content 2 5 cols yet other stuff + 003 + additional content 3
...
400 ""

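I have made a first attempt:

join -1 1 -2 1 T1_01 T1_02 > T1_012

works well, but only covers the first two tables, and

join -1 1 -2 6 T1_01 T1_03

does not work, because 001 is not 1.

I would like to process all 3 tables in one run, and then do something like sed something awk $(cat list_of_T01) $(cat list_of_T02) $(cat list_of_T03) as a batch job. I have been learning Python, so this could probably be done there as well, but surely AWK is easier? Any suggestions are welcome.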

score 1 · Accepted Answer

Try this:

join -1 1 -2 6 <(sed 's/^[0-9] /00&/;s/^[0-9][0-9] /0&/;' T1_01) T1_03

Or, if your delimiter is not a space, this:

join -1 1 -2 6 <(sed 's/^[0-9][^0-9]/00&/;s/^[0-9][0-9][^0-9]/0&/;' T1_01) T1_03
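The join commands above pair table 1 with table 3, but they do not merge all three tables or fill in the rows that table 3 skips. A minimal awk sketch of the full merge might look like the following. It assumes that table 1 lists every row number in column 1, that table 2 is keyed on column 1 and table 3 on column 6, and that all files are blank-delimited. The function names merge_three and merge_all_groups, the single default filler value -99.9, and the T<g>_0123 output naming are illustrative choices, not something the question or the answer fixes.

```shell
# merge_three T1 T2 T3 [FILLER]
# Print every row of T1 followed by the matching rows of T2 and T3.
# Adding 0 to a key turns the zero-padded 001 into the number 1, so both
# spellings land on the same array index. Rows missing from T2 or T3 are
# replaced by FILLER (assumed default: -99.9).
merge_three() {
    awk -v filler="${4:--99.9}" '
        FILENAME == ARGV[1] { t2[$1 + 0] = $0; next }   # table 2: key in column 1
        FILENAME == ARGV[2] { t3[$6 + 0] = $0; next }   # table 3: key in column 6
        {
            k = $1 + 0                                  # table 1 drives the output
            print $0, ((k in t2) ? t2[k] : filler), ((k in t3) ? t3[k] : filler)
        }
    ' "$2" "$3" "$1"    # tables 2 and 3 are read first, table 1 last
}

# Hypothetical batch driver: assumes files named T<g>_01, T<g>_02, T<g>_03
# in the current directory and writes each result to T<g>_0123.
merge_all_groups() {
    for t1 in T*_01; do
        [ -e "$t1" ] || continue          # skip the unexpanded pattern if nothing matches
        base=${t1%_01}
        merge_three "$t1" "${base}_02" "${base}_03" > "${base}_0123"
    done
}
```

Calling merge_all_groups in the directory that holds the tables would process every group in one go. To pad a missing table 3 row with several values instead of one, a longer filler can be passed as the fourth argument, for example merge_three T1_01 T1_02 T1_03 "-99.9 -99.9 -99.9". Tables 2 and 3 are held in memory, which should be unproblematic at around 400 rows per table.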
