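I am currently rebuilding a fairly large database, and I want to join 3 tables with semi-matching content. I have several sets of these tables, but they always come in groups of three. The situation is as follows: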

--note: all tables are in ASCII format, space-delimited---

T1_01 = table 1 =

1 + 'stuff1' + additional content 1  (additional content is only sometimes available)

2  ""

3  ""

....400

T1_02 = table 2 =

1 + "different stuff" + additional content 2

2 ""

3 ""

... 400

T1_03 = table 3 =

5 cols yet other stuff + 001 + additional content 3

5 cols yet other stuff + 003    ""

5 cols yet other stuff + 007    ""

...

5 cols yet other stuff + 399   some rows are skipped, varies which ones

5 cols yet other stuff + 400

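What I want is one combined table for each "group" of 3 tables. The tables are conveniently named: T1_01, T1_02, T1_03 are tables 1, 2, 3 of group 1, followed by T2_01, T2_02, T2_03, and so on. I need to do this about 60 times in total, and the output table I am hoping for is: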

T1_0123=

1 + 'stuff1' + additional content 1 1 + "different stuff" + additional content 2 5 cols yet other stuff + 001 + additional content 3
2 + 'stuff1' + additional content 1 2 + "different stuff" + additional content 2 "something to fill in the empty spaces, like a set of -99.9 values"
3 + 'stuff1' + additional content 1 3 + "different stuff" + additional content 2 5 cols yet other stuff + 003 + additional content 3
...
400 ""

So far I have made a first attempt:

join -1 1 -2 1 T1_01 T1_02 > T1_012

This works well, but it only covers the first two tables, and

join -1 1 -2 6 T1_01 T1_03

does not work, because 001 is not 1.

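I would like to handle all 3 tables in one run, and then do something like sed something awk $(cat list_of_T01) $(cat list_of_T02) $(cat list_of_T03) as a batch job. I have been learning Python, so this could also be done there, but surely AWK is easier? Any suggestions are welcome.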

1 Answer

Try this:

join -1 1 -2 6 <(sed 's/^[0-9] /00&/;s/^[0-9][0-9] /0&/;' T1_01) T1_03

Or, if your delimiter is not a space, this:

join -1 1 -2 6 <(sed 's/^[0-9][^0-9]/00&/;s/^[0-9][0-9][^0-9]/0&/;' T1_01) T1_03
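
The commands above pair table 1 with table 3, but by default join prints only the rows that match in both files, so the rows missing from table 3 would be dropped rather than filled. Below is a minimal awk sketch of the full three-way merge the question describes, not a definitive solution. It assumes the key is column 1 in tables 1 and 2 and column 6 in table 3, that the files are space-delimited, and that six -99.9 values are an acceptable filler. The function name merge_group and the filler width are hypothetical choices made for illustration.

```shell
#!/bin/sh
# merge_group PREFIX: hypothetical helper that merges PREFIX_01, PREFIX_02
# and PREFIX_03 into PREFIX_0123, matching rows on the numeric key.
merge_group() {
    prefix=$1
    # The filler (six -99.9 values) is an assumption; adjust it to the
    # column count of the table being padded.
    awk -v fill="-99.9 -99.9 -99.9 -99.9 -99.9 -99.9" '
        # First file (table 3): index each line by column 6.
        # Adding 0 converts the text "001" to the number 1.
        FILENAME == ARGV[1] { t3[$6 + 0] = $0; next }
        # Second file (table 2): index each line by column 1.
        FILENAME == ARGV[2] { t2[$1 + 0] = $0; next }
        # Third file (table 1): print one merged line per row,
        # substituting the filler where a table has no such key.
        {
            k = $1 + 0
            print $0, ((k in t2) ? t2[k] : fill), ((k in t3) ? t3[k] : fill)
        }
    ' "${prefix}_03" "${prefix}_02" "${prefix}_01" > "${prefix}_0123"
}
```

For the batch job, a loop such as for f in T*_01; do merge_group "${f%_01}"; done would process every group whose first table is present in the current directory. Because the keys are compared as numbers, no zero padding is needed and the inputs do not have to be sorted.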
answered 2012-08-13T06:59:21.477