file - awk: extract different columns from many different files

Question

Example files

I have between 3 and 10 files with:

 - different number of columns
 - same number of rows
 - inconsistent spacing (sometimes one space, sometimes tabs, sometimes many spaces) **within** the same file, like the sample below


>      0    55.4      9.556E+09   33
>      1     1.3      5.345E+03    1
>        ........
>     33   134.4      5.345E+04  932
>
       ........

I need to take column 1 from file1, column 3 from file2, column 7 from file3, and column 1 from file4, and merge them side by side into a single file.

Attempt 1: does not work

paste <(cut -d[see below] -f1 file1) <(cut -d[see below] -f3 file2) [...]
where the delimiter was " " or empty.

Attempt 2: works with 2 files but not with many files

awk '{
     a1=$1;b1=$4;
     getline <"D2/file1.txt";
     print a1,$1,b1,$4
}' D1/file1.txt >D3/file1.txt

Now the more general question:

How can I extract different columns from many different files?

score 20 · Accepted Answer

In your paste / cut attempt, replace cut with awk:

$ paste <(awk '{print $1}' file1 ) <(awk '{print $3}' file2 ) <(awk '{print $7}' file3) <(awk '{print $1}' file4)
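To see why the swap helps, here is a small sketch run on the first sample row from the question. The variable names are only for illustration. With `-d' '`, cut treats every single space as a delimiter, so the leading spaces produce empty fields, while awk's default splitting ignores leading blanks and treats runs of blanks as one separator.

```shell
# One row from the question's sample, with leading and repeated spaces.
row='     0    55.4      9.556E+09   33'

# cut: field 1 is the empty string in front of the first space.
cut_first=$(printf '%s\n' "$row" | cut -d' ' -f1)

# awk: leading blanks are skipped and runs of blanks count as one separator.
awk_first=$(printf '%s\n' "$row" | awk '{print $1}')
awk_third=$(printf '%s\n' "$row" | awk '{print $3}')

printf 'cut: [%s]  awk: [%s] [%s]\n' "$cut_first" "$awk_first" "$awk_third"
# prints: cut: []  awk: [0] [9.556E+09]
```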

score 8 · Accepted Answer

Assuming each of your files has the same number of rows, here is one way using GNU awk. Run it like this:

awk -f script.awk file1.txt file2.txt file3.txt file4.txt

Contents of script.awk:

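FILENAME == ARGV[1] { one[FNR]=$1 }      # column 1 of the first file
FILENAME == ARGV[2] { two[FNR]=$3 }      # column 3 of the second file
FILENAME == ARGV[3] { three[FNR]=$7 }    # column 7 of the third file
FILENAME == ARGV[4] { four[FNR]=$1 }     # column 1 of the fourth file

END {
    # all files have the same number of rows, so the size of one[] is the row count
    for (i=1; i<=length(one); i++) {
        print one[i], two[i], three[i], four[i]
    }
}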

Notes:

By default, awk splits fields on whitespace: any run of spaces and tabs counts as a single separator, and leading whitespace is ignored. This makes awk ideal for files with inconsistent spacing. You can also expand the above code to include more files if you wish.
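One possible way to extend the idea to any number of files without writing one rule per file is to pass the wanted column numbers as a list. The following is a minimal sketch, not part of the original answer: the function name `pick_cols` and the variable `cols` are made up for illustration, and it assumes every file is non-empty and has the same number of rows.

```shell
# Hypothetical helper: pick_cols "1 3 7 1" file1 file2 file3 file4
# The n-th number in the list is the column to take from the n-th file.
pick_cols() {
    cols=$1
    shift
    awk -v cols="$cols" '
        BEGIN    { split(cols, want, " ") }   # want[n] = column for file n
        FNR == 1 { f++ }                      # a new input file starts here
        {
            # append this file'"'"'s chosen field to the output row for this line number
            out[FNR] = (f == 1 ? $(want[f]) : out[FNR] OFS $(want[f]))
            if (FNR > max) max = FNR
        }
        END { for (i = 1; i <= max; i++) print out[i] }
    ' "$@"
}
```

For the case in the question this would be called as `pick_cols "1 3 7 1" file1 file2 file3 file4 > merged`, where `merged` is a placeholder output name. Because the file index is counted with `FNR == 1`, an empty input file would shift the columns, which is why the sketch assumes there are none.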

score 1 · Accepted Answer

A combination of cut and paste should work:

$ cat f1
foo
bar
baz
$ cat f2
1 2 3
4 5 6
7 8 9
$ cat f3
a b c d
e f g h
i j k l
$ paste -d' ' <(cut -f1 f1) <(cut -d' ' -f2 f2) <(cut -d' ' -f3 f3)
foo 2 c
bar 5 g
baz 8 k

Edit: this also works with tabs:

$ cat f4
a       b       c       d
e       f       g       h
i       j       k       l
$ paste -d' ' <(cut -f1 f1) <(cut -d' ' -f2 f2) <(cut -f3 f4)   
foo 2 c
bar 5 g
baz 8 k
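The examples above use files with exactly one space or one tab between columns. For files like the one in the question, where spaces and tabs are mixed and repeated, cut would see empty fields. One possible workaround, sketched here with a made-up helper name `cut_ws`, is to normalize the whitespace before handing the line to cut:

```shell
# Hypothetical helper: cut_ws FIELD FILE
# Normalizes whitespace in FILE, then prints the requested field.
cut_ws() {
    tr '\t' ' ' < "$2" |   # tabs become spaces
        tr -s ' ' |        # runs of spaces collapse to a single space
        sed 's/^ //' |     # drop the one leading space that may remain
        cut -d' ' -f"$1"
}
```

In bash it can then be combined with paste in the same way as above, for example `paste -d' ' <(cut_ws 1 file1) <(cut_ws 3 file2)`.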
