
If I have 3 csv files and I want to merge all of their data into a single file, but side by side, how would I do that? For example:

Initial merged file:

,,,,,,,,,,,,

File 1:

20,09/05,5694
20,09/06,3234
20,09/08,2342

File 2:

20,09/05,2341
20,09/06,2334
20,09/09,342

File 3:

20,09/05,1231
20,09/08,3452
20,09/10,2345
20,09/11,372

Final merged file:

09/05,5694,,,09/05,2341,,,09/05,1231
09/06,3234,,,09/06,2334,,,09/08,3452
09/08,2342,,,09/09,342,,,09/10,2345
,,,,,,,,09/11,372

Basically, the data from each file goes into specific columns of the merged file. I know awk can be used for this, but I don't know how to start.

Edit: only the 2nd and 3rd columns of each file should be printed. I used this to print the 2nd and 3rd columns:

awk -v f="${i}" -F, 'match ($0,f) { print $2","$3 }' file3.csv > d$i.csv

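However, when, for example, file1 and file2 have nothing on a given line, the data on that line shifts to the left. So I came up with this to account for the shift: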

awk -v x="${i}" -F, 'match ($0,x) { if ($2='/NULL') { print "," }; else { print $2","$3}; }' alld.csv > d$i.csv

3 Answers


paste is made for this:

$ paste -d";" f1 f2 f3 | sed 's/;/,,,/g'
09/05,5694,,,09/05,2341,,,09/05,1231
09/06,3234,,,09/06,2334,,,09/08,3452
09/08,2342,,,09/09,342,,,09/10,2345
,,,,,,09/11,372

Note that paste alone would output only a single comma between files:

$ paste -d, f1 f2 f3
09/05,5694,09/05,2341,09/05,1231
09/06,3234,09/06,2334,09/08,3452
09/08,2342,09/09,342,09/10,2345
,,09/11,372

So, to get a multi-character separator, we can use a different delimiter, ;, and then replace it with ,,, using sed:

$ paste -d";" f1 f2 f3 | sed 's/;/,,,/g'
09/05,5694,,,09/05,2341,,,09/05,1231
09/06,3234,,,09/06,2334,,,09/08,3452
09/08,2342,,,09/09,342,,,09/10,2345
,,,,,,09/11,372
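
The f1, f2 and f3 above are assumed to already contain only the 2nd and 3rd columns. As a rough sketch, the column extraction and the merge could be combined as below. The function name merge_side_by_side is made up for this illustration, mktemp -d is assumed to be available, and the data is assumed to contain no ; characters.

```shell
# merge_side_by_side is a hypothetical helper name, not part of the answer above.
# Usage: merge_side_by_side file1.csv file2.csv file3.csv > merged.csv
merge_side_by_side() {
    tmpdir=$(mktemp -d) || return 1
    n=0
    for f in "$@"; do
        n=$((n + 1))
        # Keep only columns 2 and 3; zero-padded names keep the glob in argument order.
        cut -d, -f2,3 "$f" > "$tmpdir/$(printf '%03d' "$n")"
    done
    # Join line N of every file with ';', then expand each ';' to three commas.
    paste -d';' "$tmpdir"/* | sed 's/;/,,,/g'
    rm -rf "$tmpdir"
}
```

As with plain paste, a file that has run out of lines contributes an empty string, so short rows get fewer commas than in the requested output.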
answered 2013-10-24T13:57:17.670

Using GNU awk for ARGIND:

$ gawk '{ a[FNR,ARGIND]=$0; maxFnr=(FNR>maxFnr?FNR:maxFnr) }
    END {
        for (i=1;i<=maxFnr;i++) {
            for (j=1;j<ARGC;j++)
                printf "%s%s", (j==1?"":",,,"), (a[i,j]?a[i,j]:",")
            print ""
        }
    }
' file1 file2 file3
09/05,5694,,,09/05,2341,,,09/05,1231
09/06,3234,,,09/06,2334,,,09/08,3452
09/08,2342,,,09/09,342,,,09/10,2345
,,,,,,,,09/11,372

If you don't have GNU awk, just add an initial line of FNR==1{ARGIND++}.
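
Put together, a portable variant might look like the sketch below. It deviates slightly from the script above. It counts files in a plain variable, fileNum (a name chosen here for illustration), instead of assigning to ARGIND. It also tests array membership with in, so that a line whose whole content is 0 is not mistaken for a missing one. The wrapper name merge_portable is likewise hypothetical, and every argument is assumed to be a non-empty file, since an empty file never triggers FNR == 1.

```shell
# merge_portable is a hypothetical wrapper name used only for this sketch.
# Usage: merge_portable file1 file2 file3
merge_portable() {
    awk '
        FNR == 1 { fileNum++ }               # a new input file has started
        {
            a[FNR, fileNum] = $0             # line FNR of file number fileNum
            if (FNR > maxFnr) maxFnr = FNR   # length of the longest file so far
        }
        END {
            for (i = 1; i <= maxFnr; i++) {
                for (j = 1; j <= fileNum; j++)
                    # ",,," between files; a lone "," stands in for a missing line
                    printf "%s%s", (j == 1 ? "" : ",,,"), ((i, j) in a ? a[i, j] : ",")
                print ""
            }
        }
    ' "$@"
}
```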

Commented version, as requested:

$ gawk '
    { a[FNR,ARGIND]=$0; # Store the current line in a 2-D array `a` indexed by
                        # the current line number `FNR` and file number `ARGIND`.

      maxFnr=(FNR>maxFnr?FNR:maxFnr)    # save the max FNR value
    }
    END{
        for (i=1;i<=maxFnr;i++) {  # Loop from 1 to the max number of lines
                                   # seen in any file and for each:
            for (j=1;j<ARGC;j++)     # Loop from 1 to total number of files parsed and:
                printf "%s%s",         # Print 2 strings, specifically:
                   (j==1?"":",,,"),      # A field separator - empty if we're printing
                                         # the first field, three commas otherwise.
                   (a[i,j]?a[i,j]:",")   # The value stored in the array if it was
                                         # present in the files, a comma otherwise.
            print ""                   # Print a newline
        }
    }
' file1 file2 file3

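I was originally using an array fnr[FNR] to track the max value of FNR, but IMHO that's a bit obscure, and it has a flaw: if no line had, say, a 2nd field, then a for (i=1;i in fnr;i++) loop in the END section would exit before getting to the 3rd field.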

answered 2013-10-24T16:33:06.130

Using pr:

$ pr -mts',,,' file[1-3]
09/05,5694,,,09/05,2341,,,09/05,1231
09/06,3234,,,09/06,2334,,,09/08,3452
09/08,2342,,,09/09,342,,,09/10,2345
,,,,,,09/11,372
answered 2013-10-24T14:56:21.057