linux - How do I join multiple txt files on a column?

Question

I have several txt files, all in the same directory. Each one has 2 columns of data. They look like this:

Label1 DataA1
Label2 DataA2
Label3 DataA3

I want to use join to build one big file like this:

Label1 DataA1 DataB1 DataC1
Label2 DataA2 DataB2 DataC2
Label3 DataA3 DataB3 DataC3

Currently, I have

join fileA fileB | join - fileC

However, I have too many files to list them all. Is there a way to write a loop for this kind of command?

score 4 · Accepted Answer

With bash, you can write a script that builds the chain of joins as a recursive pipeline:

#!/bin/bash

if [[ $# -ge 2 ]]; then
    function __r {
        if [[ $# -gt 1 ]]; then
            exec join - "$1" | __r "${@:2}"
        else
            exec join - "$1"
        fi
    }

    __r "${@:2}" < "$1"
fi

and pass the files to the script as arguments, for example:

bash script.sh file*

or with the file names sorted, like this:

find . -maxdepth 1 -type f -name 'file*' -print0 | sort -z | xargs -0 bash script.sh
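
The same chain can also be written as a plain loop, which is closer to what the question asks for. This is a minimal sketch, not a replacement for the script above. It assumes every input is already sorted on its first column (join expects sorted input) and that the running result is small enough to hold in a shell variable. The name join_all is hypothetical.

```shell
# join_all (hypothetical name): fold `join` over every file given as argument.
# Assumes each file is already sorted on column 1, as join requires.
join_all() (
    [ "$#" -ge 1 ] || exit 1
    # start with the first file as the running result
    result=$(cat "$1") || exit 1
    shift
    for file in "$@"; do
        # "-" makes join read the running result from standard input
        result=$(printf '%s\n' "$result" | join - "$file") || exit 1
    done
    # print nothing at all if the joined result is empty
    [ -z "$result" ] || printf '%s\n' "$result"
)
```

It would be called the same way as the script, for example: join_all file* > merged.txt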

score 2 · Accepted Answer

With awk you can do it like this:

awk 'NF > 0 { a[$1] = a[$1] " " $2 } END { for (i in a) { print i a[i]; } }' file*

If you want the files sorted:

find . -maxdepth 1 -type f -name 'file*' -print0 | sort -z | xargs -0 awk 'NF > 0 { a[$1] = a[$1] " " $2 } END { for (i in a) { print i a[i]; } }'

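Sometimes for (i in a) visits the keys in a different order from the one they were added in, so you may also want to sort the output, but the asorti function used below is only available in gawk. The idea of mapping the keys into an indexed array to keep their order is only feasible if column 1 has no variations.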

gawk 'NF > 0 { a[$1] = a[$1] " " $2 } END { count = asorti(a, b); for (i = 1; i <= count; ++i) { j = b[i]; print j a[j]; } }' ...
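
The indexed-array idea mentioned above can be sketched in portable awk, without gawk's asorti. This is only an illustration under one assumption: "order" here means the order in which each key is first seen across the input files. The name merge_columns is hypothetical.

```shell
# merge_columns (hypothetical name): append column 2 of every file to its key,
# printing keys in first-seen order instead of the arbitrary for-in order.
merge_columns() {
    awk '
        NF > 0 {
            # remember each key the first time it appears
            if (!($1 in a)) order[++n] = $1
            a[$1] = a[$1] " " $2
        }
        END {
            # walk the keys in the order they were recorded
            for (i = 1; i <= n; ++i) print order[i] a[order[i]]
        }
    ' "$@"
}
```

Unlike join, which only keeps keys present in both inputs, this approach also prints keys that appear in only some of the files, so rows can end up with fewer values than others.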

score 0 · Accepted Answer

This script joins multiple files together (the files are file*).

#!/bin/bash
# Create two temp files
tmp=$(mktemp)
tmp2=$(mktemp)
# for all the files
for file in file*
do
    # if the tmp file is not empty
    if [ -s "$tmp" ]
    then
        # then join the tmp file with the current file
        join "$tmp" "$file" > "$tmp2"
    else
        # the first time $tmp is empty, so we just copy the file
        cp "$file" "$tmp2"
    fi
    cp "$tmp2" "$tmp"
done

cat "$tmp"
# Remove the temp files
rm -f "$tmp" "$tmp2"

I admit it's ugly, but it seems to work.
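
One subtlety in the script above: the -s test doubles as the "first iteration" check, so an intermediate join that produces no output would cause the next file to be copied instead of joined. A minimal sketch of a variant follows. It uses an explicit flag, sorts each input before joining, and removes its temporary files on exit. The name join_sorted is hypothetical, and the sketch assumes keys have no leading blanks and that running both sort and join under LC_ALL=C suits the data.

```shell
# join_sorted (hypothetical name): sort each input on column 1, then join
# it onto the running result kept in a temporary file.
join_sorted() (
    tmp=$(mktemp) && tmp2=$(mktemp) && sorted=$(mktemp) || exit 1
    # clean up the temporary files when this subshell exits
    trap 'rm -f "$tmp" "$tmp2" "$sorted"' EXIT
    first=1
    for file in "$@"; do
        LC_ALL=C sort -k1,1 "$file" > "$sorted" || exit 1
        if [ "$first" -eq 1 ]; then
            # explicit flag, so an empty intermediate result is not
            # mistaken for "no file processed yet"
            cp "$sorted" "$tmp2"
            first=0
        else
            LC_ALL=C join "$tmp" "$sorted" > "$tmp2" || exit 1
        fi
        cp "$tmp2" "$tmp"
    done
    cat "$tmp"
)
```

For example: join_sorted file* > merged.txt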

score -1 · Accepted Answer


Just put all the files in one folder and run

join * | join - /someotherdir/fileC

answered 2013-08-09T17:25:27.807
