bash - Merge two files using a key -> value structure

Question

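I'm looking for ideas, not a complete solution, for the following problem in shell (Linux). What is the best approach? (awk, a while loop, sed, ...)

I have two files with the same line structure: key-value-value. I want to merge them. If a key is not present yet, the script inserts a new line. If it is present, the script updates the values (by adding them together).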

Example:
File 1:

john-15-40
doo-10-91
mary-14-19
foo-11-0

File 2:

foo-110-10
john-22-11
ghost-1000-1000

Result:
foo-121-10
john-37-51
ghost-1000-1000
doo-10-91
mary-14-19

How can I do this?

score 4 · Accepted Answer

Simple with awk

awk '
  BEGIN {FS = OFS = "-"}
  {v1[$1] += $2; v2[$1] += $3}
  END {for (key in v1) {print key, v1[key], v2[key]}}
' F1 F2
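
As a usage sketch, the same command can be wrapped in a function and run against the sample data from the question. The function name merge_kv, the NF == 3 guard, and the temporary file names are assumptions added for illustration; they are not part of the original answer. Because awk visits the keys of `for (key in v1)` in an unspecified order, the output is piped through sort here to make it reproducible.

```shell
# merge_kv is a hypothetical wrapper name; it accepts any number of input files.
merge_kv() {
  awk '
    BEGIN {FS = OFS = "-"}
    NF == 3 {v1[$1] += $2; v2[$1] += $3}   # skip blank or malformed lines
    END {for (key in v1) print key, v1[key], v2[key]}
  ' "$@" | sort
}

# Sample data from the question, written to temporary files.
tmpdir=$(mktemp -d)
printf '%s\n' john-15-40 doo-10-91 mary-14-19 foo-11-0 > "$tmpdir/F1"
printf '%s\n' foo-110-10 john-22-11 ghost-1000-1000 > "$tmpdir/F2"

merge_kv "$tmpdir/F1" "$tmpdir/F2"
rm -r "$tmpdir"
```

The rows match the Result in the question, only in sorted order rather than the order shown there.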

score 1 · Accepted Answer

This can be done natively in Bash 4:

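#!/bin/bash
declare -A vals_one vals_two
# Read both files line by line, splitting each line on "-".
while IFS=- read -r key val1 val2; do
  if [[ ${vals_one["$key"]} ]] ; then
    # Key seen before: add to the stored values.
    vals_one["$key"]=$(( ${vals_one["$key"]} + val1 ))
    vals_two["$key"]=$(( ${vals_two["$key"]} + val2 ))
  else
    # New key: store the values as they are.
    vals_one["$key"]=$val1
    vals_two["$key"]=$val2
  fi
done < <(cat input1.txt input2.txt)
# Print the totals; associative arrays are iterated in no particular order.
for key in "${!vals_one[@]}"; do
  printf '%s-%s-%s\n' "$key" "${vals_one[$key]}" "${vals_two[$key]}"
done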

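Note that this approach is somewhat memory-inefficient. A more memory-efficient approach sorts the files before merging them (GNU sort can create temporary files when its input does not fit in memory, so it is more capable than any reasonable script we would write), so only two lines need to be held in memory at a time: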

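#!/bin/bash

function merge_inputs {
    # Read the first line; print nothing if the input is empty.
    IFS=- read -r key val1 val2 || return 0
    while IFS=- read -r new_key new_val1 new_val2; do
      if [[ $key = "$new_key" ]] ; then
        # Same key as the previous line: accumulate.
        val1=$(( val1 + new_val1 ))
        val2=$(( val2 + new_val2 ))
      else
        # Key changed: emit the finished group and start a new one.
        printf '%s-%s-%s\n' "$key" "$val1" "$val2"
        key=$new_key
        val1=$new_val1
        val2=$new_val2
      fi
    done
    printf '%s-%s-%s\n' "$key" "$val1" "$val2"
}
# Sort on the key field only, so equal keys are adjacent in every locale.
sort -t- -k1,1 input1.txt input2.txt | merge_inputs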

In addition, the latter form does not require associative arrays and works with older versions of bash (or, with some tweaks, with other shells).
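
As a rough sketch of what those tweaks might look like for a POSIX sh, the version below swaps `[[ ]]` for `[ ]` and drops the `function` keyword. The wrapper name merge_sorted is an assumption added for illustration, and the code has been reasoned through rather than tested on every shell, so treat it as a starting point.

```shell
#!/bin/sh
# POSIX sh variant of merge_inputs: no [[ ]], no "function" keyword, no arrays.
merge_inputs() {
  # Read the first line; produce no output if the input is empty.
  IFS=- read -r key val1 val2 || return 0
  while IFS=- read -r new_key new_val1 new_val2; do
    if [ "$key" = "$new_key" ]; then
      val1=$(( val1 + new_val1 ))
      val2=$(( val2 + new_val2 ))
    else
      printf '%s-%s-%s\n' "$key" "$val1" "$val2"
      key=$new_key val1=$new_val1 val2=$new_val2
    fi
  done
  printf '%s-%s-%s\n' "$key" "$val1" "$val2"
}

# merge_sorted (hypothetical name) sorts on the key field only, so that
# equal keys end up on adjacent lines, then merges them.
# Usage: merge_sorted input1.txt input2.txt
merge_sorted() {
  LC_ALL=C sort -t- -k1,1 "$@" | merge_inputs
}
```

Setting LC_ALL=C for sort is a design choice, not a requirement: it gives plain byte-order comparison, which keeps the grouping independent of the user's locale.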

score 1 · Accepted Answer

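You need a language with associative arrays. The task is very easy in any scripting language, but perl and awk are particularly well suited to processing text files line by line.

Pseudocode:

for each line in file1, then file2
     split line into key and values
     if key is not in hash
          add key and values
     else
          add values to the stored values
after all lines are read
     print every key with its values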

score 1 · Accepted Answer

I know you did not ask for PHP, but this might help. You can probably do something similar in another language if you prefer:

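<?php

$file_handle = fopen("file1", "r");

// fgets() returns false at end of file, so test its result instead of feof().
while (($line_of_text = fgets($file_handle)) !== false) {
    $line_of_text = rtrim($line_of_text, "\r\n"); // drop the trailing newline
    if ($line_of_text === '') {
        continue; // skip blank lines
    }
    list($name, $value1, $value2) = explode('-', $line_of_text);
    $file1[$name] = array($value1, $value2);
}
fclose($file_handle);
// Repeat for file2,
// then use the two arrays, $file1[] and $file2[], to write the result as 'file3' or whatever,
// checking for duplicates and doing the math.
?>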

score 1 · Accepted Answer

I like glenn's short fat solution. There is also a tall thin solution.

If you have two files, 1.txt and 2.txt:

sort -t- -k1,1 {1,2}.txt |
awk -F- -v OFS=- '
NR==1{
    x=$1
}
x==$1{
    y+=$2
    z+=$3
    next
}
{
    print x,y,z
    x=$1
    y=$2
    z=$3
}
END{
    if (NR) print x,y,z   # flush the last group, not just the last input line
}'
