Let's tweak your sample data a little:
File 1
"Full nameA","URL-style name","key_1a","key_2a"
"Full nameB","URL-style name","key_1b","key_2b"
"Full nameC","URL-style name","key_1c","key_2c"
File 2
"URL-style name1","key_1a","key_2a"
"URL-style name2","key_1b","key_2b"
"URL-style name3","key_1c","key_2c"
Processing
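As noted in the comments, one limitation of the join command is that it can only join on a single column, whereas the question has a compound key made up of two columns. There are ways around this. You reformat the input to join so that the compound column looks like a single column under the delimiter being used, and you make sure the data in each file is correctly sorted on that compound column. Even so, join is probably the way to go; it just needs some preprocessing and post-processing. Bash also has "process substitution" (in v4 and in earlier versions too), which is very useful for this.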
Generate a joinable file from file1 containing the data we need.
There are many ways to do this; either sed (somewhat cryptic) or awk will work:
$ sed 's/\([^,]*\),[^,]*,\([^,]*\),\([^,]*\)/\2:\3,\1/' file1
"key_1a":"key_2a","Full nameA"
"key_1b":"key_2b","Full nameB"
"key_1c":"key_2c","Full nameC"
$ awk -F, '{ printf "%s:%s,%s\n", $3, $4, $1 }' file1
"key_1a":"key_2a","Full nameA"
"key_1b":"key_2b","Full nameB"
"key_1c":"key_2c","Full nameC"
$
Generate a joinable file from file2 containing the data we need:
$ sed 's/\([^,]*\),\([^,]*\),\([^,]*\)/\2:\3,\1/' file2
"key_1a":"key_2a","URL-style name1"
"key_1b":"key_2b","URL-style name2"
"key_1c":"key_2c","URL-style name3"
$ awk -F, '{ printf "%s:%s,%s\n", $2, $3, $1 }' file2
"key_1a":"key_2a","URL-style name1"
"key_1b":"key_2b","URL-style name2"
"key_1c":"key_2c","URL-style name3"
$
Given this preprocessing, a plain sort is enough to get the data ready for join.
$ join -t, -o 2.2,0,1.2 \
> <(awk -F, '{ printf "%s:%s,%s\n", $3, $4, $1 }' file1 | sort) \
> <(awk -F, '{ printf "%s:%s,%s\n", $2, $3, $1 }' file2 | sort)
"URL-style name1","key_1a":"key_2a","Full nameA"
"URL-style name2","key_1b":"key_2b","Full nameB"
"URL-style name3","key_1c":"key_2c","Full nameC"
$
Now we just need to post-process the colon back into a comma:
$ join -t, -o 2.2,0,1.2 \
> <(awk -F, '{ printf "%s:%s,%s\n", $3, $4, $1 }' file1 | sort) \
> <(awk -F, '{ printf "%s:%s,%s\n", $2, $3, $1 }' file2 | sort) |
> sed 's/":"/","/'
"URL-style name1","key_1a","key_2a","Full nameA"
"URL-style name2","key_1b","key_2b","Full nameB"
"URL-style name3","key_1c","key_2c","Full nameC"
$
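As a sketch, the whole pipeline can be wrapped in one self-contained bash script that recreates the sample files and runs the join end to end. Two things here are assumptions, not part of the answer above: the script exports LC_ALL=C so that sort and join use the same collating order (join expects input sorted the way it compares), and the function name join_on_composite_key is purely hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: join file1 and file2 on the composite key (columns 3+4 of file1,
# columns 2+3 of file2). Requires bash for process substitution.
set -euo pipefail

# Assumption: the C locale, so sort and join agree on collating order.
export LC_ALL=C

# Hypothetical helper name; arguments are the two CSV files.
join_on_composite_key() {
    # Fold the two key columns into one colon-joined field, sort on that
    # field only, join, then turn the colon back into a comma.
    join -t, -o 2.2,0,1.2 \
        <(awk -F, '{ printf "%s:%s,%s\n", $3, $4, $1 }' "$1" | sort -t, -k1,1) \
        <(awk -F, '{ printf "%s:%s,%s\n", $2, $3, $1 }' "$2" | sort -t, -k1,1) |
        sed 's/":"/","/'
}

# Recreate the sample data in a scratch directory, removed on exit.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cat > "$workdir/file1" <<'EOF'
"Full nameA","URL-style name","key_1a","key_2a"
"Full nameB","URL-style name","key_1b","key_2b"
"Full nameC","URL-style name","key_1c","key_2c"
EOF
cat > "$workdir/file2" <<'EOF'
"URL-style name1","key_1a","key_2a"
"URL-style name2","key_1b","key_2b"
"URL-style name3","key_1c","key_2c"
EOF

join_on_composite_key "$workdir/file1" "$workdir/file2"
```

Using sort -t, -k1,1 compares only the composite key field. This is a slightly more defensive variant of the plain sort in the session above. For this sample data, both produce the same order.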
Obviously, you can use any suitable character instead of the colon; Control-A (0x01) is unlikely to appear in your HTML.
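Here is a minimal sketch of the Control-A variant. It assumes awk's octal string escape "\001" (which emits a literal Control-A) and tr's matching '\001' escape. The helper name join_ctrl_a is hypothetical, and LC_ALL=C is again an assumption to keep sort and join consistent.

```shell
#!/usr/bin/env bash
# Sketch: the same join, but with Control-A (octal 001) rather than ':'
# as the temporary separator inside the composite key.
set -euo pipefail
export LC_ALL=C   # assumption: C locale so sort and join agree on ordering

# Hypothetical helper name; arguments are the two CSV files.
join_ctrl_a() {
    # awk's "\001" escape puts a Control-A between the two key columns;
    # after the join, tr turns it back into a comma.
    join -t, -o 2.2,0,1.2 \
        <(awk -F, '{ printf "%s\001%s,%s\n", $3, $4, $1 }' "$1" | sort -t, -k1,1) \
        <(awk -F, '{ printf "%s\001%s,%s\n", $2, $3, $1 }' "$2" | sort -t, -k1,1) |
        tr '\001' ','
}

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
printf '%s\n' '"Full nameA","URL-style name","key_1a","key_2a"' \
              '"Full nameB","URL-style name","key_1b","key_2b"' > "$workdir/file1"
printf '%s\n' '"URL-style name1","key_1a","key_2a"' \
              '"URL-style name2","key_1b","key_2b"' > "$workdir/file2"

join_ctrl_a "$workdir/file1" "$workdir/file2"
```

The separator cannot collide with a comma or a quote. That means a single tr call handles the post-processing, with no sed pattern needed.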
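This assumes, as shown, that your CSV data has no commas inside the quoted strings. If there are commas inside the strings, life gets a lot harder; you need a proper CSV parser to handle the data. Perl has Text::CSV, and there is also csvfix.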