awk - Comparing files with awk

Question

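Hi, I have two similar files (both with 3 columns). I'd like to check whether the two files contain the same elements (but listed in a different order). To start with, I'd like to compare only the first column.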

file1.txt

"aba" 0 0
"abc" 0 1
"abd" 1 1
"xxx" 0 0

file2.txt

"xyz" 0 0
"aba" 0 0
"xxx" 0 0
"abc" 1 1

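How can I do this with awk? I tried looking around, but all I found were complicated examples. What if I also want to include the other two columns in the comparison? The output should give me the number of matching elements.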

score 29 · Accepted Answer

To print the elements common to both files:

$ awk 'NR==FNR{a[$1];next}$1 in a{print $1}' file1 file2
"aba"
"abc"
"xxx"

Explanation:

NR and FNR are awk variables that hold, respectively, the total number of records read so far and the number of records read in the current file (by default, a record is a line).

NR==FNR {        # Only true while reading the first file
    a[$1]        # Create an associative array key from the first column
    next         # Skip all following blocks and move on to the next line
}
$1 in a {        # Check whether column one of the second file is a key in the array
    print $1     # If so, print it
}
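To make the NR==FNR idiom concrete, the following sketch prints both counters for every line. It recreates the sample data from the question in a temporary directory; the variable names `dir` and `counters` are only illustrative.

```shell
# Recreate the two sample files from the question in a temporary directory
dir=$(mktemp -d)
printf '%s\n' '"aba" 0 0' '"abc" 0 1' '"abd" 1 1' '"xxx" 0 0' > "$dir/file1"
printf '%s\n' '"xyz" 0 0' '"aba" 0 0' '"xxx" 0 0' '"abc" 1 1' > "$dir/file2"

# Print NR, FNR and the first column for every line of both files.
# NR keeps counting across files; FNR restarts at 1 when file2 begins.
counters=$(awk '{ print NR, FNR, $1 }' "$dir/file1" "$dir/file2")
printf '%s\n' "$counters"

rm -r "$dir"
```

The first four lines show equal counters (1 1 through 4 4), and the fifth line reads `5 1 "xyz"`, which is where NR==FNR stops being true. Note that the idiom assumes the first file is not empty; with an empty first file, NR==FNR would also hold for the second file.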

If you want to match the whole line, use $0:

$ awk 'NR==FNR{a[$0];next}$0 in a{print $0}' file1 file2
"aba" 0 0
"xxx" 0 0

Or a specific set of columns:

$ awk 'NR==FNR{a[$1,$2,$3];next}($1,$2,$3) in a{print $1,$2,$3}' file1 file2
"aba" 0 0
"xxx" 0 0

score 6 · Accepted Answer

To print the number of matching elements, here's one way using awk:

awk 'FNR==NR { a[$1]; next } $1 in a { c++ } END { print c }' file1.txt file2.txt

Results using your input:

3

If you'd like to include additional columns (for example, columns one, two, and three), use a pseudo-multidimensional array:

awk 'FNR==NR { a[$1,$2,$3]; next } ($1,$2,$3) in a { c++ } END { print c }' file1.txt file2.txt

Results using your input:

2
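One detail worth guarding against: if nothing matches, `c` is never assigned and `print c` outputs an empty line rather than 0. A small variation, offered as a sketch rather than as part of the answer above, is to print `c + 0`, which forces a numeric result. The file `other.txt` below is a hypothetical input that shares nothing with file1.txt.

```shell
dir=$(mktemp -d)
printf '%s\n' '"aba" 0 0' '"abc" 0 1' '"abd" 1 1' '"xxx" 0 0' > "$dir/file1.txt"
printf '%s\n' '"xyz" 0 0' '"aba" 0 0' '"xxx" 0 0' '"abc" 1 1' > "$dir/file2.txt"
printf '%s\n' '"qqq" 9 9' > "$dir/other.txt"   # hypothetical file with no overlap

# Count matches on the first column only
count_first=$(awk 'FNR==NR { a[$1]; next } $1 in a { c++ } END { print c + 0 }' \
    "$dir/file1.txt" "$dir/file2.txt")

# Count matches on all three columns
count_all=$(awk 'FNR==NR { a[$1,$2,$3]; next } ($1,$2,$3) in a { c++ } END { print c + 0 }' \
    "$dir/file1.txt" "$dir/file2.txt")

# No overlap: c + 0 yields 0 instead of an empty line
count_none=$(awk 'FNR==NR { a[$1]; next } $1 in a { c++ } END { print c + 0 }' \
    "$dir/file1.txt" "$dir/other.txt")

echo "$count_first $count_all $count_none"

rm -r "$dir"
```

With the sample data this prints `3 2 0`. Keep in mind that the counter increments once per matching line of the second file, so a value repeated in the second file is counted each time it appears.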
