Here's an awk script that attempts to compute the set difference of two files based on their first column:

BEGIN{
    OFS=FS="\t"
    file = ARGV[1]
    while (getline < file)
        Contained[$1] = $1
    delete ARGV[1]
    }
$1 not in Contained{
    print $0
}

Here is TestFileA:

cat
dog
frog

Here is TestFileB:

ee
cat
dog
frog

However, when I run the following command:

gawk -f Diff.awk TestFileA TestFileB

I get the output just as if the script had contained "in":

cat
dog
frog

While I am uncertain about whether "not in" is correct syntax for my intent, I'm very curious about why it behaves exactly the same way as when I wrote "in".

5 Answers

34

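I couldn't find any documentation for "element not in array".

Try !(element in array) instead.


My guess: awk treats not as an uninitialized variable, so not evaluates to the empty string, and concatenating it with $1 leaves $1 unchanged.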

$1 not == $1 "" == $1
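
To check that guess, here is a minimal sketch. The single array entry and the variable names (key, same_string, with_not, plain_in, negated) are made up for illustration; only Contained comes from the question.

```shell
#!/bin/sh
# Sketch: an unset variable named "not" concatenates to nothing,
# so "key not in Contained" tests the same thing as "key in Contained".
result=$(awk 'BEGIN {
    Contained["cat"] = "cat"                 # one known key, as in the question
    key = "cat"

    same_string = ((key not) == key)         # concatenation with "" leaves key unchanged
    with_not    = (key not in Contained)     # parsed as ((key not) in Contained)
    plain_in    = (key in Contained)
    negated     = !(key in Contained)        # the real "not in" test

    print same_string, with_not, plain_in, negated
}')
echo "$result"   # prints: 1 1 1 0
```

The first three values agree because concatenation binds more tightly than in, so the stray not never changes the key that is looked up. Only the explicit negation gives the opposite answer.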
answered 2012-06-06T23:45:48.020
22

I figured this one out. ( x in array ) returns a value, so to test for "not in array" you have to do this:

if ( x in array == 0 )
   print "x is not in the array"

Or, in your example:

($1 in Contained == 0){
   print $0
}
answered 2012-10-09T22:22:23.263
2

In my solution to this problem, I use the following if-else statement:

if($1 in contained);else{print "Here goes your code for \"not in\""}
answered 2017-06-27T15:04:26.760
1

Not sure whether this is what you were trying to do.

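#!/bin/awk -f
# Reads the file named in ARGV[1] and hashes the tokens
# found in column one. The main rule then prints the first column of any
# line whose token does not match the tokens already defined.
BEGIN{
    OFS=FS="\t"
    file = ARGV[1]
    while ((getline < file) > 0)    # > 0 stops at end of file and on read errors
        Contained[$1] = $1
#    delete ARGV[1]         # not sure what you were thinking here
#    for(i in Contained) {print Contained[i]}   # debugging, not just for sadists
    close(ARGV[1])
}
{
   if ($1 in Contained){} else { print $1 }
}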

answered 2012-06-07T02:00:30.920
0

On the awk command line, I use:

 ! ($1 in a)

Here $1 is the pattern and a is the array.

Example:

awk 'NR==FNR{a[$1];next}! ($1 in a) {print $1}' file1 file2
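
As a rough illustration of how that one-liner behaves, the sketch below runs it on the question's sample data. The temporary directory and the variable name diff_out are assumptions made for the example; file1 and file2 stand in for TestFileA and TestFileB.

```shell
#!/bin/sh
# Sketch: run the NR==FNR one-liner on the sample data from the question.
tmp=$(mktemp -d)
printf 'cat\ndog\nfrog\n'     > "$tmp/file1"   # contents of TestFileA
printf 'ee\ncat\ndog\nfrog\n' > "$tmp/file2"   # contents of TestFileB

diff_out=$(awk '
    NR == FNR { a[$1]; next }    # true only while reading the first file: record each key
    !($1 in a) { print $1 }      # second file: print keys that were never recorded
' "$tmp/file1" "$tmp/file2")

echo "$diff_out"   # prints: ee
rm -rf "$tmp"
```

NR counts records across all input files while FNR restarts at 1 for each file, so the two are equal only during the first file. Note that this shortcut also matches the second file if the first one is empty.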
answered 2020-02-05T12:23:34.250