linux - Display duplicate lines in two different files

Question

I have two files and I want to display the lines that appear in both. I tried this, but it doesn't work:

cat id1.txt | while read id; do grep "$id" id2.txt; done

I'd like to know whether there is another way to display the duplicate lines. Both of my files contain a list of ids. Thank you.

score 20 · Accepted Answer

Are the files sorted? Can they be sorted?

If they are sorted:

comm -12 id1.txt id2.txt

If they are not sorted but you are using bash 4.x:

comm -12 <(sort id1.txt) <(sort id2.txt)

If you don't have bash 4.x and process substitution, there are solutions that use temporary files.

You can also use grep -F:
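A minimal sketch of the temporary-file route, assuming mktemp is available (it is common but not guaranteed everywhere). The sample ids and the directory layout are made up for illustration; only sort and comm -12 come from the answer itself.

```shell
#!/bin/sh
# Hypothetical sample data so the sketch is self-contained.
workdir=$(mktemp -d)
printf '%s\n' 3 1 2 > "$workdir/id1.txt"
printf '%s\n' 2 4 3 > "$workdir/id2.txt"

# Sort each list into its own temporary file; comm needs sorted input.
sort "$workdir/id1.txt" > "$workdir/id1.sorted"
sort "$workdir/id2.txt" > "$workdir/id2.sorted"

# -12 hides the lines unique to either file, leaving only the common ones.
common=$(comm -12 "$workdir/id1.sorted" "$workdir/id2.sorted")
printf '%s\n' "$common"

# Remove the temporary files once the comparison is done.
rm -rf "$workdir"
```

Keeping the sorted copies in a scratch directory avoids leaving extra files next to the originals.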

grep -F -f id1.txt id2.txt

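This looks for the words from id1.txt that appear in id2.txt. The only catch is making sure that an ID of 1 does not match every ID that contains a 1 somewhere. The -w or -x options, available in some versions of grep, take care of that.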
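To make the substring problem concrete, here is a small sketch with invented ids. It assumes a grep that supports -x (whole-line match), and it stores the results in variables only so they are easy to inspect.

```shell
#!/bin/sh
# Hypothetical sample data: the id "1" is a substring of "10" and "21".
workdir=$(mktemp -d)
printf '%s\n' 1 42 > "$workdir/id1.txt"
printf '%s\n' 10 21 42 1 > "$workdir/id2.txt"

# Plain -F matches fixed strings anywhere in the line, so "1" also hits 10 and 21.
loose=$(grep -F -f "$workdir/id1.txt" "$workdir/id2.txt")

# Adding -x requires the whole line to equal one of the patterns.
exact=$(grep -F -x -f "$workdir/id1.txt" "$workdir/id2.txt")

printf 'substring matches:\n%s\n' "$loose"
printf 'whole-line matches:\n%s\n' "$exact"

rm -rf "$workdir"
```

With this data the first command returns all four lines of id2.txt, while the -x variant returns only 42 and 1.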

score 12 · Accepted Answer

If by detecting duplicates you mean printing the lines that are present in both files (or repeated within one file), you can use uniq:

$ cat file1 file2 | sort | uniq -d
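As the answer notes, this also reports a line that is repeated inside a single file. If only cross-file duplicates are wanted, one possible variation (not part of the original answer) is to de-duplicate each file with sort -u first. The file contents below are invented for illustration.

```shell
#!/bin/sh
# Hypothetical sample data: "a" is repeated within file1 only, "b" is in both files.
workdir=$(mktemp -d)
printf '%s\n' a a b > "$workdir/file1"
printf '%s\n' b c > "$workdir/file2"

# The pipeline from the answer: reports "a" as well, because it repeats in file1.
naive=$(cat "$workdir/file1" "$workdir/file2" | sort | uniq -d)

# De-duplicate each file first so that only lines present in both files repeat.
sort -u "$workdir/file1" > "$workdir/file1.u"
sort -u "$workdir/file2" > "$workdir/file2.u"
strict=$(sort "$workdir/file1.u" "$workdir/file2.u" | uniq -d)

printf 'uniq -d on raw input:\n%s\n' "$naive"
printf 'uniq -d after sort -u:\n%s\n' "$strict"

rm -rf "$workdir"
```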

score 2 · Accepted Answer

You can use the comm command instead:

sort id1.txt > id1.txt.sorted
sort id2.txt > id2.txt.sorted
comm -12 id1.txt.sorted id2.txt.sorted

If you want to do this in a single command:

comm -12 <(sort id1.txt) <(sort id2.txt)

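Arguments to comm:

The -1 argument suppresses the lines that are unique to the first file.
The -2 argument suppresses the lines that are unique to the second file.
If you pass a -3 argument, it suppresses the lines common to both files.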
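The flags can be combined to select any one of the three columns. A short sketch with made-up, already sorted input shows the combinations; the file names a.sorted and b.sorted are placeholders.

```shell
#!/bin/sh
# Hypothetical, already sorted sample data.
workdir=$(mktemp -d)
printf '%s\n' 1 2 3 > "$workdir/a.sorted"
printf '%s\n' 2 3 4 > "$workdir/b.sorted"

both=$(comm -12 "$workdir/a.sorted" "$workdir/b.sorted")    # lines in both files
only_a=$(comm -23 "$workdir/a.sorted" "$workdir/b.sorted")  # lines only in the first file
only_b=$(comm -13 "$workdir/a.sorted" "$workdir/b.sorted")  # lines only in the second file

printf 'both: %s\n' "$(printf '%s' "$both" | tr '\n' ' ')"
printf 'only first: %s\n' "$only_a"
printf 'only second: %s\n' "$only_b"

rm -rf "$workdir"
```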

score 1 · Accepted Answer

Using awk will save you time, since it needs no sorting.

awk 'FNR==NR{lines[$0]=1;next} $0 in lines' id1.txt id2.txt

# Explanation
FNR==NR       # check whether the per-file record number (FNR) equals the overall record number (NR),
              # which is only true while reading the first file
lines[$0]=1   # store the line in an associative array:
              # the key is the line from the first file, the value is 1
next          # skip the remaining commands while FNR==NR
$0 in lines   # check whether the line from the second file
              # is a key in the array;
              # if so, $0 is printed
              # (the {print} is omitted because printing is
              # awk's default action when the condition is true)
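A quick way to try the one-liner on unsorted input, using invented ids. Note that the output follows the order of the second file, and a line repeated in the second file is printed each time it occurs.

```shell
#!/bin/sh
# Hypothetical unsorted sample data.
workdir=$(mktemp -d)
printf '%s\n' 3 1 2 > "$workdir/id1.txt"
printf '%s\n' 2 4 3 > "$workdir/id2.txt"

# First pass (FNR==NR) records every line of id1.txt as an array key;
# second pass prints each line of id2.txt that is one of those keys.
common=$(awk 'FNR==NR{lines[$0]=1;next} $0 in lines' "$workdir/id1.txt" "$workdir/id2.txt")
printf '%s\n' "$common"

rm -rf "$workdir"
```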
