unix - Unix command to find the intersection or outliers of sets of strings?

Question

Is there a UNIX command on par with

sort | uniq

for finding the intersection or "outliers" of sets of strings?

An example application: I have a list of HTML templates. Some of them contain the {% load i18n %} string and others don't. I want to know which files don't.

Edit: grep -L solves the problem above.

How about this:

file1:

mom
dad
bob

file2:

dad

% intersect file1 file2

dad

% left-unique file1 file2

mom
bob

score 39 · Accepted Answer

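It appears that grep -L solves the poster's real problem, but for the actual question asked, finding the intersection of two sets of strings, you might want to look into the "comm" command. For example, if file1 and file2 each contain a sorted list of words, one word per line, then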

$ comm -12 file1 file2

will produce the words common to both files. More generally, given sorted input files file1 and file2, the command

$ comm file1 file2

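produces three columns of output: lines found only in file1, lines found only in file2, and lines common to both files.

You can suppress column N in the output with the -N option. So the command above, comm -12 file1 file2, suppresses columns 1 and 2, leaving only the words common to both files.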
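As a minimal sketch of the three columns, the following recreates the question's file1 and file2 in a scratch directory and sorts them first, since comm expects sorted input. The directory and variable names are only illustrative and do not come from the answer:

```shell
# Scratch directory for the demo (hypothetical name, removed at the end).
workdir=$(mktemp -d)

# comm expects sorted input, so sort the question's sample data first.
printf 'mom\ndad\nbob\n' | sort > "$workdir/file1"
printf 'dad\n' | sort > "$workdir/file2"

# -12 hides columns 1 and 2, leaving lines common to both files.
both=$(comm -12 "$workdir/file1" "$workdir/file2")
# -23 hides columns 2 and 3, leaving lines found only in file1.
only_file1=$(comm -23 "$workdir/file1" "$workdir/file2")
# -13 hides columns 1 and 3, leaving lines found only in file2.
only_file2=$(comm -13 "$workdir/file1" "$workdir/file2")

printf 'both:\n%s\n' "$both"
printf 'only in file1:\n%s\n' "$only_file1"
printf 'only in file2:\n%s\n' "$only_file2"

rm -rf "$workdir"
```

With the sample data, the first result is dad, the second is bob and mom, and the third is empty because file2 has no lines of its own.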

score 9 · Accepted Answer

Intersection:

# sort file1 file2 | uniq -d
dad

Left unique:

# sort file1 file2 | uniq -u
bob
mom

answered 2009-06-19T04:27:17.017
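One caveat worth sketching: uniq -u prints every line that occurs exactly once in the combined input, so it matches "left unique" in the example only because every line of file2 also appears in file1. Assuming neither file contains internal duplicates, listing file2 twice is a common workaround that keeps file2-only lines out of the result. The extra line sue in file2 below is made up for illustration:

```shell
# Scratch directory for the demo (hypothetical name, removed at the end).
workdir=$(mktemp -d)

printf 'mom\ndad\nbob\n' > "$workdir/file1"
printf 'dad\nsue\n' > "$workdir/file2"   # "sue" is a hypothetical extra line

# Lines that occur exactly once overall: this is the symmetric difference,
# so the file2-only line "sue" shows up as well.
sym_diff=$(sort "$workdir/file1" "$workdir/file2" | uniq -u)

# Feeding file2 twice means every file2 line occurs at least twice,
# so only lines unique to file1 survive uniq -u.
left_unique=$(sort "$workdir/file1" "$workdir/file2" "$workdir/file2" | uniq -u)

printf 'symmetric difference:\n%s\n' "$sym_diff"
printf 'left unique:\n%s\n' "$left_unique"

rm -rf "$workdir"
```

Here the first command yields bob, mom and sue, while the second yields only bob and mom. If the inputs may contain internal duplicates, running each through sort -u first would be needed for either form to behave as a set operation.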

score 7 · Accepted Answer

Intersection between two (unsorted) files:

grep -Fx -f file1 file2

Lines in file2 that are not in file1:

grep -Fxv -f file1 file2

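Explanation: -F treats each pattern as a fixed string rather than a regular expression, -x matches whole lines only, -f file1 reads the patterns from file1 (one per line), and -v inverts the match.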
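A small sketch of why -x matters here, using the question's file1 and a made-up file2 that contains daddy (the sample data and variable names are assumptions for illustration):

```shell
# Scratch directory for the demo (hypothetical name, removed at the end).
workdir=$(mktemp -d)

printf 'mom\ndad\nbob\n' > "$workdir/file1"
printf 'dad\ndaddy\n' > "$workdir/file2"   # "daddy" is hypothetical test data

# Whole-line, fixed-string match: lines of file2 that also appear in file1.
intersection=$(grep -Fx -f "$workdir/file1" "$workdir/file2")

# Without -x, "dad" also matches as a substring of "daddy".
substring_match=$(grep -F -f "$workdir/file1" "$workdir/file2")

# Inverted whole-line match: lines of file2 that are not in file1.
not_in_file1=$(grep -Fxv -f "$workdir/file1" "$workdir/file2")

printf 'intersection:\n%s\n' "$intersection"
printf 'without -x:\n%s\n' "$substring_match"
printf 'in file2 but not file1:\n%s\n' "$not_in_file1"

rm -rf "$workdir"
```

With this data the intersection is dad, the variant without -x also returns daddy, and the inverted match returns only daddy. Output order follows file2, since no sorting is involved.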

score 5 · Accepted Answer

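Maybe I'm misunderstanding the question, but why not just use grep to look for the string (with the -L option so that it prints the names of the files that do not contain the string)?

In other words: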

grep -L "{% load i18n %}" file1 file2 file3 ... etc

or use wildcards for the file names as appropriate.

score 2 · Accepted Answer

From man grep:

-L, --files-without-match

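Suppress normal output; instead print the name of each input file from which no output would normally have been printed. The scanning will stop on the first match.

So if your templates are .html files, you want: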

grep -L '{% load i18n %}' *.html
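To see this in action, here is a small sketch that creates two throwaway templates, one with the tag and one without. The file names and contents are hypothetical:

```shell
# Scratch directory for the demo (hypothetical name, removed at the end).
workdir=$(mktemp -d)

# One template that loads i18n and one that does not (made-up contents).
printf '%s\n' '{% load i18n %}' '<p>hello</p>' > "$workdir/with_i18n.html"
printf '%s\n' '<p>hello</p>' > "$workdir/without_i18n.html"

# Run grep -L from inside the directory so only bare file names are printed.
missing=$(cd "$workdir" && grep -L '{% load i18n %}' *.html)

printf 'templates missing the tag:\n%s\n' "$missing"

rm -rf "$workdir"
```

Only without_i18n.html is listed. The pattern works as a basic regular expression because the braces and percent signs are literal there; adding -F would make that explicit if the search string ever grows regex metacharacters.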

score 2 · Accepted Answer

Intersection:

comm -12 <(cat file1 | sort | uniq) <(cat file2 | sort | uniq)

All lines in 3 columns (file1 only | file2 only | intersection):

comm <(cat file1 | sort | uniq) <(cat file2 | sort | uniq)

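If your files are unsorted, and/or one of them may contain duplicate lines that do not appear in the other, this one-liner sorts the files, removes the duplicate lines, and gives you the result you want directly.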
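The <( ) process substitution used above is a bash, ksh and zsh feature. For a plain POSIX sh, roughly the same effect can be sketched with temporary files. This assumes sort -u is an acceptable stand-in for sort | uniq; the file names and the unsorted sample data with duplicates are hypothetical:

```shell
# Scratch directory for the demo (hypothetical name, removed at the end).
workdir=$(mktemp -d)

# Unsorted inputs with duplicate lines (made-up data).
printf 'mom\ndad\nbob\ndad\n' > "$workdir/file1"
printf 'sue\ndad\ndad\n' > "$workdir/file2"

# Sort and de-duplicate each input into its own temporary file.
sort -u "$workdir/file1" > "$workdir/file1.sorted"
sort -u "$workdir/file2" > "$workdir/file2.sorted"

# Same comm invocations as above, but on the prepared files.
common=$(comm -12 "$workdir/file1.sorted" "$workdir/file2.sorted")
only_first=$(comm -23 "$workdir/file1.sorted" "$workdir/file2.sorted")
only_second=$(comm -13 "$workdir/file1.sorted" "$workdir/file2.sorted")

printf 'common:\n%s\n' "$common"
printf 'only in file1:\n%s\n' "$only_first"
printf 'only in file2:\n%s\n' "$only_second"

rm -rf "$workdir"
```

With this data the common line is dad, file1 alone contributes bob and mom, and file2 alone contributes sue.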
