linux - Strange behavior of diff | grep | sed

Question

My goal is to extract the lines of File1 that do not exist in File2.

Say File1 is:

String A
String B
String C

Say File2 is:

String B
String C

The lines are sorted, and my output should be:

String A

So I wrote this command for it:

diff File1 File2 | grep '^<' | sed 's/< //g'

This works on my sample files of 10 to 20 lines.

I also tried this command:

grep -Fxv -f File2 File1

This also works on my sample files.

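But here is the problem:

Neither command seems to work on my actual data set, which is two files of about 1 million lines each.

What is going wrong? What am I overlooking? What should I do instead?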

score 6 · Accepted Answer

This is exactly what the comm tool is for:

$ comm -23 file1 file2
String A
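
One caveat worth spelling out: comm expects both inputs to be sorted in the same collating order. The following is a minimal sketch, assuming hypothetical sample files in a temporary directory, that sorts both files under LC_ALL=C before comparing them. The file names and the choice of the C locale are assumptions for illustration, not part of the original answer.

```shell
# Hypothetical sample data; file1 is deliberately out of order.
tmpdir=$(mktemp -d)
printf 'String C\nString A\nString B\n' > "$tmpdir/file1"
printf 'String C\nString B\n' > "$tmpdir/file2"

# Sort both files with the same collation (plain byte order here).
LC_ALL=C sort "$tmpdir/file1" > "$tmpdir/file1.sorted"
LC_ALL=C sort "$tmpdir/file2" > "$tmpdir/file2.sorted"

# -2 suppresses lines unique to file2, -3 suppresses lines common to both,
# so only the lines unique to file1 remain.
only_in_file1=$(LC_ALL=C comm -23 "$tmpdir/file1.sorted" "$tmpdir/file2.sorted")
printf '%s\n' "$only_in_file1"

rm -r "$tmpdir"
```

Sorting and comparing under the same locale avoids the case where the files were sorted with one collation and compared with another.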

Answered 2013-02-04T13:50:29.080

score 1 · Accepted Answer

Try this:

awk 'NR==FNR{a[$0];next}!($0 in a)' file2 file1

Does this work on your actual files?
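
For readers unfamiliar with the idiom, here is the same one-liner spread out with comments. It is a sketch using hypothetical sample files in a temporary directory; the array name seen is illustrative and replaces the a of the original.

```shell
# Hypothetical sample data matching the question.
tmpdir=$(mktemp -d)
printf 'String A\nString B\nString C\n' > "$tmpdir/file1"
printf 'String B\nString C\n' > "$tmpdir/file2"

missing=$(awk '
    # NR (global line count) equals FNR (per-file line count) only
    # while the first file on the command line, file2, is being read.
    NR == FNR { seen[$0]; next }
    # For file1: print each line that was never stored from file2.
    !($0 in seen)
' "$tmpdir/file2" "$tmpdir/file1")
printf '%s\n' "$missing"

rm -r "$tmpdir"
```

Unlike comm, this approach does not need sorted input, but it holds every line of file2 in memory. It also assumes file2 is not empty, since NR == FNR would otherwise stay true while file1 is read.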

score 0 · Accepted Answer

It sounds like you may be running into a whitespace problem.

Have you tried:

diff -uBb File1 File2

From the man page:

   -b  --ignore-space-change
          Ignore changes in the amount of white space.

   -B  --ignore-blank-lines
          Ignore changes whose lines are all blank.
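
One way to test the whitespace theory is to count the lines that carry a carriage return or trailing blanks, then compare again after stripping them. This is a rough sketch on hypothetical data; the sample contents and the .clean file name are assumptions for illustration.

```shell
# Hypothetical data: File1 has one CRLF line ending and one trailing space.
tmpdir=$(mktemp -d)
printf 'String A\nString B\r\nString C \n' > "$tmpdir/File1"
printf 'String B\nString C\n' > "$tmpdir/File2"

cr=$(printf '\r')
# Lines containing a carriage return (typical of DOS/Windows line endings).
crlf_lines=$(grep -c "$cr" "$tmpdir/File1")
# Lines ending in any whitespace character, carriage returns included.
trailing_lines=$(grep -c '[[:space:]]$' "$tmpdir/File1")
echo "CR lines: $crlf_lines, lines with trailing whitespace: $trailing_lines"

# Without cleanup, exact whole-line matching treats the padded lines as different.
raw_count=$(grep -Fxvc -f "$tmpdir/File2" "$tmpdir/File1")

# Strip trailing whitespace, then repeat the whole-line comparison.
sed 's/[[:space:]]*$//' "$tmpdir/File1" > "$tmpdir/File1.clean"
cleaned_result=$(grep -Fxv -f "$tmpdir/File2" "$tmpdir/File1.clean")
printf '%s\n' "$cleaned_result"

rm -r "$tmpdir"
```

If the counts are nonzero on the real files, invisible characters are a likely reason the commands behave differently than on the small samples.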

score 0 · Accepted Answer

Are both of your files actually sorted? If the data is out of order, diff will not give useful results...
Check with:

diff -uBb <(sort File1) <(sort File2)
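
To find out whether a file is already sorted without re-sorting it, sort -c can be used; it exits with a nonzero status at the first out-of-order line. A small sketch on a hypothetical, deliberately unsorted file (the use of LC_ALL=C is an assumption; use whatever locale the files were sorted in):

```shell
# Hypothetical file with the first two lines swapped.
tmpdir=$(mktemp -d)
printf 'String B\nString A\nString C\n' > "$tmpdir/File1"

# sort -c only checks the order and writes nothing to stdout.
if LC_ALL=C sort -c "$tmpdir/File1" 2>/dev/null; then
    file1_status=sorted
else
    file1_status=unsorted
fi
echo "File1 is $file1_status"

rm -r "$tmpdir"
```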

Note: this incorporates Anew's answer.
