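I compared two files, file1 and file2, using diff -y. When I ran wc -l on the diff output, I got 66000. Earlier, I had concatenated the same file1 and file2 for another purpose and run wc -l on the result, which gave a count of about 84000 lines.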

Shouldn't the line counts be the same when I run wc -l on the diff output and on the concatenated file? Am I missing something in the diff output?

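I also counted how many lines differ between the two files, how many are the same, and how many exist in only one file or the other. Added together, they total 66000.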

For the concatenated file, I ran:

sort | uniq | wc -l

For the diff output, I just ran:

wc -l

However, the files I diffed were already sorted and uniq'ed. I can't figure out what I'm missing!


1 Answer


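Because of how diff -y displays "changed" and "moved" lines, the number of lines it shows may bear no relation to the number you get by merging the original files and filtering them through sort and uniq, as you describe. A row marked | holds one line from each file, so two distinct lines take up a single row of output, while a moved line can appear twice, once as < and once as >.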

Here we have two sorted files of 20 and 19 lines. sort + uniq over both files yields 32 lines, while diff shows only 20:

             | file1+2 |
             | sort    |  diff -y -W 12
file1| file2 | uniq    |  file1 file2
-----|-------|---------|---------------
act     act     act       act     act
all     all     all       all     all
and     and     and       and     and
bar     can     bar       bar  |  can
boy     car     boy       boy  |  car
but     cat     but       but  |  cat
dad     dad     can       dad     dad
day     eat     car       day  |  eat
did     eel     cat       did  |  eel
dip     egg     dad       dip  |  egg
far     get     day       far  |  get
fir     gum     did       fir  |  gum
for     gym     dip       for  |  gym
hat     ill     eat       hat  |  ill
him     ink     eel       him  |  ink
hip     its     egg       hip  |  its
how     zap     far       how  <
zap     zip     fir       zap     zap
zip     zoo     for       zip     zip
zoo             get       zoo     zoo
                gum
                gym
                hat
                him
                hip
                how
                ill
                ink
                its
                zap
                zip
                zoo
-----|-------|---------|---------------
20   |  19   |  32     |  20
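
The counts in this first table can be reproduced with a short script. This is only a sketch: the temporary directory and the variable names are made up for illustration, and counting | rows with awk on field 2 works only because every sample line is a single word containing no | character.

```shell
#!/bin/sh
# Recreate the two word lists from the table above (both sorted, no duplicates).
tmp=$(mktemp -d)
printf '%s\n' act all and bar boy but dad day did dip far fir for hat him hip how zap zip zoo > "$tmp/file1"
printf '%s\n' act all and can car cat dad eat eel egg get gum gym ill ink its zap zip zoo > "$tmp/file2"

# Distinct lines across both files: the sort | uniq | wc -l pipeline from the question.
union_count=$(sort "$tmp/file1" "$tmp/file2" | uniq | wc -l | tr -d ' ')

# Rows in the side-by-side diff. diff exits with status 1 when the files differ,
# so that status is deliberately ignored.
diff_rows=$(diff -y "$tmp/file1" "$tmp/file2" | wc -l | tr -d ' ') || true

# Rows marked "|" pair one line from file1 with one line from file2.
# Illustration-only shortcut: field 2 is the marker because each line is one word.
paired_rows=$(diff -y "$tmp/file1" "$tmp/file2" | awk '$2 == "|"' | wc -l | tr -d ' ') || true

echo "sort | uniq rows: $union_count"
echo "diff -y rows:     $diff_rows"
echo "paired (|) rows:  $paired_rows"
rm -rf "$tmp"
```

For sorted, duplicate-free inputs like these, each | row accounts for two distinct lines but only one row of diff output, so the sort | uniq count should equal the diff -y row count plus the number of | rows (here 20 + 12 = 32). Read the same way, the gap between 84000 and 66000 in the question would roughly be the number of | rows in that diff.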


Here we have two sorted files of 19 lines each. sort + uniq over both files yields 31 lines, and diff also shows 31:

             | file1+2 |
             | sort    |  diff -y -W 12
file1| file2 | uniq    |  file1 file2
-----|-------|---------|---------------
act     act     act       act     act
all     all     all       all     all
and     and     and       and     and
fad     bar     bar            >  bar
far     boy     boy            >  boy
fir     but     but            >  but
for     can     can            >  can
get     car     car            >  car
gum     cat     cat            >  cat
gym     dad     dad            >  dad
hat     day     day            >  day
him     did     did            >  did
hip     eat     eat            >  eat
ill     eel     eel            >  eel
ink     egg     egg            >  egg
its     fad     fad       fad     fad
zap     zap     far       far  <
zip     zip     fir       fir  <
zoo     zoo     for       for  <
                get       get  <
                gum       gum  <
                gym       gym  <
                hat       hat  <
                him       him  <
                hip       hip  <
                ill       ill  <
                ink       ink  <
                its       its  <
                zap       zap     zap
                zip       zip     zip
                zoo       zoo     zoo
-----|-------|---------|---------------
19   |  19   |  31     |  31


Here we have two files of 31 lines each. sort + uniq over both files also yields 31 lines, but diff shows 43:

             | file1+2 |
             | sort    |  diff -y -W 12
file1| file2 | uniq    |  file1 file2
-----|-------|---------|---------------
act     act     act       act     act
all     all     all       all     all
and     and     and       and     and
bar     far     bar       bar  <
boy     fir     boy       boy  <
but     for     but       but  <
can     get     can       can  <
car     gum     car       car  <
cat     gym     cat       cat  <
dad     hat     dad       dad  <
day     him     day       day  <
did     hip     did       did  <
eat     how     eat       eat  <
eel     ill     eel       eel  <
egg     ink     egg       egg  <
far     its     far       far     far
fir     bar     fir       fir     fir
for     boy     for       for     for
get     but     get       get     get
gum     can     gum       gum     gum
gym     car     gym       gym     gym
hat     cat     hat       hat     hat
him     dad     him       him     him
hip     day     hip       hip     hip
how     did     how       how     how
ill     eat     ill       ill     ill
ink     eel     ink       ink     ink
its     egg     its       its     its
zap     zap     zap            >  bar
zip     zip     zip            >  boy
zoo     zoo     zoo            >  but
                               >  can
                               >  car
                               >  cat
                               >  dad
                               >  day
                               >  did
                               >  eat
                               >  eel
                               >  egg
                          zap     zap
                          zip     zip
                          zoo     zoo
-----|-------|---------|---------------
31   |  31   |  31     |  43
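
A rough way to see the effect of the unsorted input is to run the same comparison twice, once as is and once with file2 sorted on the fly. The script below is a sketch under the same assumptions as the table: the temporary directory and variable names are invented for illustration, and the row counts rely on diff choosing the alignment shown above.

```shell
#!/bin/sh
tmp=$(mktemp -d)
# file1 is sorted; file2 holds the same 31 words with two blocks swapped, so it is unsorted.
printf '%s\n' act all and bar boy but can car cat dad day did eat eel egg \
  far fir for get gum gym hat him hip how ill ink its zap zip zoo > "$tmp/file1"
printf '%s\n' act all and far fir for get gum gym hat him hip how ill ink its \
  bar boy but can car cat dad day did eat eel egg zap zip zoo > "$tmp/file2"

# Distinct lines across both files.
union_count=$(sort "$tmp/file1" "$tmp/file2" | uniq | wc -l | tr -d ' ')

# Side-by-side rows with file2 left unsorted: the moved block shows up twice,
# once as "<" rows and once as ">" rows. diff's non-zero exit status is ignored.
unsorted_rows=$(diff -y "$tmp/file1" "$tmp/file2" | wc -l | tr -d ' ') || true

# Sort file2 first; "-" tells diff to read the second file from standard input.
sorted_rows=$(sort "$tmp/file2" | diff -y "$tmp/file1" - | wc -l | tr -d ' ') || true

echo "sort | uniq rows:       $union_count"
echo "diff -y rows, unsorted: $unsorted_rows"
echo "diff -y rows, sorted:   $sorted_rows"
rm -rf "$tmp"
```

With these lists the unsorted comparison should give 43 rows, while the sorted one should give 31, the same as sort | uniq, because the two files then contain identical lines in identical order.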

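The last example doesn't match your specific case, because one of its input files is not sorted, but I've included it for other readers whose situation may differ from yours.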

Answered 2013-12-22T01:02:50.373