bash - Filtering based on string comparison

Question

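I have a file with multiple columns. I am trying to filter out the records that have the same value in the first two fields. Both fields contain text values. This is the command I am using: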

cat input_file | awk -F'\t' '{if($1==$2) print $1 $2}'

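When I run this command, I only get the lines where the values in those fields are numeric. The file contains several lines that have the same value in the two fields where that value is non-numeric. How can I force awk to do a string comparison?

Also, is there any other way I could do this? (I am new to the Unix environment and don't know many tricks, so any help is appreciated.)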

score 2 · Accepted Answer

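If you want to filter out all lines where the first two columns are the same, just use awk '$1!=$2' file: whitespace is awk's default field separator, and the default action is to print the line.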

$ cat file
1       1        col3   line1
two     two      col3   line2
three   3        col3   line3           
four4   four4    col3   line4

$ awk '$1!=$2' file
three   3        col3   line3           

$ awk '$1==$2' file
1       1        col3   line1
two     two      col3   line2
four4   four4    col3   line4

The type of the fields doesn't matter here, and there is no need to use cat either.
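On the original "how do I force a string comparison" part: awk compares two input fields as numbers when both look numeric, and as strings otherwise. A minimal sketch of forcing the string form by appending an empty string to each operand follows. The sample data and variable names are made up for illustration and are not from the question.

```shell
# Hypothetical tab-separated sample: on line1 the fields are equal as numbers only
sample=$(printf '1.0\t1\tline1\ntwo\ttwo\tline2\nthree\t3\tline3')

# Default comparison: fields that both look numeric are compared as numbers
numeric_matches=$(printf '%s\n' "$sample" | awk -F'\t' '$1 == $2 { print $3 }')

# Appending "" makes each operand a string, so the comparison is textual
string_matches=$(printf '%s\n' "$sample" | awk -F'\t' '$1"" == $2"" { print $3 }')

echo "numeric comparison matched: $numeric_matches"
echo "string comparison matched:  $string_matches"
```

With this sample the numeric form also matches the 1.0 / 1 row, while the string form matches only the row whose two fields are textually identical.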

score 0 · Accepted Answer

Pure bash

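# "rest" absorbs any columns after the second, so y holds only field 2
while read -r x y rest
do
  # quote both operands so empty or unusual values don't break the test
  [ "$x" = "$y" ] && echo "$x $y"
done < input_file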

answered 2013-01-08T09:30:00.877
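If the input really is tab-separated and a field may contain spaces, the loop can be told to split on tabs only. This is just a sketch under that assumption; print_equal_pairs is a hypothetical helper name, and it reads from standard input rather than from input_file.

```shell
# print_equal_pairs: hypothetical helper, reads tab-separated lines on stdin
# and prints the first two fields of every line where they are identical.
print_equal_pairs() {
  tab=$(printf '\t')
  # IFS is set to a tab for read only, so spaces inside a field are kept
  while IFS="$tab" read -r first second rest
  do
    # quoted operands with = are always compared as strings
    [ "$first" = "$second" ] && printf '%s\t%s\n' "$first" "$second"
  done
  return 0
}

# Usage with made-up data: only the first line has equal fields
printf 'a b\ta b\tx\nfoo\tbar\ty\n' | print_equal_pairs
```

Note that a tab counts as IFS whitespace, so consecutive tabs are treated as one separator and empty fields are not preserved.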

score 0 · Accepted Answer

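You are actually doing it right, except that you have added -F'\t', which is what is causing your problem. In awk, the default value of the field separator FS is a string containing a single space " ", which awk treats as "split on any run of blanks".

So you need to remove the -F'\t'.

For example, see below: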

> cat temp
1       1 random text
some some random text
some more random text


> nawk '{if($1==$2){print}}' temp
1       1 random text
some some random text

> nawk -F'\t' '{if($1==$2){print}}' temp
>

I am not yet sure why the second command doesn't work. But yes, you need to remove the -F
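One likely explanation, assuming the columns in temp are separated by spaces rather than tabs: with -F'\t' there is nothing to split on, so the whole line ends up in $1 and $2 is empty, and the two can never be equal. A small sketch to check this (the sample line and variable names are only for illustration):

```shell
# Hypothetical space-separated line, like the rows of temp shown above
line='some some random text'

# Default FS: runs of blanks split the line into four fields
default_nf=$(printf '%s\n' "$line" | awk '{ print NF }')

# FS set to a tab: the line has no tab, so it is a single field
tab_nf=$(printf '%s\n' "$line" | awk -F'\t' '{ print NF }')

# Brackets make the empty second field visible
tab_second=$(printf '%s\n' "$line" | awk -F'\t' '{ print "[" $2 "]" }')

echo "default NF=$default_nf, tab NF=$tab_nf, second field with tab FS=$tab_second"
```

Printing NF like this on a line of the real file is a quick way to see which separator it actually uses.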

score 0 · Accepted Answer

I took sudo_O's example

[sgeorge@sgeorge-ld ~]$ cat s
1       1        col3   line1
two     two      col3   line2
three   3        col3   line3           
four4   four4    col3   line4
[sgeorge@sgeorge-ld ~]$ cat s | perl -lane '$F[0] eq $F[1] && print'
1       1        col3   line1
two     two      col3   line2
four4   four4    col3   line4
