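My text file is sorted alphabetically. I want to determine whether each line is contained in the next line and, if it is, delete the first of the two. So, for example, if I have...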

car 
car and trailer
train

...I want to end up with...

car and trailer
train

I found the "sed one-liners" page, which has code for finding duplicate lines:

sed '$!N; /^\(.*\)\n\1$/!P; D'

...and I thought that removing the ^ would do the trick, but it doesn't.

(Doing this with non-consecutive lines would be nice too, but my file runs to thousands of lines, and a script might take hours or days to run.)

3 Answers

sed is an excellent tool for simple substitutions on a single line; for anything else, just use awk:

awk '$0 !~ prev{print prev} {prev=$0} END{print}' file
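
One caveat: `!~` treats the previous line as a regular expression, so a line containing characters such as `.` or `+` may match, or fail to match, in unexpected ways. A minimal sketch of a literal-substring variant, assuming the same consecutive-lines logic, could use awk's `index()` instead. The helper name `drop_contained` and the sample input are illustrative only and not part of the original answer:

```shell
#!/bin/sh
# Print each line unless it occurs as a literal substring of the next line.
# drop_contained is a hypothetical helper name used only for this sketch.
drop_contained() {
    awk '
        # From the second line on: print the previous line if it is NOT
        # found inside the current line.
        NR > 1 && index($0, prev) == 0 { print prev }
        { prev = $0 }
        # The last line has no successor, so it is always kept.
        END { if (NR > 0) print prev }
    ' "$@"
}

# Sample input taken from the question.
result=$(printf 'car\ncar and trailer\ntrain\n' | drop_contained)
printf '%s\n' "$result"
```

With a file, the call would be `drop_contained file`; the result goes to standard output and the file itself is left untouched.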
answered 2012-12-09T19:56:26.883

The original command

sed '$!N; /^\(.*\)\n\1$/!P; D'

looks for an exact match between two consecutive lines. Since you want to check whether the first line is contained in the second, you need to add wildcards around the back-reference:

sed '$!N; /^\(.*\)\n.*\1.*$/!P; D'

That should do it.
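
As a quick check, here is the modified command run on the sample from the question. The variable name `out` is only for illustration. Because `\1` is a back-reference, it matches the first line's text literally, so regex metacharacters in the data are not reinterpreted. This sketch assumes a sed that accepts `\n` inside the pattern, as GNU sed does:

```shell
#!/bin/sh
# Feed the three sample lines from the question to the modified one-liner.
# $!N  appends the next line to the pattern space (except on the last line).
# !P   prints the first line only when it is NOT contained in the second.
# D    drops the first line and restarts the cycle with the remainder.
out=$(printf 'car\ncar and trailer\ntrain\n' |
      sed '$!N; /^\(.*\)\n.*\1.*$/!P; D')
printf '%s\n' "$out"
```

On this input, "car" is dropped because it is contained in "car and trailer", and the other two lines are printed.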

answered 2012-12-09T07:30:25.883

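You said:

Doing this with non-consecutive lines would be nice too.

Here is a bash script that removes every shorter line that is contained in another line, not necessarily a consecutive one, ignoring case: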

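#!/bin/bash
# The I flag and the Q command (quit with an exit code) are GNU sed extensions.
# Note: each line is interpolated into the sed scripts as a regex, so lines
# containing regex metacharacters or "/" are not handled correctly.
while IFS= read -r line; do
   echo "Searching for: $line"
   # Exit with status 99 if the line occurs inside a longer line (or use grep -i).
   sed -n "/.$line/IQ99;/$line./IQ99" test.txt
   if [ $? -eq 99 ]; then
      echo "Removing: $line"
      sed -i "/^$line$/d" test.txt
   fi
done < test.txt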

Test:

$ cat test.txt
Boat
Car
Train and boat
car and cat

$ my_script
Searching for: Boat
Removing: Boat
Searching for: Car
Removing: Car
Searching for: Train and boat
Searching for: car and cat

$ cat test.txt
Train and boat
car and cat
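
Since the question mentions files of several thousand lines, it may be worth avoiding one sed run and one file rewrite per line. A rough sketch of the same idea in a single awk process, assuming the whole file fits in memory, is given here. It compares lines as literal strings rather than as regexes, writes to standard output instead of editing in place, and the helper name `keep_maximal` is hypothetical:

```shell
#!/bin/sh
# Keep only lines that are not a literal, case-insensitive substring of some
# other, different line. keep_maximal is a hypothetical helper name.
keep_maximal() {
    awk '
        # Remember every line, plus a lowercased copy for comparisons.
        { line[NR] = $0; low[NR] = tolower($0) }
        END {
            for (i = 1; i <= NR; i++) {
                contained = 0
                # Lines that are equal ignoring case never remove each other.
                for (j = 1; j <= NR && !contained; j++)
                    if (low[i] != low[j] && index(low[j], low[i]) > 0)
                        contained = 1
                if (!contained) print line[i]
            }
        }
    ' "$@"
}

# Same sample data as the test above.
kept=$(printf 'Boat\nCar\nTrain and boat\ncar and cat\n' | keep_maximal)
printf '%s\n' "$kept"
```

The comparison is still quadratic in the number of lines, but it runs inside one process, which for a few thousand lines should be far cheaper than spawning sed repeatedly.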
answered 2012-12-09T08:35:45.350