I have a large filesystem tree that contains many files with different extensions, for example:

index.html
logo.png
package.zip

Many of those files have duplicates with one or more e letters appended to the name, so:

index.htmle
index.htmlee
package.zipe

and so on.

I have to recursively remove every duplicated file whose name has e+$ at the end. But if there is a file

file.exe

it should remove

file.exee
file.exeee

but not

file.exe

What is the best way to remove those files?

2 Answers

First, I assume you have a list of all the files in the directory. It can be a plain text file. Let's call it list.
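
The answer does not say how list is produced. One possible way, sketched here under the assumption that no file name contains a newline, is to let find write one path per line. The helper name make_list is made up for this illustration.

```shell
# make_list ROOT: write every regular file under ROOT to ./list, one path per line.
# "! -name list" keeps the list file out of its own listing; it also skips any
# other file that happens to be named "list" (an assumption of this sketch).
make_list() {
    find "$1" -type f ! -name list > list
}
```

Running make_list . from the top of the tree would give a list with entries such as ./index.html and ./sub/index.htmle, so a duplicate sorts right after its original even when both sit in a subdirectory.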

Then this one-liner will do the job:

sort list|awk '{if(p&&$0~p"e+$")print "rm "$0;p=$0}'

The line above prints an rm foo command for every foo that is a duplicate file. For example:

kent$  cat list
index.html
logo.png
logo.pnge
package.zip
index.htmle
index.htmlee
package.zipe
file.exe
file.exee
file.exeee

kent$  sort list|awk '{if(p&&$0~p"e+$")print "rm "$0;p=$0}' 
rm file.exee
rm file.exeee
rm index.htmle
rm index.htmlee
rm logo.pnge
rm package.zipe

As you can see, every duplicate file appears in the generated rm commands. If you want to execute them, just pipe the output to sh:

sort list|awk '{if(p&&$0~p"e+$")print "rm "$0;p=$0}'|sh

Edit

It is better to wrap the file names in quotes, because file names may contain spaces and other special characters:

sort list|awk '{if(p&&$0~p"e+$")print "rm \""$0"\"";p=$0}'
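
Even with the double quotes, a name containing ", $ or a backtick would still be interpreted by sh. Also, because p is used as a regular expression, the . in index.html matches any character. A possible variant, offered only as a sketch, compares the previous name literally and prints just the duplicate names, so nothing is passed through sh. The function name list_dupes is made up here. Like the original, it only compares neighbouring sorted lines and assumes no newlines in file names.

```shell
# list_dupes LISTFILE: print each name that equals the previous sorted name
# followed by one or more "e" characters. The previous name is compared as a
# literal prefix with substr, not as a regular expression.
list_dupes() {
    LC_ALL=C sort "$1" | awk '{
        n = length(p)
        if (n > 0 && length($0) > n && substr($0, 1, n) == p && substr($0, n + 1) ~ /^e+$/)
            print
        p = $0
    }'
}

# Dry run: show what would be deleted. Replace "echo rm" with "rm" to delete.
# list_dupes list | while IFS= read -r f; do echo rm -- "$f"; done
```

The -- in the commented loop stops rm from treating a name that starts with - as an option.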
answered 2013-04-24T14:29:52.550

You need to find all files whose names end in "e":

find ./ -name "*e"

Then reverse-sort the results so that the longer names come first:

| sort -r

Then strip the trailing "e":

    chk4File=`echo "$FILE" | sed -e 's/e$//'`

Then check that the stripped name exists before deleting the file with the trailing "e":

    if [ "${FILE}" != "${chk4File}" ] && [ -e "${chk4File}" ]; then

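Putting it all together: this example does not delete any files, it only tells you which files would be deleted. I suggest doing a test run without deleting anything and reviewing the results before deleting for real. To make it take effect, just remove the # in front of the rm command.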

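#!/bin/sh
# Dry run: report each name ending in "e" whose name minus that final "e" exists.
# Reverse sorting puts longer names first, so file.exeee is checked before file.exee.
# Reading line by line keeps names that contain spaces intact.
find ./ -name "*e" | sort -r | while IFS= read -r FILE
do
    chk4File=`echo "$FILE" | sed -e 's/e$//'`
    if [ "${FILE}" != "${chk4File}" ] && [ -e "${chk4File}" ]; then
        echo "Removing ${FILE} since ${chk4File} was found.."
        #rm -f -- "${FILE}"
    else
        echo "NOT Removing ${FILE} since ${chk4File} was NOT found.."
    fi
done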
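
For the suggested test run, one option is to try the same check on a throwaway tree first. The sketch below builds a scratch directory with names from the question and prints what would be removed. The variable names demo and would_remove are made up for this illustration, and ${f%e} is used in place of sed to strip one trailing "e".

```shell
# Build a scratch tree with file names from the question (all empty files).
demo=$(mktemp -d)
mkdir "$demo/sub"
touch "$demo/index.html" "$demo/index.htmle" "$demo/index.htmlee" \
      "$demo/sub/file.exe" "$demo/sub/file.exee" "$demo/sub/file.exeee"

# Same check as the script: a name ending in "e" counts as a duplicate when the
# name without that final "e" exists. Paths are printed relative to $demo.
would_remove=$(find "$demo" -type f -name "*e" | LC_ALL=C sort -r |
    while IFS= read -r f; do
        if [ -e "${f%e}" ]; then echo "${f#"$demo"/}"; fi
    done)
echo "$would_remove"
```

Here sub/file.exe is kept because sub/file.ex does not exist, while the names with extra e letters are listed.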
answered 2013-04-24T14:30:09.093