
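I have 4 directories (named English, Arabic, French, and Russian) containing files that share the same base name but have different extensions. Here is an example of such files: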

MyFile104.en
MyFile104.ar
MyFile104.fr
MyFile104.ru

In some cases, a file exists in one directory but has been deleted from the others (for example, only "MyFile23.ar" and "MyFile23.en" exist).

What I want to do is compare the file names across these directories and delete every file that does not exist in all of them.

How can I do this?


3 Answers


All files with the specified extensions are processed. Subdirectories of the specified directories are ignored, as are files with extensions other than the ones specified. Files whose names end with one of the specified extensions, and which do not occur in all of the specified directories, are removed.

Note: the script assumes that the directories being processed contain no files with the wrong extension. For example, any .ar, .en, or .fr file in the Russian directory may cause undesirable results!

langs=( Arabic English French Russian )
extns=( ar en fr ru )
rxext="${extns[@]}" 
rxext="${rxext// /\\|}"  # regex: for .extensions

find "${langs[@]}" -maxdepth 1 -type f -regex ".*\.\($rxext\)$" -printf '%f\n' |
  sed -n "s/\(.*\)\..*$/\1/p" | sort | uniq -c |           # name is in how many dirs?
    sed -n "/^ *${#langs[@]} \(.*\)/!{s/^ *[0-9]\+ //p}" | # names not in all dirs
      xargs -I {} find "${langs[@]}" -maxdepth 1 -type f -regex ".*\/{}\.\($rxext\)$" |
        xargs -I {} rm {}
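
Since the pipeline above deletes files directly, it may be worth previewing which base names would be affected first. The following is only a minimal sketch under the same assumptions (the four directory names and extensions from the question, run from their parent directory); `list_incomplete` is a hypothetical helper name, not part of the script above, and it uses plain Bash globbing rather than find.

```shell
#!/bin/bash
# Dry run: print the base names that are missing from at least one directory.
# list_incomplete is a hypothetical helper; nothing is deleted here.
list_incomplete() {
  local langs=( Arabic English French Russian )
  local extns=( ar en fr ru )
  local i f
  for (( i=0; i<${#langs[@]}; i++ )); do
    # emit the base name of every regular file carrying this directory's extension
    for f in "${langs[i]}"/*."${extns[i]}"; do
      [[ -f $f ]] || continue        # skip unmatched globs and non-files
      f=${f##*/}                     # drop the directory part
      printf '%s\n' "${f%.*}"        # drop the extension
    done
  done | sort | uniq -c |
    awk -v n="${#langs[@]}" '$1 != n { sub(/^ *[0-9]+ /, ""); print }'
}
```

Calling `list_incomplete` from the parent directory prints one base name per line; an empty result suggests every name is present in all four directories.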

If you want to keep and ignore files with non-native language extensions in a given language's directory, then this next script will do just that.

all="/tmp/all_${0##*/}"
del="/tmp/del_${0##*/}"

extns=( ar     en      fr     ru      )
langs=( Arabic English French Russian )

# list names (not paths) whose extensions match a given directory
for (( i=0; i<${#extns[@]}; i++ )) do
  find "${langs[i]}" -maxdepth 1 -type f -name "*.${extns[i]}" -printf '%f\n'
done >"$all"

# list names not in all dirs
sed -nr "s/(.*)\..*$/\1/p" "$all" | sort | uniq -c |         # name is in how many dirs?
  sed -nr "/^ *${#extns[@]} (.*)/!{s/^ *[0-9]+ //p}" >"$del" 

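# list names in deficit (exact base-name match); add paths; then delete files
awk 'NR==FNR { del[$0]; next }                      # load base names to delete
     { n=$0; sub(/\.[^.]*$/, "", n) } (n in del)' "$del" "$all" |
  sed -r "$(for (( i=0; i<${#extns[@]}; i++ )) ;do
              echo "s/.*\.${extns[i]}$/${langs[i]}\\/&/;t;"
            done)" | xargs -r -d "\n" rm --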

rm "$all" "$del"
answered 2012-06-24T22:07:13.000
# create test environment
$ for i in ar fr ru en; do mkdir -p $i; touch $i/t1.{fr,en,ru,ar}; done
$ rm en/t1.fr

# print filenames to be deleted (fewer than 4 occurrences)
$ find ar en fr ru -name 't1.*' | sed 's/^.*\///' | sort | uniq -c | awk '$1!=4{print $2}'

# remove files with fewer than 4 occurrences (one find per name, so several names are handled)
$ find ar en fr ru -name 't1.*' | sed 's/^.*\///' | sort | uniq -c | awk '$1!=4{print $2}' |
    while IFS= read -r name; do find ar en fr ru -name "$name" -delete; done
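
The counting step in the commands above (sort, then uniq -c, then an awk filter on the count) can be tried on its own without touching any files. This is just an illustrative sketch; `not_in_all` is a hypothetical helper name, and like the original awk filter it assumes file names without spaces.

```shell
#!/bin/bash
# not_in_all is a hypothetical helper: read file names on stdin and print
# those whose number of occurrences differs from the expected count.
not_in_all() {
  local expected=$1
  sort | uniq -c | awk -v n="$expected" '$1 != n { print $2 }'
}

# Sample input: t1.ar is listed four times, t1.fr only three times.
printf '%s\n' t1.ar t1.fr t1.ar t1.fr t1.ar t1.fr t1.ar | not_in_all 4
```

With this sample input only t1.fr falls short of four occurrences, so it is the only name reported.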
answered 2012-06-25T18:04:29.357

If you have Bash 4, which supports associative arrays:

#!/bin/bash
declare -A languages
languages["English"]=en
languages["Arabic"]=ar
languages["French"]=fr
languages["Russian"]=ru

for language in "${!languages[@]}"
do
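    for file in "$language"/MyFile*."${languages[$language]}"
    do
        [[ -e $file ]] || continue    # glob matched nothing, or file already removed
        rmflag=false
        for compare in "${!languages[@]}"
        do
            # counterpart of $file in the directory being compared
            compfile=$compare/${file##*/}
            compfile=${compfile%."${languages[$language]}"}.${languages[$compare]}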
            if ! $rmflag && [[ ! -e $compfile ]]
            then
                rm "$file"
                rmflag=true
            elif $rmflag && [[ -e $compfile ]]
            then
                rm "$compfile"
            fi
        done
    done
done

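It's untested, and it may not do what you want if there are files that don't fit the pattern MyFile*.{en,ar,fr,ru} or if there are directories below the main directories.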
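
One detail worth illustrating is how an associative array is indexed: the key has to be expanded with `$`, as in `${languages[$dir]}`, because a bare word inside the brackets is taken as a literal key. The sketch below assumes Bash 4 or later and the same directory-to-extension mapping; `expected_paths` is a hypothetical helper that only prints paths and touches no files.

```shell
#!/bin/bash
# Requires Bash 4+ for associative arrays.
declare -A languages=( [English]=en [Arabic]=ar [French]=fr [Russian]=ru )

# expected_paths is a hypothetical helper: print the path that each directory
# should hold for a given base name, one per line, sorted for stable output.
expected_paths() {
  local base=$1 dir
  for dir in "${!languages[@]}"; do
    # ${languages[$dir]} looks up the extension for this directory
    printf '%s/%s.%s\n' "$dir" "$base" "${languages[$dir]}"
  done | sort
}

expected_paths MyFile104
```

The explicit sort is there because Bash does not guarantee any particular order when iterating over the keys of an associative array.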

Here is another version that works in Bash 3 or 4:

#!/bin/bash
for dir in English Arabic French Russian
do
    for file in "$dir"/*
    do
        base=${file##*/}
        base=${base%.*}
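        # brace expansion lists every combination whether or not it exists,
        # so keep only the paths that are actually present
        files=()
        for f in {English,Arabic,French,Russian}/"$base".{en,ar,fr,ru}
        do
            [[ -e $f ]] && files+=("$f")
        done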
        if (( ${#files[@]} != 4 ))
        then
            rm -f {English,Arabic,French,Russian}/"$base".{en,ar,fr,ru}
        fi
    done
done

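Again, untested. It may not do what you want under the same conditions as before, or if files are in the wrong place (for example, if an Englishman is visiting France). ;-)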
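
A point to keep in mind with this approach is that brace expansion is purely textual: it produces every directory/extension combination as a word, whether or not a matching file exists. A rough sketch of telling the two apart follows; `count_existing` is a hypothetical helper name and MyFile23 is just the example name from the question.

```shell
#!/bin/bash
# count_existing is a hypothetical helper: count how many of the given paths exist.
count_existing() {
  local n=0 f
  for f in "$@"; do
    if [[ -e $f ]]; then
      n=$((n + 1))
    fi
  done
  printf '%s\n' "$n"
}

base=MyFile23
# Brace expansion yields all 16 directory/extension combinations as words,
# regardless of which files are actually present on disk.
candidates=( {English,Arabic,French,Russian}/"$base".{en,ar,fr,ru} )
echo "${#candidates[@]} candidate paths, $(count_existing "${candidates[@]}") existing"
```

The number of candidates is always 16 here, so only the count of existing paths says anything about which files are present.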

answered 2012-06-24T19:17:19.793