linux - How to find all files that have no matching file with the same name but a different extension

Question

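I have a folder that contains more than 1 million files. The files come in pairs that differ only in their extension (e.g. a1.ext1, a1.ext2, a2.ext1, a2.ext2, ...).

I need to scan this folder and make sure it satisfies this requirement (every file has its partner). If I find a file without a match, I should delete it.

I have already done this in Python, but it is very slow once the number of files reaches seven digits.

Is there a way to do this with shell commands or a shell script?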

score 1 · Accepted Answer

Building on another answer, you can use a script like this (place it in the same directory as the files and run it from there):

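#!/usr/bin/env bash
# Run this from the directory that contains the files.
THRASH=../THRASH
mkdir -p "$THRASH"

shopt -s nullglob        # an unmatched pattern expands to nothing
for file in *.ext1 *.ext2; do
    name=${file%.*}      # strip only the last extension
    # Move the file away unless both members of its pair exist.
    if [[ ! -e "$name.ext1" || ! -e "$name.ext2" ]]; then
        mv -- "$file" "$THRASH"/
    fi
done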

You can configure where files without a partner are moved by changing the THRASH variable.

On a dual-core 3.0 GHz Pentium with 2 GB of RAM, one run took 63.7 seconds (10,000 pairs, with about 1,500 of each member of the pair missing from the folder).
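For a directory with a million files, a possibly faster alternative (not part of the answer above) is to compare the two sets of base names in a single pass with sort and comm, so nothing is started per file. This is a sketch that assumes GNU find (for -printf), bash (for process substitution) and file names without tabs or newlines; list_orphans is a hypothetical helper name, and ext1/ext2 are the question's placeholder extensions.

```shell
#!/usr/bin/env bash
# Sketch: print every .ext1/.ext2 file in a directory whose partner is missing.
# Assumes GNU find (-printf), bash, and no tabs or newlines in file names.
list_orphans() {
    local dir=${1:-.}
    # comm -3 drops base names present in both sorted lists. Names found only
    # in the first list (.ext1) are printed as-is; names found only in the
    # second list (.ext2) are printed prefixed with a tab.
    LC_ALL=C comm -3 \
        <(find "$dir" -maxdepth 1 -type f -name '*.ext1' -printf '%f\n' \
            | sed 's/\.ext1$//' | LC_ALL=C sort) \
        <(find "$dir" -maxdepth 1 -type f -name '*.ext2' -printf '%f\n' \
            | sed 's/\.ext2$//' | LC_ALL=C sort) \
        | awk -F'\t' '$1 != "" { print $1 ".ext1"; next } { print $2 ".ext2" }'
}
```

The orphans could then be moved in batches, for example with list_orphans . | xargs -r -d '\n' mv -t ../THRASH -- (the -r, -d and -t options are GNU extensions).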

score 0 · Accepted Answer

Python should be faster; however, if you want to try it in bash:

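shopt -s nullglob
# Base name = text before the first dot; report names without exactly two files.
ls | cut -d. -f1 | sort -u | while IFS= read -r file; do
    matches=( "$file".* )
    if [ "${#matches[@]}" -ne 2 ]; then
        echo "wrong number of extensions for $file"
    fi
done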

This prints every base name that has more or fewer than two extensions.
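The same count-per-name check can also be done without running a glob once per name: strip the extension from every name, sort, and let uniq -c count how often each base name occurs. This is a sketch, not the answer's code; unlike the answer, it treats the base name as everything before the last dot, assumes no newlines in file names, and names_without_pair is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: report base names (text before the LAST dot) that do not have
# exactly two files. Assumes no newlines in file names.
names_without_pair() {
    local dir=${1:-.}
    # Like plain ls in the answer, this skips hidden files.
    ls -- "$dir" \
        | sed 's/\.[^.]*$//' \
        | LC_ALL=C sort \
        | uniq -c \
        | awk '$1 != 2 { sub(/^ *[0-9]+ /, ""); print }'
}
```

The awk step removes uniq's count column with sub() instead of printing $2, so base names containing spaces are printed intact.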

score 0 · Accepted Answer

Try this:

#!/bin/bash
shopt -s nullglob   # expand to nothing if no file matches a pattern

for file in *.ext1 *.ext2
do
  # name is the substring before the last '.'
  name=${file%.*}
  # ext is the substring after the last '.'
  ext=${file##*.}
  case $ext in
    "ext1") sibling="$name.ext2" ;;
    "ext2") sibling="$name.ext1" ;;
  esac
  # Does it have a sibling? If not, remove the file.
  # A direct existence test replaces `ls | grep`, which rescans the whole
  # directory for every file and treats '.' as a regex wildcard.
  if [ ! -e "$sibling" ]
  then
    rm -- "$file"
  fi
done
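Because this script deletes files immediately, a dry run first may be worthwhile. Below is a sketch of the same existence test wrapped in a hypothetical prune_unpaired function: by default it only reports what it would remove, and it deletes only when given a hypothetical --delete argument (neither name is part of the answer above). Like the script, it works on the current directory.

```shell
#!/usr/bin/env bash
# Sketch: report (or, with the hypothetical --delete flag, remove) every
# .ext1/.ext2 file in the current directory whose partner is missing.
prune_unpaired() {
    local mode=${1:-} file name sibling
    shopt -s nullglob
    for file in *.ext1 *.ext2; do
        name=${file%.*}
        if [[ $file == *.ext1 ]]; then
            sibling=$name.ext2
        else
            sibling=$name.ext1
        fi
        if [[ ! -e $sibling ]]; then
            if [[ $mode == --delete ]]; then
                rm -- "$file"
            else
                printf 'would remove %s\n' "$file"
            fi
        fi
    done
}
```

Running prune_unpaired first and prune_unpaired --delete only after checking its output keeps the destructive step explicit.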
