python - Hashing multiple files

Question

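Problem specification:

Given a directory, I want to iterate through the directory and its non-hidden sub-directories,
and add a whirlpool hash to the names of the non-hidden files.
If the script is re-run, it should replace the old hash with a new one.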

<filename>.<extension> ==> <filename>.<a-whirlpool-hash>.<extension>

<filename>.<old-hash>.<extension> ==> <filename>.<new-hash>.<extension>

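Question:

a) How would you do this?

b) Out of all the methods available to you, what makes your method the most suitable?

Verdict:

Thanks all, I have chosen SeigeX's answer for its speed and portability.
It is empirically quicker than the other bash variants,
and it worked without alteration on my Mac OS X machine.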

score 6 · Accepted Answer

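Updated to fix:
1. File names with '[' or ']' in their name (really, any character works now. See the comments)
2. Handling of md5sum output when hashing a file with a backslash or newline in its name
3. Moved the hash-checking algorithm into a function for modularity
4. Refactored the hash-checking logic to remove double negatives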

#!/bin/bash
if (($# != 1)) || ! [[ -d "$1" ]]; then
    echo "Usage: $0 /path/to/directory"
    exit 1
fi

is_hash() {
 md5=${1##*.} # strip prefix
 [[ "$md5" == *[^[:xdigit:]]* || ${#md5} -lt 32 ]] && echo "$1" || echo "${1%.*}"
}

while IFS= read -r -d $'\0' file; do
    read hash junk < <(md5sum "$file")
    basename="${file##*/}"
    dirname="${file%/*}"
    pre_ext="${basename%.*}"
    ext="${basename:${#pre_ext}}"

    # File already hashed?
    pre_ext=$(is_hash "$pre_ext")
    ext=$(is_hash "$ext")

    mv "$file" "${dirname}/${pre_ext}.${hash}${ext}" 2> /dev/null

done < <(find "$1" -path "*/.*" -prune -o \( -type f -print0 \))

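So far this code has the following benefits compared with the other entries:

It is fully compliant with Bash versions 2.0.2 and later
No superfluous calls to other binaries such as sed or grep; it uses builtin parameter expansion instead
Uses process substitution for 'find' instead of a pipe, so no sub-shell is created for the loop
Takes the directory to work on as an argument and does a sanity check on it
Uses $() rather than `` notation for command substitution; the latter is deprecated
Works with files with spaces
Works with files with newlines
Works with files with multiple extensions
Works with files with no extension
Does not traverse hidden directories
Does NOT skip pre-hashed files; it recalculates the hash as per the spec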
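To make the renaming rule easier to follow, here is a minimal Python sketch of what the is_hash logic and the parameter expansions compute for a single basename. The names looks_like_hash, strip_hash and hashed_name are mine, not part of the script, and the sketch only builds the new name; it does not touch the filesystem.

```python
import re

# Mirrors the bash test: only hex digits, and at least 32 of them.
HEX32 = re.compile(r"[0-9a-fA-F]{32,}")


def looks_like_hash(text):
    """True if text could be an MD5 hex digest (hypothetical helper)."""
    return HEX32.fullmatch(text) is not None


def strip_hash(part):
    """Drop the last dot-separated component if it looks like a hash."""
    head, _dot, tail = part.rpartition(".")
    return head if looks_like_hash(tail) else part


def hashed_name(basename, digest):
    """Return basename with digest placed before the final extension."""
    stem, dot, last = basename.rpartition(".")
    if not dot:                      # no dot at all, so no extension
        stem, ext = basename, ""
    else:
        ext = "." + last
    stem = strip_hash(stem)          # old hash sitting before the extension
    ext = strip_hash(ext)            # old hash that is itself the last component
    return f"{stem}.{digest}{ext}"


if __name__ == "__main__":
    new = "d41d8cd98f00b204e9800998ecf8427e"
    for name in ("f", "j.ext1.ext2", "g.5236b1ab46088005ed3554940390c8a7.ext"):
        print(name, "->", hashed_name(name, new))
```

Running a name through hashed_name twice with the same digest returns the same result, which is the re-run behaviour the question asks for.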

Test tree

$ tree -a a
a
|-- .hidden_dir
|   `-- foo
|-- b
|   `-- c.d
|       |-- f
|       |-- g.5236b1ab46088005ed3554940390c8a7.ext
|       |-- h.d41d8cd98f00b204e9800998ecf8427e
|       |-- i.ext1.5236b1ab46088005ed3554940390c8a7.ext2
|       `-- j.ext1.ext2
|-- c.ext^Mnewline
|   |-- f
|   `-- g.with[or].ext
`-- f^Jnewline.ext

4 directories, 9 files

Result

$ tree -a a
a
|-- .hidden_dir
|   `-- foo
|-- b
|   `-- c.d
|       |-- f.d41d8cd98f00b204e9800998ecf8427e
|       |-- g.d41d8cd98f00b204e9800998ecf8427e.ext
|       |-- h.d41d8cd98f00b204e9800998ecf8427e
|       |-- i.ext1.d41d8cd98f00b204e9800998ecf8427e.ext2
|       `-- j.ext1.d41d8cd98f00b204e9800998ecf8427e.ext2
|-- c.ext^Mnewline
|   |-- f.d41d8cd98f00b204e9800998ecf8427e
|   `-- g.with[or].d41d8cd98f00b204e9800998ecf8427e.ext
`-- f^Jnewline.d3b07384d113edec49eaa6238ad5ff00.ext

4 directories, 9 files

score 4 · Answer

#!/bin/bash
find -type f -print0 | while read -d $'\0' file
do
    md5sum=`md5sum "${file}" | sed -r 's/ .*//'`
    filename=`echo "${file}" | sed -r 's/\.[^./]*$//'`
    extension="${file:${#filename}}"
    filename=`echo "${filename}" | sed -r 's/\.md5sum-[^.]+//'`
    if [[ "${file}" != "${filename}.md5sum-${md5sum}${extension}" ]]; then
        echo "Handling file: ${file}"
        mv "${file}" "${filename}.md5sum-${md5sum}${extension}"
    fi
done

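Tested on files containing spaces, like 'a b'
Tested on files containing multiple extensions, like 'a.b.c'
Tested with directories containing spaces and/or dots.
Tested on files containing no extension inside directories containing dots, such as 'a.b/c'
Updated: now updates the hash if the file changes.

Key points:

Use of -print0 piped to while read -d $'\0' to correctly handle spaces in file names.
md5sum can be replaced with your favourite hash function. The sed removes the first space and everything after it from the output of md5sum.
The base filename is extracted using a regular expression that finds the last period that is not followed by a slash (so that periods in directory names are not counted as part of the extension).
The extension is found by taking a substring whose starting index is the length of the base filename.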
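The last two key points are easier to see outside of sed. The following is a rough Python sketch of the same two steps, reusing the regular expressions from the script; split_path and new_name are hypothetical names introduced only for this illustration.

```python
import re


def split_path(path):
    """Split path into (base, extension) at the last dot of the final component."""
    # Same pattern as the sed call: a dot followed only by characters
    # that are neither '.' nor '/', up to the end of the string.
    base = re.sub(r"\.[^./]*$", "", path)
    # The extension is whatever remains after the base (may be empty).
    extension = path[len(base):]
    return base, extension


def new_name(path, digest):
    """Build the target name, replacing an existing .md5sum-<hash> part."""
    base, extension = split_path(path)
    base = re.sub(r"\.md5sum-[^.]+", "", base)   # drop a previous hash, if any
    return f"{base}.md5sum-{digest}{extension}"


if __name__ == "__main__":
    for p in ("./a b.txt", "./a.b/c", "./dir/a.b.c"):
        print(repr(p), "->", split_path(p))
```

A dot inside a directory name is never taken as the start of the extension, because the pattern refuses to cross a slash.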

score 3 · Answer

The logic of the requirements is complex enough to justify using Python instead of bash. It should provide a more readable, extensible, and maintainable solution.

#!/usr/bin/env python
import hashlib, os

def ishash(h, size):
    """Whether `h` looks like hash's hex digest."""
    if len(h) == size: 
        try:
            int(h, 16) # whether h is a hex number
            return True
        except ValueError:
            return False

for root, dirs, files in os.walk("."):
    dirs[:] = [d for d in dirs if not d.startswith(".")] # skip hidden dirs
    for path in (os.path.join(root, f) for f in files if not f.startswith(".")):
        with open(path, "rb") as f: # binary mode, so the bytes are hashed unmodified
            suffix = hash_ = "." + hashlib.md5(f.read()).hexdigest()
        hashsize = len(hash_) - 1
        # extract old hash from the name; add/replace the hash if needed
        barepath, ext = os.path.splitext(path) # ext may be empty
        if not ishash(ext[1:], hashsize):
            suffix += ext # add original extension
            barepath, oldhash = os.path.splitext(barepath) 
            if not ishash(oldhash[1:], hashsize):
               suffix = oldhash + suffix # preserve 2nd (not a hash) extension
        else: # ext looks like a hash
            oldhash = ext
        if hash_ != oldhash: # replace old hash by new one
           os.rename(path, barepath+suffix)

Here's a test directory tree. It contains:

files without an extension inside directories with a dot in their name
a filename that already has a hash in it (idempotency test)
a filename with two extensions
newlines in names

$ tree a
a
|-- b
|   `-- c.d
|       |-- f
|       |-- f.ext1.ext2
|       `-- g.d41d8cd98f00b204e9800998ecf8427e
|-- c.ext^Mnewline
|   `-- f
`-- f^Jnewline.ext1

7 directories, 5 files

Result

$ tree a
a
|-- b
|   `-- c.d
|       |-- f.0bee89b07a248e27c83fc3d5951213c1
|       |-- f.ext1.614dd0e977becb4c6f7fa99e64549b12.ext2
|       `-- g.d41d8cd98f00b204e9800998ecf8427e
|-- c.ext^Mnewline
|   `-- f.0bee89b07a248e27c83fc3d5951213c1
`-- f^Jnewline.b6fe8bb902ca1b80aaa632b776d77f83.ext1

7 directories, 5 files

The solution works in all of these cases.
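The script hashes each file with a single read() call, which loads the whole file into memory. For large files, a chunked read is a common alternative. This is only a sketch: file_digest and its chunk_size default are names and values I picked for illustration, not part of the answer's code.

```python
import hashlib


def file_digest(path, algorithm="md5", chunk_size=1 << 20):
    """Return the hex digest of the file at path, reading it in chunks."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


if __name__ == "__main__":
    import sys
    for p in sys.argv[1:]:
        print(file_digest(p), p)
```

The result is the same digest as hashing the full contents at once; only the memory use changes.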

The Whirlpool hash is not in Python's stdlib, but there are both pure Python and C extensions that support it, e.g. python-mhash.

To install it:

$ sudo apt-get install python-mhash

To use it:

import mhash

print mhash.MHASH(mhash.MHASH_WHIRLPOOL, "text to hash here").hexdigest()

Output: cbdca4520cc5c131fc3a86109dd23fee2d7ff7be56636d398180178378944a4f41480b938608ae98da7eccbf39a4c79b83a8590c4cb1bace5bc638fc92b3e653
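Depending on how your Python was built, hashlib.new() may also accept "whirlpool", because it forwards unknown names to the OpenSSL library it is linked against. Whether that works is an assumption about your build, not something hashlib guarantees, so the sketch below falls back to None instead of failing. whirlpool_hexdigest is a hypothetical helper name.

```python
import hashlib


def whirlpool_hexdigest(data):
    """Return the Whirlpool hex digest of data, or None if unavailable.

    hashlib only offers Whirlpool when the underlying OpenSSL build
    provides it, so an unsupported name raises ValueError.
    """
    try:
        return hashlib.new("whirlpool", data).hexdigest()
    except ValueError:
        return None


if __name__ == "__main__":
    digest = whirlpool_hexdigest(b"text to hash here")
    if digest is None:
        print("whirlpool is not available in this Python build")
    else:
        print(digest)
```

If it returns None, an extension module or an external program is still needed.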

Invoking `whirlpooldeep` from Python

from subprocess import PIPE, STDOUT, Popen

def getoutput(cmd):
    return Popen(cmd, stdout=PIPE, stderr=STDOUT).communicate()[0]

hash_ = getoutput(["whirlpooldeep", "-q", path]).rstrip()

git can provide leverage for problems that need to track sets of files based on their hashes.

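score 3 · Answer

I wasn't really happy with my first answer, since, as I said there, this problem looks like it is best solved with perl. You already said in one edit of your question that you have perl on the OS X machine you want to run this on, so I took a shot at it.

It's hard to get it all right in bash, i.e. avoiding any quoting issues with odd filenames, and behaving nicely with corner-case filenames.

So here it is in perl, a complete solution to your problem. It runs over all the files/directories listed on its command line.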


#!/usr/bin/perl -w
# whirlpool-rename.pl
# 2009 Peter Cordes <peter@cordes.ca>.  Share and Enjoy!

use Fcntl;      # for O_BINARY
use File::Find;
use Digest::Whirlpool;

# find callback, called once per directory entry
# $_ is the base name of the file, and we are chdired to that directory.
sub whirlpool_rename {
    print "find: $_\n";
#    my @components = split /\.(?:[[:xdigit:]]{128})?/; # remove .hash while we're at it
    my @components = split /\.(?!\.|$)/, $_, -1; # -1 to not leave out trailing dots

    if (!$components[0] && $_ ne ".") { # hidden file/directory
        $File::Find::prune = 1;
        return;
    }

    # don't follow symlinks or process non-regular-files
    return if (-l $_ || ! -f _);

    my $digest;
    eval {
        sysopen(my $fh, $_, O_RDONLY | O_BINARY) or die "$!";
        $digest = Digest->new( 'Whirlpool' )->addfile($fh);
    };
    if ($@) {  # exception-catching structure from whirlpoolsum, distributed with Digest::Whirlpool.
        warn "whirlpool: couldn't hash $_: $!\n";
        return;
    }

    # strip old hashes from the name.  not done during split only in the interests of readability
    @components = grep { !/^[[:xdigit:]]{128}$/ }  @components;
    if ($#components == 0) {
        push @components, $digest->hexdigest;
    } else {
        my $ext = pop @components;
        push @components, $digest->hexdigest, $ext;
    }

    my $newname = join('.', @components);
    return if $_ eq $newname;
    print "rename  $_ ->  $newname\n";
    if (-e $newname) {
        warn "whirlpool: clobbering $newname\n";
        # maybe unlink $_ and return if $_ is older than $newname?
        # But you'd better check that $newname has the right contents then...
    }
    # This could be link instead of rename, but then you'd have to handle directories, and you can't make hardlinks across filesystems
    rename $_, $newname or warn "whirlpool: couldn't rename $_ -> $newname:  $!\n";
}


#main
$ARGV[0] = "." if !@ARGV;  # default to current directory
find({wanted => \&whirlpool_rename, no_chdir => 0}, @ARGV );

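Advantages:

Actually uses whirlpool, so you can use this exact program directly (after installing libperl-digest-whirlpool). Easy to change to any digest function you want, because you have the common perl Digest interface instead of different programs with different output formats.
Implements all the other requirements: ignores hidden files (and files under hidden directories).
Able to handle any possible filename without errors or security problems. (Several people got this right in their shell scripts.)
Follows best practice for traversing a directory tree by chdiring down into each directory (like my previous answer, with find -execdir). This avoids problems with PATH_MAX, and with directories being renamed while you are running.
Clever handling of filenames that end with ".": foo..txt... -> foo..hash.txt...
Handles old filenames that already contain a hash without renaming them and then renaming them back. (It strips any sequence of 128 hex digits that is surrounded by "." characters.) In the everything-correct case, no disk writes happen, just reads of every file. Your current solution runs mv twice in the already-correctly-named case, causing directory metadata writes. It is also slower, because those are two processes that have to be execed.
Efficient. No programs are fork/execed, while most of the solutions that actually work end up having to run something like sed for every file. Digest::Whirlpool is implemented with a natively compiled shared library, so it is not slow pure perl. This should be faster than running a program on every file, especially for small files.
Perl supports UTF-8 strings, so filenames with non-ASCII characters shouldn't be a problem. (I'm not sure whether any multi-byte sequence in UTF-8 could include the byte that means ASCII '.' on its own. If that is possible, then you need UTF-8 aware string handling. sed doesn't know UTF-8. Bash's glob expressions may.)
Easily extensible. When you put this into a real program and want to handle more corner cases, you can do so quite easily, e.g. decide what to do when you want to rename a file but the hash-named filename already exists.
Good error reporting. Most shell scripts have this too, though, by passing along the errors from the programs they run.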
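For readers who don't speak perl, the name handling (split on dots, drop old hashes, insert the new digest before the extension) can be sketched in Python. This is an illustrative translation under my own naming (rename_target), not the author's code, and it leaves out the hidden-file check, the hashing and the actual rename.

```python
import re

# A Whirlpool digest is 512 bits, i.e. 128 hex digits.
HASH128 = re.compile(r"[0-9a-fA-F]{128}")


def rename_target(name, digest):
    """Return the new file name for name, given its current digest."""
    # Split on dots that are not followed by another dot or the end of
    # the string, so runs of dots and trailing dots stay attached.
    parts = re.split(r"\.(?!\.|$)", name)
    # Drop components that look like an old Whirlpool hash.
    parts = [p for p in parts if not HASH128.fullmatch(p)]
    if len(parts) <= 1:
        parts.append(digest)          # no extension: append the hash
    else:
        ext = parts.pop()             # keep the last component as extension
        parts += [digest, ext]
    return ".".join(parts)


if __name__ == "__main__":
    fake = "ab" * 64                  # placeholder digest, not a real hash
    for n in ("f", "foo..txt...", "a." + "0" * 128 + ".txt"):
        print(n, "->", rename_target(n, fake))
```

When a name already carries the right digest, rename_target returns it unchanged, which is what lets the perl script skip the rename entirely.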

score 2 · Answer

# set hashcommand to a program that prints only the digest of its argument
find . -type f -print | while IFS= read -r file
do
    hash=`$hashcommand "$file"`
    filename=${file%.*}
    extension=${file##*.}
    mv "$file" "$filename.$hash.$extension"
done

score 1 · Answer

You might want to store the results in one file, like this:

find . -type f -exec md5sum {} \; > MD5SUMS

If you really want one file per hash:

find . -type f | while read -r f; do md5sum "$f" > "$f.md5"; done

or even

find . -type f | while read -r f; do g=`md5sum "$f" | awk '{print $1}'`; echo "$g $f" > "$f-$g.md5"; done

score 1 · Answer

In sh or bash, two versions. One limits itself to files with extensions...

hash () {
  #openssl md5 t.sh | sed -e 's/.* //'
  whirlpool "$f"
}

find . -type f -a -name '*.*' | while read f; do
  # remove the echo to run this for real
  echo mv "$f" "${f%.*}.whirlpool-`hash "$f"`.${f##*.}"
done

Testing...

...
mv ./bash-4.0/signames.h ./bash-4.0/signames.whirlpool-d71b117a822394a5b273ea6c0e3f4dc045b1098326d39864564f1046ab7bd9296d5533894626288265a1f70638ee3ecce1f6a22739b389ff7cb1fa48c76fa166.h
...

And this more complex version processes all plain files, with or without extensions, with or without spaces and odd characters, and so on...

hash () {
  #openssl md5 t.sh | sed -e 's/.* //'
  whirlpool "$f"
}

find . -type f | while read f; do
  name=${f##*/}
  case "$name" in
    *.*) extension=".${name##*.}" ;;
    *)   extension=   ;;
  esac
  # remove the echo to run this for real
  echo mv "$f" "${f%/*}/${name%.*}.whirlpool-`hash "$f"`$extension"
done

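score 1 · Answer

Here's my take on it, in bash. Features: skips non-regular files; correctly deals with files with weird characters (i.e. spaces) in their names; deals with extensionless filenames; skips already-hashed files, so it can be run repeatedly (although if files are modified between runs, it adds the new hash rather than replacing the old one). I wrote it using md5 -q as the hash function; you should be able to replace this with anything else, as long as it only outputs the hash and not something like filename => hash.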

find -x . -type f -print0 | while IFS="" read -r -d $'\000' file; do
    hash="$(md5 -q "$file")" # replace with your favorite hash function
    [[ "$file" == *."$hash" ]] && continue # skip files that already end in their hash
    dirname="$(dirname "$file")"
    basename="$(basename "$file")"
    base="${basename%.*}"
    [[ "$base" == *."$hash" ]] && continue # skip files that already end in hash + extension
    if [[ "$basename" == "$base" ]]; then
            extension=""
    else
            extension=".${basename##*.}"
    fi
    mv "$file" "$dirname/$base.$hash$extension"
done

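score 1 · Answer

Whirlpool isn't a very common hash. You will probably have to install a program to compute it. For example, Debian/Ubuntu include a "whirlpool" package. The program prints the hash of one file by itself. apt-cache search whirlpool shows that some other packages support it, including the interesting md5deep.

Some of the earlier answers will fail on filenames with spaces in them. If that is your situation, but your files don't have any newlines in their names, then you can safely use \n as a delimiter.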


oldifs="$IFS"
IFS="
"
for i in $(find -type f); do echo "$i";done
#output
# ./base
# ./base2
# ./normal.ext
# ./trick.e "xt
# ./foo bar.dir ext/trick' (name "- }$foo.ext{}.ext2
IFS="$oldifs"

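Try it without setting IFS to see why it matters.

I was going to try something with IFS="."; find -print0 | while read -a array, to split on "." characters, but I normally never use array variables. I don't see any easy way in the man page to insert the hash as the second-last array index and push down the last element (the file extension, if there is one). Any time bash array variables start to look interesting, I know it's time to do what I'm doing in perl instead! See the gotchas for using read: http://tldp.org/LDP/abs/html/gotchas.html#BADREAD0

I decided to use another technique I like: find -exec sh -c. It's the safest, since you're not parsing filenames.

This should do the trick: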


find -regextype posix-extended -type f -not -regex '.*\.[a-fA-F0-9]{128}.*'  \
-execdir bash -c 'for i in "${@#./}";do 
 hash=$(whirlpool "$i");
 ext=".${i##*.}"; base="${i%.*}";
 [ "$base" = "$i" ] && ext="";
 newname="$base.$hash$ext";
 echo "ext:$ext  $i -> $newname";
 false mv --no-clobber "$i" "$newname";done' \
dummy {} +
# take out the "false" before the mv, and optionally take out the echo.
# false ignores its arguments, so it's there so you can
# run this to see what will happen without actually renaming your files.

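-execdir bash -c 'cmd' dummy {} + has the dummy argument there because the first argument after the command becomes $0 in the shell's positional parameters, not part of the "$@" that the for loop iterates over. I use execdir instead of exec so I don't have to deal with directory names (or with the possibility of exceeding PATH_MAX for nested directories with long names, when the actual filenames are all short enough).

-not -regex prevents this from being applied twice to the same file. Whirlpool is an extremely long hash, though, and mv says File name too long if I run it twice without that check (on an XFS filesystem).

Files with no extension get basename.hash. I had to check specially to avoid appending a trailing ".", or getting the basename as the extension. ${@#./} strips the leading ./ that find puts in front of every filename, so there is no "." anywhere in the string for files with no extension.

mv --no-clobber may be a GNU extension. If you don't have GNU mv, do something else if you want to avoid deleting existing files (e.g. you run this once, some of the same files are added to the directory under their old names, and you run it again). OTOH, if you want that behaviour, just take it out.

My solution should work even when filenames contain a newline (they can, you know!) or any other possible character. It would be faster and easier in perl, but you asked for shell.

wallenborn's solution for making one file with all the checksums (instead of renaming the originals) is pretty good, but inefficient. Don't run md5sum once per file; run it on as many files at once as will fit on its command line:

find dir -type f -print0 | xargs -0 md5sum > dir.md5

or, with GNU find, xargs is built in (note the + instead of ';'):

find dir -type f -exec md5sum {} + > dir.md5

If you just use find -print | xargs -d'\n', you will be screwed up by filenames with quote marks in them, so be careful. If you don't know what files you might someday run this script on, always try to use -print0 or -exec. This is especially true if filenames are supplied by untrusted users (i.e. could be an attack vector on your server).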
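If spawning md5sum at all is a concern, the same manifest can be produced in a single process. Below is a small Python sketch of that idea; write_manifest is a hypothetical helper, the line format only imitates md5sum's "digest, two spaces, path" layout, and it does not reproduce md5sum's escaping of names that contain backslashes or newlines.

```python
import hashlib
import os


def write_manifest(root, out_path):
    """Write one 'digest  path' line per regular file found under root."""
    # surrogateescape lets names that are not valid UTF-8 round-trip.
    with open(out_path, "w", encoding="utf-8", errors="surrogateescape") as out:
        for dirpath, dirnames, filenames in os.walk(root):
            dirnames.sort()                      # deterministic traversal order
            for name in sorted(filenames):
                path = os.path.join(dirpath, name)
                with open(path, "rb") as f:
                    digest = hashlib.md5(f.read()).hexdigest()
                out.write(f"{digest}  {path}\n")


if __name__ == "__main__":
    import sys
    # Usage: script.py DIR OUTFILE  (keep OUTFILE outside DIR)
    write_manifest(sys.argv[1], sys.argv[2])
```

Because the file names never pass through a shell or a text pipeline, spaces and quote marks in them need no special handling.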

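score 1 · Answer

Hm, interesting problem.

Try the following (the mktest function is just for testing -- TDD for bash! :)

Edit:

Added support for whirlpool hashes.
Code cleanup
Better quoting of filenames
Changed the array syntax in the test part; it should now work with most korn-like shells. Note that pdksh does not support :-based parameter expansion (or rather, it means something else)

Also note that in md5 mode it fails for filenames with whirlpool-like hashes, and possibly vice versa.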

#!/usr/bin/env bash

#Tested with:
# GNU bash, version 4.0.28(1)-release (x86_64-pc-linux-gnu)
# ksh (AT&T Research) 93s+ 2008-01-31
# mksh @(#)MIRBSD KSH R39 2009/08/01 Debian 39.1-4
# Does not work with pdksh, dash

DEFAULT_SUM="md5"

#Takes one parameter, the root path,
# and an optional parameter, the hash function to use (md5 or wp for whirlpool).
main()
{
  case $2 in
    "wp")
      export SUM="wp"
      ;;
    "md5")
      export SUM="md5"
      ;;
    *)
      export SUM=$DEFAULT_SUM
      ;;
  esac

  # For all visible files in all visible subfolders, move the file
  # to a name that includes the correct hash:
  find "$1" -type f -not -regex '.*/\..*' -exec "$0" hashmove '{}' \;
}

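# Given a file named in $1 with its full path, calculate its hash.
# Output the filename with the hash inserted before the extension
# (if any) -- or: replace an existing hash with the new one,
# if a hash is already present.
hashname_md5()
{
  pathname="$1"
  full_hash=`md5sum "$pathname"`
  hash=${full_hash:0:32}
  filename=`basename "$pathname"`
  prefix=${filename%%.*}
  suffix=${filename#$prefix}

  #If the suffix starts with something that looks like an md5sum,
  #remove it:
  suffix=`echo "$suffix"|sed -r 's/\.[a-z0-9]{32}//'`

  echo "$prefix.$hash$suffix"
}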

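# Same as hashname_md5 -- but uses the whirlpool hash.
hashname_wp()
{
  pathname="$1"
  hash=`whirlpool "$pathname"`
  filename=`basename "$pathname"`
  prefix=${filename%%.*}
  suffix=${filename#$prefix}

  #If the suffix starts with something that looks like a whirlpool hash,
  #remove it:
  suffix=`echo "$suffix"|sed -r 's/\.[a-z0-9]{128}//'`

  echo "$prefix.$hash$suffix"
}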


#Given a file path $1, move/rename it to a name that includes the file hash.
# Tries to replace an existing hash, and does not move the file if no update
# is needed.
hashmove()
{
  pathname="$1"
  filename=`basename "$pathname"`
  path="${pathname%%/$filename}"

  case $SUM in
    "wp")
      hashname=`hashname_wp "$pathname"`
      ;;
    "md5")
      hashname=`hashname_md5 "$pathname"`
      ;;
    *)
      echo "Unknown hash requested"
      exit 1
      ;;
  esac

  if [[ "$filename" != "$hashname" ]]
  then
      echo "renaming: $pathname => $path/$hashname"
      mv "$pathname" "$path/$hashname"
  else
    echo "$pathname is up to date"
  fi
}

# Create some test data under /tmp
mktest()
{
  root_dir=$(tempfile)
  rm "$root_dir"
  mkdir "$root_dir"
  i=0
  test_files[$((i++))]='test'
  test_files[$((i++))]='testfile, no extension or spaces'

  test_files[$((i++))]='.hidden'
  test_files[$((i++))]='a hidden file'

  test_files[$((i++))]='test space'
  test_files[$((i++))]='testfile, no extension, spaces in name'

  test_files[$((i++))]='test.txt'
  test_files[$((i++))]='testfile, extension, no spaces in name'

  test_files[$((i++))]='test.ab8e460eac3599549cfaa23a848635aa.txt'
  test_files[$((i++))]='testfile, with (wrong) md5sum, no spaces in name'

  test_files[$((i++))]='test spaced.ab8e460eac3599549cfaa23a848635aa.txt'
  test_files[$((i++))]='testfile, with (wrong) md5sum, spaces in name'

  # NOTE: the digests in the next two names are truncated in the source;
  # a whirlpool digest has 128 hex digits.
  test_files[$((i++))]='test.8072ec03e95a26bb07d6e163c93593283fee032db7265a29e2430004eefda22ce096be3fa189e8988c6ad77a3154af76f582d7e84e3f319b798d3693d.txt'
  test_files[$((i++))]='testfile, with (wrong) whirlpoolhash, no spaces in name'

  test_files[$((i++))]='test spaced.8072ec03e95a26bb07d6e163c93593283fee032db7265a29e2430004eefda22ce096be3fa189e8988c6ad77a3154af76f582d7e84e3f319b798d363'
  test_files[$((i++))]='testfile, with (wrong) whirlpoolhash, spaces in name'

  test_files[$((i++))]='test space.txt'
  test_files[$((i++))]='testfile, extension, spaces in name'

  test_files[$((i++))]='test   multi-space.txt'
  test_files[$((i++))]='testfile, extension, multiple consecutive spaces in name'

  test_files[$((i++))]='test space.h'
  test_files[$((i++))]='testfile, short extension, spaces in name'

  test_files[$((i++))]='test space.reallylong'
  test_files[$((i++))]='testfile, long extension, spaces in name'

  test_files[$((i++))]='test space.reallyreallyreallylong.tst'
  test_files[$((i++))]='testfile, long extension, double extension,
                        might look like hash, spaces in name'

  test_files[$((i++))]='utf8test1 - æeiaæå.txt'
  test_files[$((i++))]='testfile, extension, utf8 characters, spaces in name'

  test_files[$((i++))]='utf8test1 - 汉字.txt'
  test_files[$((i++))]='testfile, extension, Japanese utf8 characters, spaces in name'

  for s in . sub1 sub2 sub1/sub3 .hidden_dir
  do

     #note: -p is not needed, as we create the dirs top-down
     #mkdir fails for "." -- but this hack allows us to use a single loop
     #for creating test data in all dirs
     mkdir $root_dir/$s
     dir=$root_dir/$s

     i=0
     while [[ $i -lt ${#test_files[*]} ]]
     do
       filename=${test_files[$((i++))]}
       echo ${test_files[$((i++))]} > "$dir/$filename"
     done
   done

   echo "$root_dir"
}

# Run a test, given a hash type as the first argument
testrun()
{
  sum=$1

  root_dir=$(mktest)

  echo "created dir: $root_dir"
  echo "Running first test with hash type $sum:"
  echo
  main $root_dir $sum
  echo
  echo "Running second test:"
  echo
  main $root_dir $sum
  echo "Updating all files:"

  find $root_dir -type f | while read f
  do
    echo "more content" >> "$f"
  done

  echo
  echo "Running final test:"
  echo
  main $root_dir $sum
  #cleanup:
  rm -r $root_dir
}

# Test md5 and whirlpool hashes on generated data.
runtests()
{
  testrun md5
  testrun wp
}

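#In order to be able to call the script recursively, without splitting the
# functions off into separate files:
case "$1" in
  'test')
    runtests
  ;;
  'hashname')
    # dispatch to hashname_md5 or hashname_wp (md5 unless SUM is set)
    "hashname_${SUM:-$DEFAULT_SUM}" "$2"
  ;;
  'hashmove')
    hashmove "$2"
  ;;
  'run')
    main "$2" "$3"
  ;;
  *)
    echo "Use with: $0 test - or if you just want to try it on a folder:"
    echo "  $0 run path (implies md5)"
    echo "  $0 run path md5"
    echo "  $0 run path wp"
  ;;
esac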

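score 1 · Answer

In response to your updated question:

If anyone can comment on how I can avoid looking in hidden directories with my BASH script, it would be much appreciated.

You can avoid hidden directories with find by using

find -name '.?*' -prune -o \( -type f -print0 \)

-name '.*' -prune would prune "." itself and stop without doing anything. :/

I'd still recommend my Perl version, though. I updated it... You may still need to install Digest::Whirlpool from CPAN, though.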
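For comparison, the same pruning idea in Python: os.walk lets you edit the list of sub-directories in place, and anything removed from it is never entered. This is a sketch only; visible_files is a name made up for the example. Like the find expression, it skips hidden files as well as hidden directories, and never prunes the starting directory itself.

```python
import os


def visible_files(root):
    """Yield paths of non-hidden files below root, skipping hidden directories."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Assigning to the slice edits the list os.walk is using, so
        # hidden directories are pruned rather than merely filtered.
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        for name in filenames:
            if not name.startswith("."):
                yield os.path.join(dirpath, name)


if __name__ == "__main__":
    for path in visible_files("."):
        print(path)
```

Rebinding dirnames to a new list instead of assigning to dirnames[:] would not prune anything, which is an easy mistake to make.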

score 0 · Answer

Using zsh:

$ ls
a.txt
b.txt
c.txt

The magic:

$ FILES=**/*(.) 
$ # */ stupid syntax coloring thinks this is a comment
$ for f in $FILES; do hash=`md5sum $f | cut -f1 -d" "`; mv $f "$f:r.$hash.$f:e"; done
$ ls
a.60b725f10c9c85c70d97880dfe8191b3.txt
b.3b5d5c3712955042212316173ccf37be.txt
c.2cd6ee2c70b0bde53fbe6cac3c8b8bb1.txt

Happy deconstructing!

Edit: added files in subdirectories and quotes around the mv argument

score 0 · Answer

Ruby:

#!/usr/bin/env ruby
require 'digest/md5'

Dir.glob('**/*') do |f|
  next unless File.file? f
  next if /\.md5sum-[0-9a-f]{32}/ =~ f
  md5sum = Digest::MD5.file f
  newname = "%s/%s.md5sum-%s%s" %
    [File.dirname(f), File.basename(f,'.*'), md5sum, File.extname(f)]
  File.rename f, newname
end

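Handles filenames that contain spaces, that have no extension, and that have already been hashed.

Ignores hidden files and directories; add File::FNM_DOTMATCH as the second argument to glob if you want them included.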
