bash - Is there a bash command which counts files?

Question

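Is there a bash command which counts the number of files that match a pattern?

For example, I want to get the count of all files in a directory which match this pattern: log*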

score 318 · Accepted Answer

This simple one-liner should work in any shell, not just bash:

ls -1q log* | wc -l

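ls -1q will give you one line per file, even if the file names contain whitespace or special characters such as newlines.

The output is piped to wc -l, which counts the number of lines.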
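A minimal sketch to try this out, assuming bash (for the $'...' quoting) and an ls that supports -q (GNU coreutils and BSD ls do). The scratch directory and every file name in it are made up for the demo; one name contains a newline.

```shell
# Scratch directory with four log* files (one with a space, one with a
# newline in its name) and one file that does not match.
demo_dir=$(mktemp -d)
touch "$demo_dir/log1" "$demo_dir/log2" "$demo_dir/log with space" \
      "$demo_dir"/$'log\nwith-newline' "$demo_dir/other.txt"

# -q prints '?' in place of the embedded newline, so each file is one line.
count=$(cd "$demo_dir" && ls -1q log* | wc -l)

# Some wc implementations pad the number with spaces; $(( )) strips them.
echo "log* files: $((count))"

rm -r "$demo_dir"
```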

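score 69 · Answer

There are a lot of answers here, but some don't take into account

file names with spaces, newlines, or control characters
file names that start with a hyphen (imagine a file called -l)
hidden files that start with a dot (if the glob were *.log instead of log*)
directories that match the glob (e.g. a directory called logs that matches log*)
empty directories (i.e. the result is 0)
extremely large directories (listing them all could exhaust memory)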

Here's a solution that handles all of them:

ls 2>/dev/null -Ubad1 -- log* | wc -l

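Explanation:

-U causes ls to not sort the entries, meaning it doesn't need to load the entire directory listing into memory
-b prints C-style escapes for non-graphic characters, crucially causing newlines to be printed as \n.
-a prints out all files, even hidden files (not strictly needed when the glob log* implies no hidden files)
-d prints out directories without attempting to list their contents, which is what ls normally would do
-1 makes sure the output is in one column (ls does this automatically when writing to a pipe, so it's not strictly necessary)
2>/dev/null redirects stderr so that if there are 0 log files, the error message is ignored. (Note that shopt -s nullglob would cause ls to list the entire working directory instead.)
wc -l consumes the directory listing as it is being generated, so the output of ls is never held in memory as a whole.
-- separates the file names from the options, so that they are not interpreted as arguments to ls (in case log* is removed)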

The shell will expand log* to the full list of files, which may exhaust memory if there are a lot of files, so running it through grep is better:

ls -Uba1 | grep ^log | wc -l

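This last one handles extremely large directories of files without using a lot of memory (albeit it does use a subshell). The -d is no longer needed, because it only lists the contents of the current directory.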
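A sketch that runs both commands on a small scratch directory, assuming bash and GNU coreutils ls (on other systems -U may mean something else). All names are invented; logs is a directory whose two files should not be counted individually.

```shell
demo_dir=$(mktemp -d)
mkdir "$demo_dir/logs"                        # a directory matching log*
touch "$demo_dir/logs/a" "$demo_dir/logs/b"   # its contents must not be counted
touch "$demo_dir/log1" "$demo_dir"/$'log\nnl' "$demo_dir/notes.txt"

# Glob version: -d lists the logs directory itself, -b prints the newline as \n.
with_glob=$(cd "$demo_dir" && ls 2>/dev/null -Ubad1 -- log* | wc -l)

# grep version: no glob expansion; -a also lists . and .., which ^log filters out.
with_grep=$(cd "$demo_dir" && ls -Uba1 | grep ^log | wc -l)

echo "glob: $((with_glob))  grep: $((with_grep))"
rm -r "$demo_dir"
```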

score 66 · Answer

You can do this safely (i.e. you won't be bugged by files with spaces or \n in their names) with bash:

$ shopt -s nullglob
$ logfiles=(*.log)
$ echo ${#logfiles[@]}

You need to enable nullglob so that you don't get a literal *.log in the $logfiles array if no files match. (See How to "undo" a 'set -x'? for examples of how to safely reset it.)
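A sketch of saving and restoring the option, assuming bash: shopt -p nullglob prints the shopt command that recreates the current setting, which can be eval'd afterwards. The scratch directory and the *.nomatch pattern are only for illustration.

```shell
demo_dir=$(mktemp -d)
touch "$demo_dir/a.log" "$demo_dir/b c.log" "$demo_dir"/$'d\ne.log'

saved_nullglob=$(shopt -p nullglob)   # a command that recreates the current setting
shopt -s nullglob
logfiles=("$demo_dir"/*.log)          # three names, space and newline included
no_match=("$demo_dir"/*.nomatch)      # empty array instead of a literal pattern
eval "$saved_nullglob"                # restore the previous setting

echo "${#logfiles[@]} ${#no_match[@]}"
rm -r "$demo_dir"
```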

score 59 · Answer

For a recursive search:

find . -type f -name '*.log' -printf x | wc -c

wc -c will count the number of characters in the output of find, while -printf x tells find to print a single x for each result.

For a non-recursive search, do this:

find . -maxdepth 1 -type f -name '*.log' -printf x | wc -c
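A sketch comparing the two commands on a scratch tree, assuming bash and GNU find (-printf is a GNU extension). The names are invented; dir.log is a directory, so -type f leaves it out.

```shell
demo_dir=$(mktemp -d)
mkdir "$demo_dir/sub" "$demo_dir/dir.log"
touch "$demo_dir/a.log" "$demo_dir/sub/b.log" "$demo_dir/sub"/$'c\nd.log'

# One 'x' per match, so wc -c counts matches whatever the names contain.
recursive=$(cd "$demo_dir" && find . -type f -name '*.log' -printf x | wc -c)
top_level=$(cd "$demo_dir" && find . -maxdepth 1 -type f -name '*.log' -printf x | wc -c)

echo "recursive: $((recursive))  top level: $((top_level))"
rm -r "$demo_dir"
```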

score 11 · Answer

The accepted answer for this question is wrong, but I have low rep so I can't add a comment to it.

The correct answer to this question is given by Mat:

shopt -s nullglob
logfiles=(*.log)
echo ${#logfiles[@]}

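The problem with the accepted answer is that wc -l counts the number of newline characters, and counts them even if they print to the terminal as '?' in the output of 'ls -l'. This means that the accepted answer FAILS when a file name contains a newline character. I have tested the suggested command: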

ls -l log* | wc -l

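and it erroneously reports a value of 2 even if there is only 1 file matching the pattern, whose name happens to contain a newline character. For example: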

touch log$'\n'def
ls log* -l | wc -l
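The failure can be reproduced next to the two alternatives with a sketch like this, assuming bash, GNU or BSD ls writing to a pipe (where names are printed unescaped unless -q is given) and no QUOTING_STYLE variable in the environment. The scratch directory is only for the demo.

```shell
demo_dir=$(mktemp -d)
touch "$demo_dir"/log$'\n'def      # one file whose name contains a newline

# ls writes the name unescaped to a pipe, so this one file spans two lines.
plain=$(cd "$demo_dir" && ls -l log* | wc -l)
# -q prints '?' for the newline, so the file stays on one line.
escaped=$(cd "$demo_dir" && ls -1q log* | wc -l)
# The array counts names, not lines (assumes nullglob was off beforehand).
shopt -s nullglob
files=("$demo_dir"/log*)
shopt -u nullglob

echo "ls -l: $((plain))  ls -1q: $((escaped))  array: ${#files[@]}"
rm -r "$demo_dir"
```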

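score 7 · Answer

If you have a lot of files and you don't want to use the elegant shopt -s nullglob and bash array solution, you can use find and the like, as long as you don't print out the file names (which might contain newlines).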

find -maxdepth 1 -name "log*" -not -name ".*" -printf '%i\n' | wc -l

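This will find all files that match log* and that don't start with a dot. The -not -name ".*" is redundant here, but it's important to note that by default ls does not show dot-files, whereas find includes them.

This is a correct answer, and it handles any type of file name you can throw at it, because the file names are never passed between commands.

But the shopt nullglob answer is the best answer!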
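A sketch of this command on a scratch directory, assuming bash and GNU find (-printf and an omitted start path are GNU features, so the sketch passes . explicitly). The names are invented. Since there is no -type f, the directory logdir matches too; add -type f to count regular files only.

```shell
demo_dir=$(mktemp -d)
touch "$demo_dir/log1" "$demo_dir"/$'log\n2' "$demo_dir/other"
mkdir "$demo_dir/logdir"

# One output line per match (its inode number); the names are never printed.
n=$(cd "$demo_dir" && find . -maxdepth 1 -name "log*" -not -name ".*" -printf '%i\n' | wc -l)

echo "$((n))"
rm -r "$demo_dir"
```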

score 7 · Answer

Here's my one-liner for this.

 file_count=$( shopt -s nullglob ; set -- "$directory_to_search_inside"/* ; echo $#)
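Because shopt runs inside the $( ... ) subshell, nullglob is switched on only there and the calling shell keeps its settings. A sketch checking both the count and that side effect, assuming bash; directory_to_search_inside is the one-liner's own placeholder, pointed at a scratch directory here.

```shell
directory_to_search_inside=$(mktemp -d)
touch "$directory_to_search_inside"/{a,b,c}.txt
empty_dir=$(mktemp -d)

file_count=$( shopt -s nullglob ; set -- "$directory_to_search_inside"/* ; echo $#)
empty_count=$( shopt -s nullglob ; set -- "$empty_dir"/* ; echo $#)

echo "$file_count $empty_count"
# nullglob was only enabled inside the command substitutions.
shopt -q nullglob && echo "nullglob leaked" || echo "nullglob still off"
rm -r "$directory_to_search_inside" "$empty_dir"
```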

score 7 · Answer

An important comment

(not enough reputation to comment)

This is WRONG:

ls -1q some_pattern | wc -l

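If shopt -s nullglob happens to be set, it prints the number of ALL files in the directory, not just the ones matching the pattern (tested on CentOS-8 and Cygwin). Who knows what other meaningless bugs ls has?

This is CORRECT, and much faster: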

shopt -s nullglob; files=(some_pattern); echo ${#files[@]};

It does the expected job.

And the run times differ.
The 1st: 0.006 on CentOS, and 0.083 on Cygwin (in case it is used with care).
The 2nd: 0.000 on CentOS, and 0.003 on Cygwin.
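A sketch reproducing the pitfall described above, assuming bash: with nullglob on and nothing matching log*, the pattern expands to no arguments at all, so ls lists the whole directory while the array stays empty. The scratch files are invented.

```shell
demo_dir=$(mktemp -d)
touch "$demo_dir"/{a,b,c}.txt     # nothing here matches log*

shopt -s nullglob                 # as if some earlier code had enabled it
# log* expands to nothing, so this is effectively a plain `ls -1q`.
via_ls=$(cd "$demo_dir" && ls -1q log* | wc -l)
files=("$demo_dir"/log*)          # empty array
shopt -u nullglob

echo "ls: $((via_ls))  array: ${#files[@]}"
rm -r "$demo_dir"
```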

score 5 · Answer

You can use the -R option to find the files along with those inside subdirectories, recursively

ls -R | wc -l    # to find all the files

ls -R | grep log | wc -l    # to find the files which contain the word log

You can use patterns with grep
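To see what these pipelines actually count, it can help to run them on a tiny tree. A sketch assuming bash and GNU ls, with made-up names: the raw ls -R output also contains a header line per directory (such as ./sub:), blank separator lines and the subdirectory names, all of which wc -l counts; grep log keeps only the lines that contain log.

```shell
demo_dir=$(mktemp -d)
mkdir "$demo_dir/sub"
touch "$demo_dir/a.log" "$demo_dir/sub/b.log"

# Every output line counts here, including headers and blank lines.
all_lines=$(cd "$demo_dir" && ls -R | wc -l)
# Only lines containing "log" survive the grep.
log_lines=$(cd "$demo_dir" && ls -R | grep log | wc -l)

echo "all lines: $((all_lines))  lines containing log: $((log_lines))"
rm -r "$demo_dir"
```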

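score 2 · Answer

You can easily define such a command with a shell function. This method does not require any external program and does not spawn any child process. It does not attempt hazardous ls parsing and handles "special" characters (whitespace, newlines, backslashes and so on) just fine. It only relies on the file name expansion mechanism provided by the shell. It is compatible with at least sh, bash and zsh.

The line below defines a function called count which prints the number of arguments it was called with.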

count() { echo $#; }

Just call it with the desired pattern:

count log*

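For the result to be correct when the globbing pattern has no match, the shell option nullglob (or failglob, which corresponds to the default behavior of zsh) must be set at the time the expansion happens. It can be set like this: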

shopt -s nullglob    # for sh / bash
setopt nullglob      # for zsh

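Depending on what you want to count, you might also be interested in the shell option dotglob.

Unfortunately, with bash at least, it is not easy to set these options locally. If you don't want to set them globally, the most straightforward solution is to use the function in this more convoluted manner: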

( shopt -s nullglob ; shopt -u failglob ; count log* )
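A self-contained sketch of the function used in the subshell form, assuming bash; the scratch directory and names are invented. $( ... ) is a subshell just like ( ... ), so the shopt calls do not leak into the calling shell.

```shell
count() { echo $#; }

demo_dir=$(mktemp -d)
touch "$demo_dir/log1" "$demo_dir"/$'log 2\nwith newline'

# Options are changed only inside each command substitution.
matched=$( shopt -s nullglob ; shopt -u failglob ; count "$demo_dir"/log* )
unmatched=$( shopt -s nullglob ; shopt -u failglob ; count "$demo_dir"/nomatch* )

echo "matched: $matched  unmatched: $unmatched"
rm -r "$demo_dir"
```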

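If you want to recover the lightweight syntax count log*, or if you really want to avoid spawning a subshell, you may hack something along the lines of:

# sh / bash:
# the alias is expanded before the globbing pattern, so we
# can set required options before the globbing gets expanded,
# and restore them afterwards.
# (aliases are expanded in interactive shells; a bash script
# needs `shopt -s expand_aliases` before the alias is used.)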
count() {
    eval "$_count_saved_shopts"
    unset _count_saved_shopts
    echo $#
}
alias count='
    _count_saved_shopts="$(shopt -p nullglob failglob)"
    shopt -s nullglob
    shopt -u failglob
    count'

As a bonus, this function has more general uses. For instance:

count a* b*          # count files which match either a* or b*
count $(jobs -ps)    # count stopped jobs (sh / bash)

By turning the function into a script file (or an equivalent C program) callable from the PATH, it can also be composed with programs such as find and xargs:

find "$FIND_OPTIONS" -exec count {} \+    # count results of a search
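A sketch of the find composition, assuming bash and GNU or BSD find. To keep it self-contained, sh -c 'echo "$#"' stands in for a count script installed on the PATH (a simplification, not the answer's setup). One caveat: with -exec ... {} +, find may split a very long list of results across several invocations, which would print several partial counts to be added up.

```shell
demo_dir=$(mktemp -d)
touch "$demo_dir/a.log" "$demo_dir/b.log" "$demo_dir"/$'c\nd.log' "$demo_dir/e.txt"

# sh -c '...' sh FILE... runs the body of count() with $0 set to "sh",
# so "$#" is the number of file names find appended.
result=$(find "$demo_dir" -type f -name '*.log' -exec sh -c 'echo "$#"' sh {} +)

echo "$result"
rm -r "$demo_dir"
```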

score 2 · Answer

I've given this answer a lot of thought, especially given the don't-parse-ls advice. At first, I tried

<WARNING! DID NOT WORK>

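du --inodes --files0-from=<(find . -maxdepth 1 -type f -print0) | awk '{sum+=int($1)}END{print sum}'

</WARNING! DID NOT WORK>

which worked if there was only a file name like

touch $'w\nlf.aa'

but failed if I made a file name like this

touch $'firstline\n3 and some other\n1\n2\texciting\n86stuff.jpg'

I finally came up with what I'm putting below. Note that I was trying to get a count of all files in the directory (not including any subdirectories). I think it, like the answers by @Mat and @Dan_Yard, meets at least most of the requirements set out by @mogsie (I'm not sure about memory). I think the answer by @mogsie is correct, but I always try to stay away from parsing ls unless it's an extremely specific situation.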

awk -F"\0" '{print NF-1}' < <(find . -maxdepth 1 -type f -print0) | awk '{sum+=$1}END{print sum}'

More readable:

awk -F"\0" '{print NF-1}' < \
  <(find . -maxdepth 1 -type f -print0) | \
    awk '{sum+=$1}END{print sum}'

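This looks specifically for files, separates the output with null characters (to avoid problems with spaces and newlines), and then counts the null characters. Every file name is followed by one null character, so on each line awk reads, the number of null characters is one less than the number of fields, which is what NF-1 prints; the second awk sums these counts.

To answer the OP's question, there are two cases to consider

1) Non-recursive search: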

awk -F"\0" '{print NF-1}' < \
  <(find . -maxdepth 1 -type f -name "log*" -print0) | \
    awk '{sum+=$1}END{print sum}'

2) Recursive search. Note that what's inside the -name parameter might need to be changed for slightly different behavior (hidden files, etc.).

awk -F"\0" '{print NF-1}' < \
  <(find . -type f -name "log*" -print0) | \
    awk '{sum+=$1}END{print sum}'
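The same idea, counting the null terminators that -print0 emits, can also be written with tr instead of two awk passes; this is a swapped-in alternative, not the command above. A sketch assuming bash and GNU find, reusing the awkward file name from earlier in a scratch directory:

```shell
demo_dir=$(mktemp -d)
touch "$demo_dir"/$'firstline\n3 and some other\n1\n2\texciting\n86stuff.jpg'
touch "$demo_dir/log1" "$demo_dir/log2"
mkdir "$demo_dir/sub" && touch "$demo_dir/sub/log3" "$demo_dir/sub/log4"

# -print0 ends every name with one NUL; tr -dc '\0' deletes everything else;
# wc -c then counts the remaining NULs, i.e. the files.
top_level=$(cd "$demo_dir" && find . -maxdepth 1 -type f -print0 | tr -dc '\0' | wc -c)
recursive_log=$(cd "$demo_dir" && find . -type f -name "log*" -print0 | tr -dc '\0' | wc -c)

echo "files: $((top_level))  log* (recursive): $((recursive_log))"
rm -r "$demo_dir"
```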

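If anyone would like to comment on how these answers compare to the ones I've mentioned in this answer, please do.

Note: I arrived at this line of thinking while working out this answer.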

score 1 · Answer

This is what I've been doing:

ls log* | awk 'END{print NR}'

answered 2019-01-17T22:11:06.470

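score 1 · Answer

Here's a generic Bash function that you can use in your scripts.

    # @see https://stackoverflow.com/a/11307382/430062
    function countFiles {
        # Enable nullglob only for this call and restore it afterwards.
        local restore_nullglob files
        restore_nullglob=$(shopt -p nullglob)
        shopt -s nullglob
        # $1 is unquoted on purpose so the glob expands
        # (it is therefore also subject to word splitting).
        files=($1)
        eval "$restore_nullglob"
        echo ${#files[@]}
    }

    FILES_COUNT=$(countFiles "$file-*")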

score 0 · Answer

ls -1 log* | wc -l

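This lists one file per line and pipes the result to the word count command, with the switch that makes it count lines.

score 0 · Answer

This can be done with standard POSIX shell syntax.

Here is a simple count_entries function: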

#!/usr/bin/env sh

count_entries()
{
  # Emulating Bash nullglob 
  # If argument 1 is not an existing entry
  if [ ! -e "$1" ]
    # argument is a returned pattern
    # then shift it out
    then shift
  fi
  echo $#
}

For a compact definition:

count_entries(){ [ ! -e "$1" ]&&shift;echo $#;}
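A quick check of the compact definition, assuming bash or another POSIX sh with mktemp available; the scratch names are invented. When nothing matches, the shell passes the pattern through literally, so the first argument does not exist and is shifted away. One caveat: a dangling symbolic link also fails -e, so if it were the first match it would be dropped from the count.

```shell
count_entries(){ [ ! -e "$1" ]&&shift;echo $#;}

demo_dir=$(mktemp -d)
touch "$demo_dir/log1" "$demo_dir/log 2"

matched=$(count_entries "$demo_dir"/log*)
unmatched=$(count_entries "$demo_dir"/nomatch*)   # the pattern arrives literally

echo "matched: $matched  unmatched: $unmatched"
rm -r "$demo_dir"
```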

A more complete, POSIX-compatible file counter that filters by file type:

#!/usr/bin/env sh

count_files()
# Count the file arguments matching the file operator
# Synopsis:
# count_files operator FILE [...]
# Arguments:
# $1: The file operator
#   Allowed values (-a, -k, -O, -G and -N are not POSIX test operators):
#   -a FILE    True if file exists.
#   -b FILE    True if file is block special.
#   -c FILE    True if file is character special.
#   -d FILE    True if file is a directory.
#   -e FILE    True if file exists.
#   -f FILE    True if file exists and is a regular file.
#   -g FILE    True if file is set-group-id.
#   -h FILE    True if file is a symbolic link.
#   -L FILE    True if file is a symbolic link.
#   -k FILE    True if file has its `sticky' bit set.
#   -p FILE    True if file is a named pipe.
#   -r FILE    True if file is readable by you.
#   -s FILE    True if file exists and is not empty.
#   -S FILE    True if file is a socket.
#   -t FD      True if FD is opened on a terminal.
#   -u FILE    True if the file is set-user-id.
#   -w FILE    True if the file is writable by you.
#   -x FILE    True if the file is executable by you.
#   -O FILE    True if the file is effectively owned by you.
#   -G FILE    True if the file is effectively owned by your group.
#   -N FILE    True if the file has been modified since it was last read.
# $@: The files arguments
# Output:
#   The number of matching files
# Return:
#   1: Unknown file operator
{
  operator=$1
  shift
  case $operator in
    -[abcdefghLkprsStuwxOGN])
      for arg; do
        # If file is not of required type
        if ! test "$operator" "$arg"; then
          # Shift it out
          shift
        fi
      done
      echo $#
      ;;
    *)
      printf 'Invalid file operator: %s\n' "$operator" >&2
      return 1
      ;;
  esac
}

count_files "$@"

Example usage:

count_files -f log*.txt
count_files -d datadir*
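The counting in count_files relies on a detail of for: the word list of for arg is expanded once, when the loop starts, so each shift inside the body only lowers $# without disturbing the iteration (which argument gets shifted away does not matter, only how many remain). A stripped-down sketch of that mechanism, using a hypothetical count_regular helper and a scratch directory:

```shell
# Hypothetical helper: counts only the arguments that are regular files.
count_regular() {
  for arg; do
    # "$@" was already expanded for the loop; shift only lowers $#.
    [ -f "$arg" ] || shift
  done
  echo "$#"
}

demo_dir=$(mktemp -d)
touch "$demo_dir/a.txt" "$demo_dir/b.txt"
mkdir "$demo_dir/subdir"

regular=$(count_regular "$demo_dir"/*)   # a.txt, b.txt and subdir
echo "$regular"
rm -r "$demo_dir"
```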

score -1 · Answer

To count everything, just pipe ls to wc -l, which counts lines:

ls | wc -l

To count using a pattern, pipe to grep first:

ls | grep log | wc -l
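Note that grep log matches log anywhere in a name, which is looser than the glob log*. A sketch comparing it with an anchored pattern, assuming bash; the scratch names are invented:

```shell
demo_dir=$(mktemp -d)
touch "$demo_dir/log1" "$demo_dir/catalog.txt" "$demo_dir/blog.md" "$demo_dir/other"

# Unanchored: log1, catalog.txt and blog.md all contain "log".
anywhere=$(cd "$demo_dir" && ls | grep log | wc -l)
# Anchored at the start of the name, like the glob log*: only log1.
anchored=$(cd "$demo_dir" && ls | grep '^log' | wc -l)

echo "grep log: $((anywhere))  grep ^log: $((anchored))"
rm -r "$demo_dir"
```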
