linux - File with the most lines in a directory, not bytes

Question

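I'm trying to run wc -l on an entire directory and then display each filename in an echo along with its number of lines.

To add to my frustration, the directory has to come from a passed argument. So, at the risk of looking stupid, can someone first tell me why a simple wc -l $1 doesn't give me the line count for the directory I type as the argument? I know I'm not understanding it completely.

On top of that, I also need validation for when the given argument is not a directory or when there is more than one argument.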

score 6 · Accepted Answer

wc works on files rather than directories, so if you want the line counts for all files in the directory, you would start with:

wc -l $1/*

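With various gyrations to get rid of the total, sort the output and extract only the largest, you could end up with something like this (split across multiple lines for readability, but it should be entered on one line):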

pax> wc -l $1/* 2>/dev/null
       | grep -v ' total$'
       | sort -n -k1
       | tail -n 1

2892 target_dir/big_honkin_file.txt

As for validation, you can check the number of parameters passed to your script with something like:

if [[ $# -ne 1 ]] ; then
    echo 'Whoa! Wrong parameter count'
    exit 1
fi

And you can check whether it is a directory with:

if [[ ! -d $1 ]] ; then
    echo 'Whoa!' "[$1]" 'is not a directory'
    exit 1
fi
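
Putting the pieces above together, a minimal sketch of the whole thing might look like the following. The function name most_lines is hypothetical, chosen only for this example, and the sketch uses the POSIX [ test instead of [[ so that it also runs under plain sh. It is not recursive, and it assumes no file name ends in " total".

```shell
# most_lines DIR: print "count path" for the file in DIR with the most lines.
# (most_lines is a hypothetical name used for illustration.)
most_lines() {
    # Exactly one argument is required.
    if [ "$#" -ne 1 ]; then
        echo 'Whoa! Wrong parameter count' >&2
        return 1
    fi
    # The argument must be an existing directory.
    if [ ! -d "$1" ]; then
        echo "Whoa! [$1] is not a directory" >&2
        return 1
    fi
    # wc prints one "count path" line per file, plus a " total" line when it
    # is given several files; errors for subdirectories are discarded.
    wc -l "$1"/* 2>/dev/null | grep -v ' total$' | sort -n -k1 | tail -n 1
}
```

Calling most_lines target_dir would then print a single line in the same "count path" form as the sample output above.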

score 2 · Accepted Answer

Is this what you want?

> find ./test1/ -type f|xargs wc -l
       1 ./test1/firstSession_cnaiErrorFile.txt
      77 ./test1/firstSession_cnaiReportFile.txt
   14950 ./test1/exp.txt
       1 ./test1/test1_cnaExitValue.txt
   15029 total

So the directory passed as the argument should go here:

find $your_complete_directory_path/ -type f|xargs wc -l

score 1 · Accepted Answer

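I'm trying to run wc -l on an entire directory and then display each filename in an echo along with its number of lines.

You can run find on the directory and use the -exec option to trigger wc -l. Something like this: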

$ find ~/Temp/perl/temp/ -exec wc -l '{}' \;
wc: /Volumes/Data/jaypalsingh/Temp/perl/temp/: read: Is a directory
      11 /Volumes/Data/jaypalsingh/Temp/perl/temp//accessor1.plx
      25 /Volumes/Data/jaypalsingh/Temp/perl/temp//autoincrement.pm
      12 /Volumes/Data/jaypalsingh/Temp/perl/temp//bless1.plx
      14 /Volumes/Data/jaypalsingh/Temp/perl/temp//bless2.plx
      22 /Volumes/Data/jaypalsingh/Temp/perl/temp//classatr1.plx
      27 /Volumes/Data/jaypalsingh/Temp/perl/temp//classatr2.plx
       7 /Volumes/Data/jaypalsingh/Temp/perl/temp//employee1.pm
      18 /Volumes/Data/jaypalsingh/Temp/perl/temp//employee2.pm
      26 /Volumes/Data/jaypalsingh/Temp/perl/temp//employee3.pm
      12 /Volumes/Data/jaypalsingh/Temp/perl/temp//ftp.plx
      14 /Volumes/Data/jaypalsingh/Temp/perl/temp//inherit1.plx
      16 /Volumes/Data/jaypalsingh/Temp/perl/temp//inherit2.plx
      24 /Volumes/Data/jaypalsingh/Temp/perl/temp//inherit3.plx
      33 /Volumes/Data/jaypalsingh/Temp/perl/temp//persisthash.pm
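
The "Is a directory" message in the output above appears because find also hands the directory itself to wc. A small variation, sketched here with a hypothetical function name count_lines_each, restricts find to regular files with -type f so that only files reach wc:

```shell
# count_lines_each DIR: run wc -l once for every regular file under DIR.
# (count_lines_each is a hypothetical name used for illustration.)
count_lines_each() {
    # -type f keeps directories out of the list, so wc never sees one.
    find "$1" -type f -exec wc -l {} \;
}
```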

score 1 · Accepted Answer

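Nice question!

I've seen the answers. Some are pretty good. The find ...|xargs one is my favourite. It could be simplified anyway using the find ... -exec wc -l {} + syntax. But there is a problem. When the command line buffer is full, wc -l ... is called, and each time a <number> total line is printed. As wc has no option to disable this feature, wc has to be reimplemented. Filtering out these lines with grep is not nice.

So my complete answer is: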

#!/usr/bin/bash

[ $# -ne 1 ] && echo "Bad number of args">&2 && exit 1
[ ! -d "$1" ] && echo "Not dir">&2 && exit 1
find "$1" -type f -exec awk '{++n[FILENAME]}END{for(i in n) printf "%8d %s\n",n[i],i}' {} +

Or, using less temporary space but a bit more awk code:

find "$1" -type f -exec awk 'function pr(){printf "%8d %s\n",n,f}FNR==1{f&&pr();n=0;f=FILENAME}{++n}END{pr()}' {} +

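Miscellaneous

If it should not descend into subdirectories, add -maxdepth 1 before -type in the find command.
It is pretty fast. I was afraid it would be much slower than the find ... wc + version, but for a directory containing 14770 files (in several subdirectories) the wc version ran in 3.8 seconds and the awk version in 5.2 seconds.
awk and wc treat a line not terminated by \n differently. A last line that does not end with \n is not counted by wc. I prefer to count it, as awk does.
It does not print empty files.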
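
The difference in how wc and awk treat an unterminated last line can be seen with a small experiment. This is only an illustrative sketch; the temporary file and variable names are made up for the example.

```shell
# Create a file with two lines of text where the last one has no trailing \n.
tmp=$(mktemp)
printf 'first\nsecond' > "$tmp"

# wc -l counts newline characters, so it sees only one line here.
wc_count=$(wc -l < "$tmp")
# awk counts records, so the unterminated last line is included.
awk_count=$(awk 'END { print NR }' "$tmp")

echo "wc: $wc_count, awk: $awk_count"
rm -f "$tmp"
```

Here wc reports 1 while awk reports 2.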

score 0 · Accepted Answer

To find the file with the most lines in the current directory and its subdirectories, with zsh:

lines() REPLY=$(wc -l < "$REPLY")
wc -l -- **/*(D.nO+lines[1])

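This defines a lines function to be used as a glob sorting function. It returns in $REPLY the number of lines of the file whose path is given in $REPLY.

Then we use zsh's recursive globbing **/* to find regular files (.), reverse sorted (O) numerically (n) with the lines function (+lines), and select the first one ([1]). (D includes dotfiles and traverses dot directories.)

Doing it with standard utilities is a bit tricky if you don't want to make assumptions about which characters file names may contain (such as newlines, spaces...). With the GNU tools found on most Linux distributions it is a bit easier, as they can deal with NUL-terminated lines: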

find . -type f -exec sh -c '
  for file do
    size=$(wc -l < "$file") &&
      printf "%s\0" "$size:$file"
  done' sh {} + |
  tr '\n\0' '\0\n' |
  sort -rn |
  head -n1 |
  tr '\0' '\n'

Or with zsh or GNU bash syntax:

biggest= max=-1
find . -type f -print0 |
  {
    while IFS= read -rd '' file; do
      size=$(wc -l < "$file") &&
        ((size > max)) &&
        max=$size biggest=$file
    done
    [[ -n $biggest ]] && printf '%s\n' "$max: $biggest"
  }

score 0 · Accepted Answer

Here is an approach that works for me with git bash (mingw32) under Windows:

find . -type f -print0| xargs -0 wc -l

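This lists the files and line counts in the current directory and its subdirectories. You can also redirect the output to a text file and import it into Excel if needed: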

find . -type f -print0| xargs -0 wc -l > fileListingWithLineCount.txt
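
To go from that listing to the file with the most lines, the same pipeline can be sorted and trimmed. The following is a rough sketch; the function name top_by_lines and its optional second argument are hypothetical. It assumes file names contain no newlines and do not end in " total", and it does not handle an empty directory specially.

```shell
# top_by_lines DIR [N]: print the N files under DIR with the most lines
# (N defaults to 1). top_by_lines is a hypothetical name for illustration.
top_by_lines() {
    find "$1" -type f -print0 |
        xargs -0 wc -l |        # one "count path" line per file
        grep -v ' total$' |     # drop the summary line(s) wc adds
        sort -rn |              # largest count first
        head -n "${2:-1}"
}
```

For example, top_by_lines . 5 would list the five files with the most lines, largest first.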
