linux - Sorting the entries of a line using the shell

Question

Consider the following input and output:

  infile   |   outfile
1 3 5 2 4  |  1 2 3 4 5
2 4 5      |  2 4 5
4 6 2 1    |  1 2 4 6

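Is there any combination of UNIX programs that does not involve a programming language (apart from the shell script itself) and sorts the entries in each line of a file faster than the following approach?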

while read line; do
    tr ' ' '\n' <<< ${line} | sort | tr '\n' ' '
    echo ""
done < infile > outfile

I mean, I could write a small C++/Python/awk/... program to do this, but that is not the same as using the usual one-liners to solve the problem as if by magic.

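Edit:

I must have added too much text instead of simply asking for what I want. To be direct: I want to confirm whether there is any UNIX program or combination of programs (using pipes, for loops, while loops, ...) that can sort the entries within a line without the overhead of the solution above.

I know I could probably get the nasty work done in a programming language such as Perl, awk, or Python, but I am actually looking for a combination of UNIX programs that does not involve those language interpreters. From the answers, I have to conclude that there is no such inline sort tool, and I am very grateful for the solutions I received, mainly the very neat Perl one-liner.

However, I do not really understand why the Bash approach I posted has so much overhead. Is it really due to the large number of context switches, or is it simply the cost of converting the input back and forth and sorting it?

I cannot work out which of these steps slows execution down so much. Sorting the entries of a file with about 500k lines, each holding about 30 values, takes several minutes.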

score 2 · Accepted Answer

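Perl can do this nicely as a one-line Unix/Linux command:

perl -l -n -e 'print join " ", sort { $a <=> $b } split " "' < input.txt > output.txt

The sort block compares with the special variables $a and $b. Written without the dollar signs, a and b are plain barewords that both evaluate to 0 in a numeric comparison, so nothing would be reordered. Because bash expands dollar signs inside double quotes, keep the program in single quotes as shown, or escape the dollars with backslashes; in the Windows command shell, invert the single and double quotes. The -l switch strips each line's newline on input and adds it back on output.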

Note that the distinctions you are trying to draw between commands, programming languages, and programs are pretty thin. Bash is a programming language. Perl can certainly be used as a shell. Both are commands.

The reason your script runs slowly is that it spawns 3 processes per loop iteration. Process creation is pretty expensive.
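To make the process count concrete, here is a minimal sketch of a variant of the question's loop that starts two external programs per line (sort and paste) instead of three, by letting the printf built-in do the splitting. The function name sort_lines is hypothetical, and the sketch assumes the input holds only whitespace-separated numbers with no glob characters. It still forks for every line, so it is unlikely to come close to a single Perl or awk process.

```shell
# Hypothetical helper: sort the whitespace-separated numbers on each stdin line.
sort_lines() {
    # "|| [ -n "$line" ]" also handles a final line without a trailing newline.
    while read -r line || [ -n "$line" ]; do
        # printf is a shell built-in, so only sort and paste are executed
        # as external programs (the pipeline itself still forks).
        # $line is unquoted on purpose so the shell splits it into words.
        printf '%s\n' $line | sort -n | paste -s -d ' ' -
    done
}
```

It would be used as sort_lines < infile > outfile. Note that sort -n compares numerically, whereas a plain sort compares as text and would place 10 before 9.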

score 1 · Accepted Answer

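#!/usr/bin/gawk -f
# Requires GNU awk: PROCINFO["sorted_in"] is a gawk extension.
# The interpreter path above is an assumption; adjust it to where gawk lives.
{
  baz = 0
  # Make "for (bar in foo)" visit elements in ascending numeric order of value.
  PROCINFO["sorted_in"] = "@val_num_asc"
  split($0, foo)
  # Rewrite the fields of the current line in sorted order.
  for (bar in foo)
    $++baz = foo[bar]
}
# The pattern 1 is always true, so the rebuilt line is printed.
1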

Result

1 2 3 4 5
2 4 5
1 2 4 6

score 1 · Accepted Answer

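This question is subtler than it looks. You appear to be asking whether there is a faster way to do the sort, and you have received plenty of (elegant!) answers using Perl, awk, and so on. But your real question seems to be whether you can sort faster using shell built-ins, and to that the answer is no.

Obviously, sort is not a shell built-in, and neither is tr. No built-in does what sort does, and the built-ins that might replace tr are unlikely to help you here (for example, manipulating bash's IFS variable to get rid of the call to tr would take as much work as simply living with tr).

Personally, I would go with Perl. Note that if your data set is large or unusual, you have the option of changing Perl's default sort algorithm with the sort pragma. I do not think you will need it to sort a file of integers, but perhaps that was only an illustration on your part.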

score 0 · Accepted Answer

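It is not pretty (and definitely not a one-liner), but you can sort the lines using only shell built-in commands. For short lines this may be faster than repeatedly calling external programs.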

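#!/bin/bash
# Uses ${@:offset:length} and arrays, which need bash rather than plain sh.
sortline(){
    local x i
    local items=("$@")
    set --                      # start each line with an empty sorted list
    for x in "${items[@]}"; do
        i=0
        # Advance i past every sorted element that is not greater than x.
        while [ $i -lt $# ]; do
            [ "$x" -lt "${@:$((i+1)):1}" ] && break
            i=$((i+1))
        done
        # Insert x after the first i elements of the sorted list.
        set -- "${@:1:$i}" "$x" "${@:$((i+1))}"
    done
    echo "$@"
}
while read -r LINE || [ "$LINE" ]; do
    sortline $LINE              # unquoted on purpose: split into words
done <"$1" >"$2"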

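Edit: by the way, this is an insertion sort, in case anyone was wondering.

Edit 2: this only works for numeric values. For strings you would need a string comparison such as [[ "$x" < "${@:$((i+1)):1}" ]] (untested). For strings I use this C program instead (I just call it qsort); it could be adapted to numbers by applying atoi to the argv entries in the comparator: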

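#include <stdlib.h>
#include <string.h>
#include <unistd.h>   /* write */

/* qsort comparator: compare two argv entries as strings. */
static int cmp(const void *a, const void *b){
    return strcmp(*(const char **)a, *(const char **)b);
}

int main(int argc, char *argv[]){
    /* Skip the program name and sort the remaining arguments in place. */
    qsort(++argv, --argc, sizeof(char *), cmp);
    while (argc){
        write(1, argv[0], strlen(argv[0]));
        /* Tab between items, newline after the last one. */
        write(1, (--argc && argv++) ? "\t" : "\n", 1);
    }
    return 0;
}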
