shell - Why does uniq -c output a space instead of \t?

Question

I ran uniq -c on some text files. Its output looks like this:

123(space)first word(tab)other things
  2(space)second word(tab)other things

...

So I need to extract the counts (123 and 2 above), but I can't figure out how: if I split the line on spaces, I get ['123', 'first', 'word(tab)other', 'things']. Why doesn't it use a tab in the output?

And how can I extract the counts in the shell? (I ended up doing it in Python, WTF)

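Update: Sorry, I didn't describe my problem correctly. I don't want to sum the counts. I just want to replace the (space) after the count with a (tab), without touching the spaces inside the words, because I still need the data that follows. Like this: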

123(tab)first word(tab)other things
  2(tab)second word(tab)other things

score 8 · Accepted Answer

Try this:

uniq -c | sed -r 's/^( *[^ ]+) +/\1\t/'
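A minimal self-contained check of the command above. It assumes GNU sed (-r and \t in the replacement are GNU extensions; BSD sed would want -E and a literal tab), and the helper name count_to_tab and the sample lines are made up for illustration. Because the capture group ^( *[^ ]+) includes the leading padding, the spaces before the count are kept, which matches the desired output in the question.

```shell
#!/bin/sh
# Hypothetical helper: replace the run of spaces right after the count with a tab.
# Assumes GNU sed (-r and \t in the replacement are GNU extensions).
count_to_tab() {
  sed -r 's/^( *[^ ]+) +/\1\t/'
}

# Made-up sample input; uniq only merges *adjacent* duplicate lines.
printf 'first word\tother things\nfirst word\tother things\nsecond word\tother things\n' |
  uniq -c | count_to_tab
```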

answered 2012-07-26T13:49:37.977

score 8 · Accepted Answer

Try:

uniq -c text.file | sed -e 's/ *//' -e 's/ /\t/'

This removes the spaces before the line count, then replaces only the first remaining space with a tab.
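A small sketch to see the two sed expressions at work on a made-up sample whose text contains spaces. It assumes GNU sed (for \t in the replacement), and the function name count_to_tab_trimmed is hypothetical. Unlike the previous answer, the padding before the count disappears, while the spaces inside the text survive.

```shell
#!/bin/sh
# Hypothetical helper wrapping the two sed expressions from the answer.
# Assumes GNU sed, which understands \t in the replacement text.
count_to_tab_trimmed() {
  # 1st expression: drop the padding before the count
  #   (' *' matches at the start of the line, possibly as an empty match).
  # 2nd expression: turn the first remaining space, the one right after
  #   the count, into a tab; later spaces in the line are left alone.
  sed -e 's/ *//' -e 's/ /\t/'
}

printf 'first word\tother things\nfirst word\tother things\nsecond word\tother things\n' |
  uniq -c | count_to_tab_trimmed
```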

To replace all spaces with tabs, use tr:

uniq -c text.file | tr ' ' '\t'

To replace each run of consecutive spaces with a single tab, use -s:

uniq -c text.file | tr -s ' ' '\t'
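To see why the tr variants do not meet the requirement in the question's update (keep the spaces inside the text), here is a small check on a made-up sample line; it only uses uniq and tr, and the function name sample is hypothetical.

```shell
#!/bin/sh
# Made-up sample: the same line twice, and the line itself contains a space.
sample() {
  printf 'first word\nfirst word\n'
}

# tr maps every space to a tab: the count padding AND the space inside
# "first word" both change.
sample | uniq -c | tr ' ' '\t'

# With -s, each run of (now) tabs is squeezed to one, so the line becomes
# <tab>2<tab>first<tab>word: the text is still split apart.
sample | uniq -c | tr -s ' ' '\t'
```

The tr approach is fine when the counted lines contain no spaces of their own; otherwise one of the sed or awk answers that only touches the first space is the safer choice.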

score 1 · Accepted Answer

You can sum all the counts with awk:

awk '{s+=$1}END{print s}'
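A runnable version on a made-up sample of 3 + 2 lines (the function name sample is hypothetical). awk splits fields on runs of blanks and ignores leading blanks by default, so $1 is the count no matter how uniq pads it.

```shell
#!/bin/sh
# Made-up sample: three copies of one line, two of another.
sample() {
  printf 'first word\tother things\nfirst word\tother things\nfirst word\tother things\n'
  printf 'second word\tother things\nsecond word\tother things\n'
}

# Default field splitting skips the padding, so $1 is always the count.
sample | uniq -c | awk '{ s += $1 } END { print s }'
```

Since every input line is counted exactly once, the sum is always the number of input lines (for newline-terminated input, the same number wc -l prints), so this is mostly useful as a sanity check rather than as a way to reformat the output.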

answered 2012-07-26T13:37:56.753

score 0 · Accepted Answer

$ cat <file> | uniq -c | awk -F" " '{sum += $1} END {print sum}'

answered 2012-07-26T13:38:14.420

score 0 · Accepted Answer

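One possible way to get a tab after the count is to write a script that mimics uniq -c but prints the format you want. Here's a quick attempt (it seems to pass the minute or so of testing I gave it):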

awk '
(NR == 1) || ($0 != lastLine) {
    if (NR != 1) {
        printf("%d\t%s\n", count, lastLine);
    }
    lastLine = $0;
    count = 1;
    next;
}
{
    count++;
}
END {
    # Print the last group; skip it when the input was empty.
    if (NR > 0) {
        printf("%d\t%s\n", count, lastLine);
    }
}
' yourFile.txt
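A self-contained way to try the script: the same awk program wrapped in a hypothetical shell function, uniq_c_tab, that reads standard input instead of yourFile.txt, with an if (NR > 0) guard in END so that empty input prints nothing (without it, END would print a bogus line with count 0). Like uniq itself, it only merges adjacent duplicates, so the sample is sorted first.

```shell
#!/bin/sh
# Hypothetical wrapper around the awk program above, reading standard input.
uniq_c_tab() {
  awk '
    (NR == 1) || ($0 != lastLine) {
      # A new group starts: flush the previous one (if any).
      if (NR != 1) printf("%d\t%s\n", count, lastLine)
      lastLine = $0
      count = 1
      next
    }
    { count++ }
    # Flush the last group; the guard keeps empty input silent.
    END { if (NR > 0) printf("%d\t%s\n", count, lastLine) }
  '
}

# Only *adjacent* duplicates are merged, so sort first when equal lines
# may be scattered through the input.
printf 'b x\na y\nb x\n' | sort | uniq_c_tab
```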

score 0 · Accepted Answer

Another solution. It's equivalent to the earlier sed solution, but it does use awk, as requested/tagged!

cat yourFile.txt \
    | uniq -c \
    | awk '{
        match($0, /^ *[^ ]* /);
        printf("%s\t%s\n", $1, substr($0, RLENGTH + 1));
      }'
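A runnable check of the match()/substr() approach on a made-up sample; the function name count_to_tab_awk is hypothetical. match() sets RLENGTH to the length of the padding, the count and the single space after it, so substr() returns the rest of the line unchanged, spaces and tabs included.

```shell
#!/bin/sh
# Hypothetical wrapper around the match()/substr() awk program above.
count_to_tab_awk() {
  awk '{
    # RLENGTH = length of the padding, the count and the one space after it.
    match($0, /^ *[^ ]* /)
    # $1 is the count (leading blanks are ignored when splitting fields);
    # substr() keeps everything after that first space unchanged.
    printf("%s\t%s\n", $1, substr($0, RLENGTH + 1))
  }'
}

printf 'first word\tother things\nfirst word\tother things\n' | uniq -c | count_to_tab_awk
```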

score 0 · Accepted Answer

Building on William Pursell's answer, if you prefer Perl-compatible regular expressions (PCRE), a perhaps more elegant and modern way is:

perl -pe 's/ *(\d+) /$1\t/'

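The options are -e (run the code given on the command line) and -p (loop over every input line and print it after the code has run).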
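If you would rather avoid regular expressions altogether, the shell's own read can split off the count: with the default IFS, the padding is skipped, the first field goes into the first variable, and the remainder of the line, internal spaces and tabs included, goes into the last one. This is a sketch with a hypothetical function name; note that read also strips leading and trailing blanks from that remainder, so lines whose text begins or ends with whitespace would be altered, and a shell loop is much slower than sed or awk on large inputs.

```shell
#!/bin/sh
# Hypothetical helper: rewrite "<padding><count> <text>" as "<count><tab><text>".
count_to_tab_read() {
  # -r keeps backslashes literal; default IFS splitting skips the padding,
  # puts the number in $count and the rest of the line in $rest.
  while read -r count rest; do
    printf '%s\t%s\n' "$count" "$rest"
  done
}

printf 'first word\tother things\nsecond word\tother things\n' | uniq -c | count_to_tab_read
```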
