5

I have a file that contains:

VoicemailButtonTest
VoicemailButtonTest
VoicemailButtonTest
VoicemailButtonTest
VoicemailButtonTest
VoiceMailConfig60CharsTest
VoicemailDefaultTypeTest
VoiceMailIconSelectableTest
VoiceMailIconSelectableTest
VoiceMailIconSelectableTest
VoiceMailIconSelectableTest
VoiceMailIconSelectableTest
VoicemailSettingsFromMessageModeScreenTest
VoicemailSettingsFromMessageModeScreenTest
VoicemailSettingsTest
VoicemailSettingsTest
VoicemailSettingsTest
VoicemailSettingsTest
VoicemailSettingsTest
VoicemailSettingsTest
VoicemailSettingsTest

How can I replace the repeated lines with a count, like this:

VoicemailButtonTest (5)
VoiceMailConfig60CharsTest (1)
VoicemailDefaultTypeTest (1)
VoiceMailIconSelectableTest (5)
VoicemailSettingsFromMessageModeScreenTest (2)
VoicemailSettingsTest (7)

I will put these pairs into an associative array. I tried using "read" in a "while" statement, but the array gets lost. Here is my attempt:

unset line
tests=$(cat file.log)
echo "$tests" | 
    while read l; do 
        if [ "$l" == "${line}" ]; then
            let cnt++;
        else
            echo "${line} (${cnt})"
            line=${l}
            cnt=1
        fi
        export run_suites
    done

6 Answers

5

Assuming the output format does not have to exactly match

VoicemailButtonTest (5)
VoiceMailConfig60CharsTest (1)
VoicemailDefaultTypeTest (1)
VoiceMailIconSelectableTest (5)
VoicemailSettingsFromMessageModeScreenTest (2)
VoicemailSettingsTest (7)

you can use

sort input_file | uniq -c

If you need the output to match what you posted exactly, you can use

awk '{duplicates[$1]++} END{for (ind in duplicates) {print ind,"("duplicates[ind]")"}}' input_file
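
One caveat: awk does not specify the iteration order of for (ind in duplicates), so the result lines can come out in any order. If a deterministic order matters, piping the result through sort is one option. A minimal sketch on a made-up three-line sample (the variable name sorted_counts is just for illustration):

```shell
# Hypothetical sample input: "b" twice, "a" once, deliberately unsorted
sorted_counts=$(printf 'b\na\nb\n' |
    awk '{duplicates[$1]++} END{for (ind in duplicates) {print ind,"("duplicates[ind]")"}}' |
    sort)   # sort the finished "key (count)" lines, since for-in order is unspecified
printf '%s\n' "$sorted_counts"
```

Here the sort runs on the short result rather than on the whole input.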

Edit: posted after anubhava's answer... but leaving it up because of the added sort command (unless people suggest I delete it).

answered 2017-10-31T18:44:30.037
3

You can use this simple awk script to get the counts:

awk '{freq[$1]++} END{for (i in freq) print i, "(" freq[i] ")"}' file

VoiceMailConfig60CharsTest (1)
VoicemailSettingsFromMessageModeScreenTest (2)
VoiceMailIconSelectableTest (5)
VoicemailButtonTest (5)
VoicemailDefaultTypeTest (1)
VoicemailSettingsTest (7)

To keep the order in which the keys appear in the input, use:

awk '!freq[$1]++{order[++k]=$1} END{
    for (i=1; i<=k; i++) print order[i], "(" freq[order[i]] ")"}' file

VoicemailButtonTest (5)
VoiceMailConfig60CharsTest (1)
VoicemailDefaultTypeTest (1)
VoiceMailIconSelectableTest (5)
VoicemailSettingsFromMessageModeScreenTest (2)
VoicemailSettingsTest (7)
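
For readers unfamiliar with the !freq[$1]++ idiom, here is the same script spread over several lines with comments, run on a made-up three-line sample rather than the real file (the variable name ordered is only for illustration):

```shell
# Hypothetical sample input: "b" appears first, so it should be printed first
ordered=$(printf 'b\na\nb\n' | awk '
    # The count is still 0 the first time a key is seen, so !freq[$1]++ is true
    # exactly once per key; that is when the key gets its position recorded
    !freq[$1]++ { order[++k] = $1 }
    # After all input has been read, walk the keys in first-seen order
    END { for (i = 1; i <= k; i++) print order[i], "(" freq[order[i]] ")" }
')
printf '%s\n' "$ordered"
```

This should print b (2) followed by a (1), i.e. first-seen order rather than sorted order.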
answered 2017-10-31T18:37:32.127
3

If you don't care about the exact output format, just use sort and uniq:

$ sort file.log | uniq -c
5 VoicemailButtonTest
1 VoiceMailConfig60CharsTest
1 VoicemailDefaultTypeTest
5 VoiceMailIconSelectableTest
2 VoicemailSettingsFromMessageModeScreenTest
7 VoicemailSettingsTest

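The sort is, of course, unnecessary if the file is already sorted, as it is in your question. If the file is not sorted, uniq -c will still run, but it only treats a line as a duplicate when it is identical to the line immediately before it: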

$ printf 'a\nb\na' | uniq -c
1 a
1 b
1 a
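
If you later decide you do want the "name (count)" format, one possible approach is to swap the two columns of the uniq -c output with awk. A minimal sketch on the same a/b/a sample, assuming each line is a single word with no spaces (the variable name counts is only for illustration):

```shell
# uniq -c prints "<count> <line>"; awk swaps the two fields and adds parentheses.
# Assumes every input line is one whitespace-free word, as in the question.
counts=$(printf 'a\nb\na\n' | sort | uniq -c | awk '{print $2, "(" $1 ")"}')
printf '%s\n' "$counts"
```

With the sort in place, both a lines are adjacent and are counted together, giving a (2) and b (1).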
answered 2017-10-31T18:44:42.610
1
$ awk '$1 != prev{if (NR>1) print prev, "("cnt")"; prev=$1; cnt=0} {cnt++} END{print prev, "("cnt")"}' file
VoicemailButtonTest (5)
VoiceMailConfig60CharsTest (1)
VoicemailDefaultTypeTest (1)
VoiceMailIconSelectableTest (5)
VoicemailSettingsFromMessageModeScreenTest (2)
VoicemailSettingsTest (7)

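The above preserves your input order and stores almost nothing in memory. It doesn't care whether your input is sorted; it only relies on all duplicates of a key appearing contiguously in your input file, as they do in your example.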
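
To illustrate that reliance on contiguous keys, here is a small sketch that wraps the same awk command in a function (the name count_runs and the sample inputs are made up for the illustration) and feeds it grouped and scattered input:

```shell
# Hypothetical wrapper around the one-liner above so it can read from stdin
count_runs() {
    awk '$1 != prev{if (NR>1) print prev, "("cnt")"; prev=$1; cnt=0} {cnt++} END{print prev, "("cnt")"}'
}

grouped=$(printf 'a\na\nb\n' | count_runs)     # duplicates are adjacent
scattered=$(printf 'a\nb\na\n' | count_runs)   # duplicates are split apart

printf 'grouped:\n%s\n' "$grouped"
printf 'scattered:\n%s\n' "$scattered"
```

The grouped input yields a (2) and b (1), while the scattered input reports a (1), b (1), a (1), because each run of identical keys is counted separately.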

answered 2017-10-31T18:58:24.510
0

Using a bash array:

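unset tab
declare -A tab                                  # associative array: line -> count
while IFS= read -r line; do
  tab[$line]=$(( ${tab[$line]:-0} + 1 ))        # start from 0 if the key is new
done < infile
for i in "${!tab[@]}"; do                       # quoted so keys are not word-split
  echo "$i (${tab[$i]})"
done | sort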
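
As for why the array got lost in the question's attempt: in bash with default settings, each part of a pipeline runs in a subshell, so anything assigned inside echo ... | while read disappears when the loop ends. Feeding the loop through a redirection, as above, keeps it in the current shell. A minimal sketch of the difference, using a plain counter instead of an associative array (the variable names piped and redirected are made up for the illustration):

```shell
# Pipeline version: the while loop runs in a subshell (bash default),
# so the increments are not visible after the loop
piped=0
printf 'a\nb\n' | while read -r l; do piped=$((piped + 1)); done

# Redirection version: the loop runs in the current shell,
# so the variable keeps its final value
redirected=0
while read -r l; do redirected=$((redirected + 1)); done <<EOF
a
b
EOF

echo "piped=$piped redirected=$redirected"
```

In bash this should report piped=0 and redirected=2. Shells that run the last pipeline stage in the current shell (or bash with the lastpipe option enabled) behave differently for the first loop.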
answered 2017-10-31T23:45:32.537
0

Without awk: this keeps the keys in order of first occurrence and does not require the input to be sorted or grouped.

cat -n file    |     # add line numbers for order
sort -s -k2    |     # stable sort on keys, ignoring line no
uniq -f1 -c    |     # count keys, ignoring line no
sort -k2,2n    |     # sort by line no to recover initial order
sed -r 's/^\s*(\S+)\s+(\S+)\s+(\S+)/\3 (\1)/'     # strip padding, format output
answered 2017-10-31T19:36:44.980