regex - Linux tools - How to count and list the occurrences of a regular expression in a file

Question

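I have a file containing a large number of similar strings. I want to count the unique occurrences of a regular expression and also show what they are, e.g. for the pattern Profile: (\w*) on the file: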

Profile: blah
Profile: another
Profile: trees
Profile: blah

I want to find that there are 3 unique occurrences and return the results:

blah, another, trees

score 6 · Accepted Answer

Try this:

egrep "Profile: (\w*)" test.text -o | sed 's/Profile: \(\w*\)/\1/g' | sort | uniq

Output:

another
blah
trees

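Explanation

egrep with the -o option prints only the parts of the file that match the pattern.

sed keeps only the captured part of each match.

sort followed by uniq gives a list of the unique elements.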
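
The three stages described above can also be wrapped in a small function. This is only a sketch, not the answer's exact command: the function name list_profiles is made up for illustration, it uses grep -E instead of egrep, it strips the fixed prefix instead of using a sed capture group, and it assumes GNU grep, where \w is available as an extension.

```shell
# list_profiles (hypothetical name): read text on stdin and print each
# distinct word that follows "Profile: ".
# Assumes GNU grep, where \w is available as an extension.
list_profiles() {
    grep -oE 'Profile: \w*' |      # keep only the matching part of each line
        sed 's/^Profile: //' |     # drop the fixed prefix, leaving the captured word
        sort -u                    # same result as sort | uniq
}

# Sample data copied from the question.
sample='Profile: blah
Profile: another
Profile: trees
Profile: blah'

printf '%s\n' "$sample" | list_profiles                  # one value per line
printf '%s\n' "$sample" | list_profiles | paste -sd, -   # one comma-separated line
```

For a real file, something like list_profiles < test.text should behave the same way. The paste step is there because the question asked for the values on a single line.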

To get the number of elements in the resulting list, append wc -l to the command:

egrep "Profile: (\w*)" test.text -o | sed 's/Profile: \(\w*\)/\1/g' | sort | uniq | wc -l

Output:

3
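
wc -l only reports how many distinct values there are. If you also want to know how often each value occurs, uniq -c may be enough. The following is a sketch under the same assumptions (GNU grep for \w); the function name count_profiles is hypothetical.

```shell
# count_profiles (hypothetical name): read text on stdin and print
# "COUNT VALUE" for each distinct word that follows "Profile: ".
count_profiles() {
    grep -oE 'Profile: \w*' |
        sed 's/^Profile: //' |
        sort | uniq -c |           # uniq -c prefixes each distinct value with its frequency
        awk '{print $1, $2}'       # normalise the padded count column
}

# Sample data copied from the question.
printf 'Profile: blah\nProfile: another\nProfile: trees\nProfile: blah\n' | count_profiles
```

uniq only collapses adjacent duplicates, which is why the sort has to come first.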

score 1 · Accepted Answer

awk '{a[$2]}END{for(x in a)print x}' file

This will work for your example:

kent$  echo "Profile: blah
Profile: another
Profile: trees
Profile: blah"|awk '{a[$2]}END{for(x in a)print x}'
another
trees
blah
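
Two things to keep in mind with the one-liner: it takes the second field of every line, whether or not the line starts with Profile:, and the order of for (x in a) is unspecified, which is why the output above is not sorted. A slightly more defensive sketch could look like this (the function name list_profiles_awk is made up for illustration):

```shell
# list_profiles_awk (hypothetical name): read text on stdin and print the
# distinct second fields of lines whose first field is exactly "Profile:".
list_profiles_awk() {
    awk '$1 == "Profile:" { seen[$2] }          # referencing seen[$2] creates the key
         END { for (name in seen) print name }' |
        sort                                    # for-in order is unspecified, so sort for stable output
}

# Sample data from the question plus one unrelated line that should be ignored.
printf 'Profile: blah\nOther: zzz\nProfile: another\nProfile: trees\nProfile: blah\n' | list_profiles_awk
```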

If you want the count (3) in the output as well:

awk '{a[$2]}END{print "count:",length(a);for(x in a)print x }' file

The same example:

kent$  echo "Profile: blah
Profile: another
Profile: trees
Profile: blah"|awk '{a[$2]}END{print "count:",length(a);for(x in a)print x }'
count: 3
another
trees
blah
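
Calling length() on an array is widely supported, but it has historically been an extension in some awk implementations. If that is a concern, the distinct count can be tracked by hand, and the same array can hold the number of occurrences per name. This is a sketch only; the function name profile_counts_awk and the variable names are hypothetical.

```shell
# profile_counts_awk (hypothetical name): read text on stdin, print the number
# of distinct profile names, then each name with its number of occurrences.
profile_counts_awk() {
    awk '$1 == "Profile:" {
             if (!($2 in count)) distinct++   # first time this name is seen
             count[$2]++                      # occurrences of this name
         }
         END {
             print "count:", distinct + 0     # + 0 prints 0 when nothing matched
             for (name in count) print name, count[name]
         }'
}

# Sample data copied from the question.
printf 'Profile: blah\nProfile: another\nProfile: trees\nProfile: blah\n' | profile_counts_awk
```

The names after the count: line come out in whatever order awk iterates the array, so pipe them through sort if the order matters.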
