
I'm having some trouble getting the unique occurrences of DeviceId from a log file whose format is similar to the following:

log: {"deviceInfo":{"DeviceId":"123","device":"Android"}
log: {"deviceInfo":{"device":"Android","DeviceId":"123"}
log: {"deviceInfo":{"device":"Android","DeviceId":"234"}
log: {"deviceInfo":{"device":"iPhone","DeviceId":"323"}
log: {"deviceInfo":{"device":"iPhone","DeviceId":"323"}

The output I'm expecting looks like this:

log: {"deviceInfo":{"DeviceId":"123","device":"Android"}
log: {"deviceInfo":{"device":"Android","DeviceId":"234"}
log: {"deviceInfo":{"device":"iPhone","DeviceId":"323"}

I tried using awk, but I can't seem to figure it out. Does anyone know how to do this?

I know there should be a way to print the DeviceId using awk, but I can't work it out. Once I have the DeviceId, I can pipe it to sort and uniq.
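
A minimal sketch of the plan described in the question: print only the DeviceId with awk, then remove duplicates with sort -u (equivalent to sort | uniq). It assumes each ID is the double-quoted string right after "DeviceId":" as in the sample, and device_ids is a hypothetical helper name. Note that this yields the bare IDs in sorted order, not the full log lines shown in the expected output.

```shell
#!/bin/sh
# device_ids is a hypothetical helper name. It prints each DeviceId value,
# then lets sort -u (same as sort | uniq) drop the duplicates.
device_ids() {
    # Use the literal text "DeviceId":" as the field separator, so $2 starts
    # with the ID; split() then cuts $2 at the closing double quote.
    awk -F'"DeviceId":"' 'NF > 1 { split($2, rest, "\""); print rest[1] }' "$@" |
        sort -u
}
```

Usage: device_ids log, which for the sample above should print 123, 234 and 323 on separate lines.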


6 Answers


Using Perl:

perl -lne 'if ( m{"DeviceId":" ([^"]+) "}xms ) { print if not $seen{$1}++; }' <log
answered 2013-02-25T18:02:24.993

Using GNU awk:

gawk 'match($0, /DeviceId":"([^"]+)/, a) && seen[a[1]]++ == 0' log

Given your input, this outputs:

log: {"deviceInfo":{"DeviceId":"123","device":"Android"}
log: {"deviceInfo":{"device":"Android","DeviceId":"234"}
log: {"deviceInfo":{"device":"iPhone","DeviceId":"323"}

Note that this is essentially a clumsy translation of @Perleone's answer, although I hadn't noticed that at the time.
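
The three-argument form of match() used in this answer, which fills the array a with capture groups, is a GNU awk extension. A sketch of the same idea for a POSIX awk, assuming the ID is the double-quoted string right after "DeviceId":", takes the ID out of the standard RSTART and RLENGTH variables instead; first_per_device is a hypothetical helper name.

```shell
#!/bin/sh
# first_per_device is a hypothetical helper name. It prints the first line seen
# for each DeviceId, in input order, using only POSIX awk features.
first_per_device() {
    awk '
        # match() sets RSTART/RLENGTH for the whole "DeviceId":"<id>" text.
        match($0, /"DeviceId":"[^"]*"/) {
            # Skip the 12-character prefix "DeviceId":" and the closing quote.
            id = substr($0, RSTART + 12, RLENGTH - 13)
            # seen[id]++ is 0 (false) only the first time this id appears.
            if (!seen[id]++) print
        }
    ' "$@"
}
```

Run as first_per_device log; with the sample input it should print the same three lines as above, in input order.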

answered 2013-02-25T18:57:53.333

Based on @cnicutar's answer, using sed with sort and cut:

sed 's/.*\"DeviceId":"\([0-9]*\).*/\1\t\0/' file | sort -u -k 1,1 | cut -f 2

Output:

log: {"deviceInfo":{"DeviceId":"123","device":"Android"}
log: {"deviceInfo":{"device":"Android","DeviceId":"234"}
log: {"deviceInfo":{"device":"iPhone","DeviceId":"323"}
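
The sed command above relies on \t and \0 in the replacement, which are not portable: POSIX sed uses & for the whole match, and a backslash followed by t is unspecified there. A sketch that avoids both by inserting a literal tab character and using &, under the same assumption that IDs are digits only; unique_by_sed is a hypothetical helper name.

```shell
#!/bin/sh
# unique_by_sed is a hypothetical helper name. Same pipeline idea, but with &
# for the whole match and a real tab character instead of \t.
unique_by_sed() {
    tab=$(printf '\t')
    # Prefix each line with its DeviceId and a tab, keep one line per ID
    # (first tab-separated field), then cut the prefix off again.
    sed "s/.*\"DeviceId\":\"\([0-9]*\).*/\1${tab}&/" "$@" |
        sort -u -t "$tab" -k 1,1 |
        cut -f 2
}
```

Because this goes through sort, the result is ordered by ID rather than by first appearance, and POSIX does not specify which of several lines with the same key sort -u keeps.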
answered 2013-02-25T17:56:19.637

Unique device IDs using awk:

$ awk '/DeviceId/&&!a[$1]++&&gsub(/[^[:digit:]]/,"")' RS='[{,}]' file
123
234
323

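The nice thing about awk is its associative arrays: duplicates are dropped inside awk, so there's no need to pipe to sort -u.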
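
POSIX leaves RS unspecified when it is longer than one character; gawk and mawk treat such an RS as a regular expression, which is what this answer relies on. A sketch of the same associative-array approach that should work in any awk, assuming the ID is the double-quoted string right after "DeviceId":"; ids_in_order is a hypothetical helper name. Unlike piping to sort -u, the IDs come out in order of first appearance.

```shell
#!/bin/sh
# ids_in_order is a hypothetical helper name. It keeps a seen[] array instead
# of calling sort -u, and splits on "DeviceId":" instead of using a regex RS.
ids_in_order() {
    awk -F'"DeviceId":"' '
        NF > 1 {
            split($2, rest, "\"")   # rest[1] is the text up to the closing quote
            if (!seen[rest[1]]++) print rest[1]
        }
    ' "$@"
}
```

With the sample input, ids_in_order file should print 123, 234 and 323, one per line.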

answered 2013-02-25T18:28:22.277

Using any awk:

$ awk '{id=$0;gsub(/.*DeviceId":"|".*/,"",id)} !seen[id]++' file
log: {"deviceInfo":{"DeviceId":"123","device":"Android"}
log: {"deviceInfo":{"device":"Android","DeviceId":"234"}
log: {"deviceInfo":{"device":"iPhone","DeviceId":"323"}
answered 2013-02-25T23:41:39.950

Better to parse the JSON properly (but here's another quick awk):

awk -F'.*DeviceId":"|["}]' '!A[$2]++' file 

Applying Ed Morton's suggestion to shave off 3 more characters:

awk -F'.*DeviceId":"|"' '!A[$2]++' file 
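
With the shorter separator, the first match of the field separator on each line runs from the start of the line through DeviceId":", so $1 is empty and $2 is the ID, which ends at the next double quote. A small sketch that prints the first two fields to show this; show_fields is a hypothetical helper name.

```shell
#!/bin/sh
# show_fields is a hypothetical helper name. It prints $1 and $2 in brackets
# using the same FS as the answer, to show why !A[$2]++ keys on the ID.
show_fields() {
    awk -F'.*DeviceId":"|"' '{ printf "[%s] [%s]\n", $1, $2 }' "$@"
}
```

For the first sample line it should print [] [123].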
answered 2013-02-25T23:50:57.580