awk - Trying to reproduce awk in jq

Question

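Preface: enterprise network engineer/architect (not a programmer).
Synopsis: our logs are moving from txt output to JSON.
Problem: I have not managed to migrate a working awk data extraction to a jq one-liner that runs against the JSON.
The working awk: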

awk '
   BEGIN{ FS="\t" }
  { arr[$1 FS $2] += $3; count[$1 FS $2] += 1 }
  END{ for (key in arr) printf "%s%s%s%s%s\n", key, FS, count[key], FS, arr[key] }
' | sort -nrk 4 | head -1 | awk '{ print $1" | "$2" | "$4/60/60 }'

End result: use jq to count the repeated entries per src/dst IP address and dst port, and add up the cumulative duration of the connections.

Sample JSON input

{
  "ts": 1636xxxxx.41xxx34,
  "uid": "hex_code",
  "id.orig_h": "10.x.x.11",
  "id.orig_p": 42996,
  "id.resp_h": "10.x.x.123",
  "id.resp_p": 53,
  "proto": "udp",
  "service": "dns",
  "duration": 0.01117664844,
  "conn_state": "SF",
  "local_orig": true,
  "local_resp": true,
  "missed_bytes": 0,
  "history": "Dd",
  "orig_pkts": 1,
  "orig_ip_bytes": 71,
  "resp_pkts": 1,
  "resp_ip_bytes": 71
}
{
  "ts": 1xxxx0501.5xxx47,
  "uid": "hex_code",
  "id.orig_h": "10.x.x.11",
  "id.orig_p": 36299,
  "id.resp_h": "10.x.x.123",
  "id.resp_p": 53,
  "proto": "udp",
  "service": "dns",
  "duration": 0.00857415966797,
  "conn_state": "SF",
  "local_orig": true,
  "local_resp": true,
  "missed_bytes": 0,
  "history": "Dd",
  "orig_pkts": 1,
  "orig_ip_bytes": 74,
  "resp_pkts": 1,
  "resp_ip_bytes": 74
}

targeted jq output...
10.xxx.xxx.21 | 18.xx1.xx1.158 | 45.6606 <-- the time shown is the cumulative duration

score 1 · Accepted Answer

As far as I understand what you are trying to accomplish, this might get you going in the right direction:

jq -sr '
  group_by([."id.orig_h", ."id.resp_h"])[]
  | [(first|."id.orig_h", ."id.resp_h"), (map(.duration)|add)]
  | @csv
' extract.json

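Explanation: Your input is a stream of objects. Reading them with -s (or --slurp) turns the stream into an array. group_by then transforms that into an array of arrays, one inner array per distinct key, where the key here is the array of the two IP fields. Next, for each element of the outer array (that is, each group), we construct an array containing the two IP fields of the group's first member (that is sufficient, because all other members have the same values for these fields) and, as the third value, the sum of every member's .duration field, computed with add. Finally, @csv converts the constructed array into a line of CSV, which is printed raw because of the initial -r (--raw-output) option.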
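To make the stages above easier to follow, here is a minimal sketch that runs the same filter on three made-up records. The addresses and durations are assumptions chosen for illustration, not values from the question's logs, and only the three fields the filter touches are included.

```shell
# Hypothetical sample: two records share one IP pair, the third is a different pair.
sample='{"id.orig_h":"10.0.0.1","id.resp_h":"10.0.0.2","duration":1.5}
{"id.orig_h":"10.0.0.1","id.resp_h":"10.0.0.2","duration":2.5}
{"id.orig_h":"10.0.0.3","id.resp_h":"10.0.0.2","duration":0.25}'

# Stage 1: -s slurps the stream into one array, group_by splits it by IP pair.
# Printing the length of each group shows how the records were partitioned.
group_sizes=$(printf '%s\n' "$sample" |
  jq -sc 'group_by([."id.orig_h", ."id.resp_h"]) | map(length)')
echo "$group_sizes"
# [2,1]

# Stage 2: the full filter from the answer, one CSV row per group.
rows=$(printf '%s\n' "$sample" | jq -sr '
  group_by([."id.orig_h", ."id.resp_h"])[]
  | [(first|."id.orig_h", ."id.resp_h"), (map(.duration)|add)]
  | @csv
')
echo "$rows"
# "10.0.0.1","10.0.0.2",4
# "10.0.0.3","10.0.0.2",0.25
```

Note that group_by also sorts the groups by their key, so the rows come out ordered by source address and then destination address, not by duration.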

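Note: I treated the field value "ts": 1636xxxxx.41xxx34 as an obfuscation of an actual number. If it really is a string containing some x characters, however, it has to be quoted ("ts": "1636xxxxx.41xxx34") to be valid JSON.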

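In answer to the follow-up question of how to filter out durations equal to 0 and sort the remaining rows by duration from highest to lowest: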

jq -sr '
  group_by([."id.orig_h", ."id.resp_h"]) | map(
    [(first|."id.orig_h", ."id.resp_h"), (map(.duration)|add)]
    | select(.[2] > 0)
  )
  | sort_by(-.[2])[]
  | @csv
' extract.json
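If the goal is to mirror the final stage of the original awk pipeline (keep only the pair with the highest cumulative duration and print it as "src | dst | hours"), one possible sketch follows. The sample records are made up for illustration, and the object keys src, dst, count and total are names introduced here, not fields of the log.

```shell
# Hypothetical sample: one pair totals 7200 seconds, the other 1800 seconds.
sample='{"id.orig_h":"10.0.0.1","id.resp_h":"10.0.0.2","duration":3600}
{"id.orig_h":"10.0.0.1","id.resp_h":"10.0.0.2","duration":3600}
{"id.orig_h":"10.0.0.3","id.resp_h":"10.0.0.2","duration":1800}'

top=$(printf '%s\n' "$sample" | jq -sr '
  group_by([."id.orig_h", ."id.resp_h"])
  # One summary object per group: the pair, the number of entries, the summed duration.
  | map({
      src:   .[0]["id.orig_h"],
      dst:   .[0]["id.resp_h"],
      count: length,
      total: (map(.duration) | add)
    })
  # Keep only the group with the largest total (like sort -nrk 4 | head -1).
  | max_by(.total)
  # Convert seconds to hours, as the awk did with $4/60/60.
  | "\(.src) | \(.dst) | \(.total / 3600)"
')
echo "$top"
# 10.0.0.1 | 10.0.0.2 | 2
```

The count field is computed but not printed, matching the awk output; adding \(.count) to the final string would include it. Adding ."id.resp_p" to the group_by key would group by destination port as well.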
