linux - Extracting unpredictable data with its own timestamps from a log file using a shell script

Question

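log.txt looks like the sample below. It holds ID records, each with its own timestamp (detection_time), and the file is continuously updated. The IDs are unpredictable numbers anywhere from 0000 to 9999, and the same ID may appear in log.txt more than once.

My goal is to use a shell script to pick out the IDs in log.txt that appear again within 15 seconds of their first appearance. Can anyone help me with this?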

ID = 4231
detection_time = 1595556730 
ID = 3661
detection_time = 1595556731
ID = 2654
detection_time = 1595556732
ID = 3661
detection_time = 1595556733

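To be clearer: in the log.txt above, ID 3661 first appears at time 1595556731 and then appears again at 1595556733, only 2 seconds after its first appearance. So it meets my condition of an ID that reappears within 15 seconds, and I want my shell script to pick out this ID 3661.

The output after running the shell script would be ID = 3661

My problem is that I don't know how to develop the algorithm in a shell script.

Here is my attempt using the ID_new and ID_previous variables, but ID_previous=$(ID_new) and detection_previous=$(detection_new) do not work.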

input="/tmp/log.txt"
ID_previous=""
detection_previous=""
while IFS= read -r line
do
    ID_new=$(echo "$line" | grep "ID =" | awk -F " " '{print $3}')
    echo $ID_new
    detection_new=$(echo "$line" | grep "detection_time =" | awk -F " " '{print $3}')
    echo $detection_new
    ID_previous=$(ID_new)
    detection_previous=$(detection_new)
done < "$input"

EDIT: The data in log.txt actually comes in sets, each containing ID, detection_time, Age and Height. Sorry for not mentioning this at the start.

ID = 4231
detection_time = 1595556730 
Age = 25
Height = 182
ID = 3661
detection_time = 1595556731
Age = 24
Height = 182
ID = 2654
detection_time = 1595556732
Age = 22
Height = 184    
ID = 3661
detection_time = 1595556733
Age = 27
Height = 175
ID = 3852
detection_time = 1595556734
Age = 26
Height = 156
ID = 4231
detection_time = 1595556735 
Age = 24
Height = 184

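I have tried the awk solution. The result is 4231 3661 2654 3852 4231, which is every ID in log.txt. The correct output should be 4231 3661.

From this, I think the Age and Height data may be affecting the Awk solution, because they sit in between the ID and detection_time data that the script cares about.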

score 1 · Accepted Answer

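Assuming the timestamps in the log file increase monotonically, you only need a single pass with Awk. For each id, keep track of the latest time it was reported (use an associative array t where the key is the id and the value is the latest timestamp). If you see the same id again and the two timestamps are within 15 seconds of each other, report it.

For good measure, keep a second array p of the ids we have already reported, so we don't report them twice.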

awk '/^ID = / { id=$3; next }
    # Skip if this line is neither ID nor detection_time
    !/^detection_time = / { next }
    (id in t) && (t[id] >= $3-15) && !(p[id]) { print id; ++p[id]; next }
    { t[id] = $3 }' /tmp/log.txt
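The one-liner above compares each sighting against the latest recorded time for that id. If the requirement is instead read literally as "within 15 seconds of the first appearance", a variant could store only the first timestamp. The following is a minimal sketch under that assumption, not part of the original answer; the function name first_window_repeats, the window parameter, and the sample data are all made up for illustration.

```shell
# Hypothetical helper: first_window_repeats FILE [SECONDS]
first_window_repeats() {
    awk -v window="${2:-15}" '
        /^ID = /              { id = $3; next }
        !/^detection_time = / { next }                  # skip Age, Height and other lines
        !(id in first)        { first[id] = $3; next }  # remember the first sighting only
        ($3 - first[id] <= window) && !(id in printed) { print id; printed[id] = 1 }
    ' "$1"
}

# Made-up sample: 5000 reappears 20 s and 25 s after its first sighting
sample=$(mktemp)
cat > "$sample" <<'EOF'
ID = 3661
detection_time = 1595556731
ID = 5000
detection_time = 1595556731
ID = 3661
detection_time = 1595556733
ID = 5000
detection_time = 1595556751
ID = 5000
detection_time = 1595556756
EOF

out=$(first_window_repeats "$sample")
printf '%s\n' "$out"    # prints 3661
rm -f "$sample"
```

With this sample, the latest-time logic would also report 5000 (its last two sightings are 5 seconds apart), so which variant fits depends on how the 15-second rule is meant.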

If you really insist on doing this natively in Bash, I would refactor your attempt as follows:

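declare -A dtime printed
# read splits each line on whitespace: field, the "=" sign (discarded), value
while read -r field _ value
do
    case $field in
     ID) id=$value;;
     detection_time)
      # An id not seen before has no entry yet, so treat its last time as 0
      if [[ ${dtime[$id]:-0} -ge $((value - 15)) ]]; then
          # Report each id only once
          [[ -v printed["$id"] ]] || echo "$id"
          printed["$id"]=1
      fi
      dtime["$id"]=$value ;;
    esac
done < /tmp/log.txt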

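Notice how read -r can split a line on whitespace just as easily as Awk can, as long as you know how many fields to expect. But while read -r is typically an order of magnitude slower than Awk, and you have to agree that the Awk attempt is more succinct and elegant, as well as portable to older systems.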
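As a small illustration of the field splitting described here, the sketch below feeds a single line to read -r. The sample line, including its trailing spaces, is made up to resemble the question's log.

```shell
# Hypothetical sample line with trailing whitespace
line='detection_time = 1595556730   '

# field gets the first word, _ swallows the "=" sign, value gets the rest
# (leading and trailing whitespace is trimmed with the default IFS)
read -r field _ value <<EOF
$line
EOF

printf 'field=[%s] value=[%s]\n' "$field" "$value"
# prints field=[detection_time] value=[1595556730]
```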

(Associative arrays were introduced in Bash 4.)

Tangentially, anything that looks like grep 'x' | awk '{ y }' can be refactored to awk '/x/ { y }'; see also "useless use of grep".
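Applied to the extraction in the question, the refactoring might look like the sketch below. The sample line is made up; both pipelines are expected to produce the same value.

```shell
line='ID = 3661'

# Original style: grep selects the line, awk prints the third field
with_grep=$(printf '%s\n' "$line" | grep "ID =" | awk '{print $3}')

# Refactored: awk does the matching itself
awk_only=$(printf '%s\n' "$line" | awk '/ID =/ {print $3}')

printf '%s %s\n' "$with_grep" "$awk_only"    # prints 3661 3661
```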

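Also, notice that $(foo) attempts to run foo as a command. To simply refer to the value of the variable foo, the syntax is $foo (or, optionally, ${foo}, but the braces don't add any value here). Usually you will want to double-quote the expansion, "$foo"; see also "When to wrap quotes around a shell variable".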
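The difference can be seen in a short sketch that reuses the variable names from the question. It assumes no command named ID_new exists on the system.

```shell
ID_new=3661

# Parameter expansion copies the value of the variable
ID_previous=$ID_new
printf 'ID_previous=%s\n' "$ID_previous"    # prints ID_previous=3661

# Command substitution tries to run a command called ID_new instead;
# assuming there is no such command, it fails with status 127 and yields nothing
status=0
oops=$(ID_new 2>/dev/null) || status=$?
printf 'status=%s oops=[%s]\n' "$status" "$oops"
```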

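Your script would only remember a single earlier event; the associative array lets us remember all the ID values we have seen previously (until we run out of memory).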

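Nothing prevents us from using human-readable variable names in Awk either. Feel free to substitute printed for p and dtime for t to get complete parity with the Bash alternative.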
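A sketch of that renaming, run against the sample from the question's EDIT, could look like this. The wrapper function find_repeats and the temporary file are assumptions added for the demonstration; the Awk logic is the answer's, with only the array names changed and the "already printed" check spelled with the in operator.

```shell
# Hypothetical wrapper around the answer's Awk program, with readable names
find_repeats() {
    awk '/^ID = / { id = $3; next }
        # Skip if this line is neither ID nor detection_time
        !/^detection_time = / { next }
        (id in dtime) && (dtime[id] >= $3 - 15) && !(id in printed) {
            print id; printed[id] = 1; next
        }
        { dtime[id] = $3 }' "$1"
}

sample=$(mktemp)
cat > "$sample" <<'EOF'
ID = 4231
detection_time = 1595556730
Age = 25
Height = 182
ID = 3661
detection_time = 1595556731
Age = 24
Height = 182
ID = 2654
detection_time = 1595556732
Age = 22
Height = 184
ID = 3661
detection_time = 1595556733
Age = 27
Height = 175
ID = 3852
detection_time = 1595556734
Age = 26
Height = 156
ID = 4231
detection_time = 1595556735
Age = 24
Height = 184
EOF

out=$(find_repeats "$sample")
printf '%s\n' "$out"    # prints 3661, then 4231 on the next line
rm -f "$sample"
```

The ids come out in the order their repeat is detected, so 3661 precedes 4231 here.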
