bash - Bash script to return file names that contain a specific string

Question

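My script (in bash) is meant to do the following:

1) Get the start and stop times from a file, file_A. The time range is usually 3-24 hours.
2) Based on this time window [start_time, stop_time] obtained from file_A, find specific files among 10k log files in total (the number will grow as the experiment runs), each covering about 30 minutes. That is, I have to find 6-50 log files among the 10k.
3) Once the correct log files are identified, print out the interesting data.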

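Steps 1) and 3) are fine; I already have them working. I am stuck on step 2), in two places in particular:

(a) Since the log files are named by time, how do I efficiently select the right files by name? A log file named log_201305280650 means 2013 / May 28 / 06:50. That is, given the times obtained from file_A, I need to identify the corresponding log files by their names, which encode the time.

(b) Once a file is selected, read from it the items (temperature, pressure, etc.) whose time falls within the time window. Because each file covers 30 minutes, some entries in the file may fall outside the time window.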

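For example,

From step 1), my time window is set to [201305280638, 201305290308].

From step 2), I know that the log file (log_201305280650) contains the start time 201305280638. So I need to read all the temperatures and pressures for the entries from 201305280638 onward.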

    The log file name is log_201305280650 (= 2013 / May 28 / 06:50)

    Time                      temperature  pressure ...
    201305280628                100,         120  ...
    201305280629                100,         120  ...

    ...                         ...          ...

    201305280638                101,         121  ...
    201305280639                99,          122  ...

    ...                         ...          ...

    201305280649                101,         119  ...
    201305280650                102,         118  ...

My pseudo-script is as follows.

get time_start from /path/file_A
get time_stop  from /path/file_A
for file in /path_to_log_files/*
do
    case "$file" in
    *)
        if [[ log file name within time window of (time_start, time_stop) ]]; then
            loop over this file to get the entries whose time is within (time_start, time_stop)
            read out temperature and pressure etc.
        fi
    esac
done

score 0 · Accepted Answer

It may be easier to use awk and the +"%s" format of the date command instead of literal dates and times. This format converts a date/time to seconds since the epoch (1970-01-01 00:00:00 UTC). The resulting number is easy to work with; after all, it's just a number. As an example, I made a small bash script. First, a simulation:

#!/bin/bash

# simulation: date and time
start_dt="2013-09-22 00:00:00"
# the end must be later than the start or the loop writes nothing;
# one day is an assumed span for the simulation
end_dt="2013-09-23 00:00:00"
start_secs=$(date -d "$start_dt" +"%s")
end_secs=$(date -d "$end_dt" +"%s")
# simulation: set up table (time in secs, temperature, pressure per minute)
> logfile
for ((i = start_secs; i < end_secs; i += 60)); do
    echo "$i $((90 + RANDOM % 20)) $((80 + RANDOM % 30))" >> logfile
done

Here's the actual script to get the user range and to print it out:

echo "Enter start of range:"
read -r -p "Date (YYYY-MM-DD): " sdate
read -r -p "Time (HH:MM:SS)  : " stime
echo "Enter end of range:"
read -r -p "Date (YYYY-MM-DD): " edate
read -r -p "Time (HH:MM:SS)  : " etime
# convert to secs
rstart=$(date -d "$sdate $stime" +"%s")
rend=$(date -d "$edate $etime" +"%s")
# print the rows whose first column lies within the range
awk -v rstart="$rstart" -v rend="$rend" '$1 >= rstart && $1 <= rend' logfile

The awk command is well suited for this: it is fast and can handle large files. I hope this gives you some ideas.
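The same kind of awk filter can probably be applied to the compact timestamps from the question without converting to epoch seconds, because fixed-width YYYYMMDDHHMM values compare correctly as plain numbers. A minimal sketch, assuming whitespace-separated columns with the timestamp in the first column; the function name `filter_window` is made up for illustration:

```shell
#!/bin/bash
# filter_window START STOP FILE...
# Print the rows whose first column (YYYYMMDDHHMM) lies in [START, STOP].
# Assumption: header and separator lines do not have an all-digit first
# field, so the regex test skips them.
filter_window() {
    local start=$1 stop=$2
    shift 2
    awk -v start="$start" -v stop="$stop" \
        '$1 ~ /^[0-9]+$/ && $1 + 0 >= start + 0 && $1 + 0 <= stop + 0' "$@"
}
```

It could be called as `filter_window 201305280638 201305290308 log_201305280650`, with the selected log files passed as the trailing arguments.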

score 0 · Accepted Answer

Maybe something like this would work for you? I am using $start and $end for the start and end times from file_A.

 eval cat log_{$start..$end} 2> /dev/null | sort -k1 | sed -n "/$start/,/$end/p"

This assumes your log files are formatted as

time temperature pressure ...

with no headers or other such text.
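The `eval` in the one-liner is needed because bash performs brace expansion before variable expansion, so `{$start..$end}` is not yet a valid range when the braces are processed. A small illustration, with short numbers standing in for the timestamps (the `file_` prefix and the values are made up for the demo):

```shell
#!/bin/bash
start=3
end=6

# Brace expansion runs first and sees the literal text {$start..$end},
# which is not a valid range, so the braces are left untouched.
without_eval=$(echo file_{$start..$end})

# eval re-parses the line after the variables have been substituted,
# so the second pass sees file_{3..6} and expands it.
with_eval=$(eval echo file_{$start..$end})

echo "$without_eval"   # file_{3..6}
echo "$with_eval"      # file_3 file_4 file_5 file_6
```

With real timestamps the range also covers numbers that are not valid times (minutes 60 to 99, for example), which is presumably why the one-liner discards the errors from `cat` with `2> /dev/null`.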

score 0 · Accepted Answer

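Doing this job in bash is quite a feat. Perl or Python would be easier; both have date/time modules.

I spent a while on the usual date slicing and it was horrible, so I cheated and used file timestamps instead. Bash has some limited timestamp checking, and this uses it. OK, it does some file I/O, but these are empty files, so what the heck!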

lower=201305280638
upper=201305290308
filename=log_201305280638
filedate=${filename:4}    # strip the "log_" prefix

# -nt and -ot are strict, so test equality with the bounds first
if (( filedate == upper )) || (( filedate == lower ))
then
    echo "$filename within range"
else
    # range files
    touch -t "$lower" lower.$$
    touch -t "$upper" upper.$$

    # benchmark file
    touch -t "$filedate" file.$$

    # compare against the marker files, not the raw timestamps
    if [[ file.$$ -nt upper.$$ ]]
    then
        echo "$filename is too young"

    elif [[ file.$$ -ot lower.$$ ]]
    then
        echo "$filename is too old"
    else
        echo "$filename is just right"
    fi

    rm lower.$$ upper.$$ file.$$
fi

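-nt means "newer than"

-ot means "older than"

Hence the check for equality at the start. You can use a similar check for the timestamps inside the file (your second question). But honestly, can't you use Perl or Python?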
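The arithmetic comparison used for the equality check can also handle ordering, because the YYYYMMDDHHMM stamps are fixed-width integers. Below is a sketch of selecting files by name alone, without temporary files. It assumes (as the question's example suggests, though this is not confirmed) that each file is named after the time of its last entry; `select_logs` is a hypothetical helper, not part of the answer above:

```shell
#!/bin/bash
# select_logs DIR START STOP
# Print the log files in DIR that may hold entries inside [START, STOP].
# Assumptions: each file is named log_<time of its last entry>, and the
# glob returns the names in sorted (chronological) order.
select_logs() {
    local dir=$1 start=$2 stop=$3 file stamp
    for file in "$dir"/log_*; do
        [[ -e $file ]] || continue               # glob matched nothing
        stamp=${file##*_}                        # log_201305280650 -> 201305280650
        [[ $stamp =~ ^[0-9]{12}$ ]] || continue  # skip unexpected names
        if (( stamp < start )); then
            continue                             # file ends before the window
        fi
        printf '%s\n' "$file"
        if (( stamp >= stop )); then
            break                                # this file reaches past the window
        fi
    done
    return 0
}
```

For the window in the question this would be called as `select_logs /path_to_log_files 201305280638 201305290308`. The first and last files printed can still contain entries outside the window, so their rows need filtering by time as well.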
