bash - awk - 检查时间码

Question

在我的文件中，每一行看起来像这样：

\18:49:25 1920/11/29\ 0.25

当我将其保存在变量中时，有没有一种方法可以检查时间码是否正确，以确保其中没有拼写错误（最多 12 个月，用 4 个数字写的年份等）？

\%H:%M:%S %Y/%m/%d\

谢谢

PS我知道检查不能每次都进行（我无法区分分钟和秒，但我可以检查我没有13个月等）

编辑：时间码不固定。数据文件可能如下所示：

    *1920.11.29 18.49.25* 0.25

但它也可能看起来完全不同。唯一确定的想法是我将时间码以一般形式保存在变量中。在这种情况下，它将是

*%Y.%m.%d %H.%M.%S*

EDIT2： 看起来我没有表达清楚，所以这里有一些我想要实现的例子：

1)

$ cat input.txt
29/11/1920 17:50
30/11/1920 18:20
01/12/1920 07:20
...

$ ./checktimecode "%d/%m/%Y %H:%M" input.txt

结果=时间码OK

2)

$ cat input.txt
**1920/11@17:50**
**1920/12@18:20**
**1920/13@07:20**
...

$ ./checktimecode "**%Y/%d@%H:%M**" input.txt

结果=时间码OK

$ ./checktimecode "**%Y/%m@%H:%M**" input.txt

结果=错误的时间码

3)

$ cat input.txt
!17/50/20\29/11/1920&
!18/20/50\30/11/1920&
!07/18/05\01/12/1920&
...

$ ./checktimecode "!%H/%M/%S\%d/%m/%Y&" input.txt

结果=时间码OK

$ ./checktimecode "%H/%M/%S\%d/%m/%Y&" input.txt

结果=错误的时间码

score 1 · Accepted Answer

样品日期：

$ cat input.txt

\18:00:00 1920/11/29\ OK
\18:00:00 1920/13/29\ KO
\00:61:00 1920/02/29\ KO
\25:00:00 1920/11/29\ KO
\00:00:00 1920/11/29\ OK

`awk`脚本

#!/usr/bin/awk -f

BEGIN{
    FS = "\\"
}

!check_date_time($2)

function check_date_time(dt,     a,date,time,year,mon,day,hour,min,sec)
{
    split(dt, a, " ")
    date = a[2]
    time = a[1]

    split(date, a, "/")
    year = a[1]
    mon  = a[2]
    day  = a[3]

    split(time, a, ":")
    hour = a[1]
    min  = a[2]
    sec  = a[3]

    return check_date(year, mon, day) && check_time(hour, min, sec)
}

function check_time(hour, min, sec)
{
    return 0<=hour && hour<=23 && 0<=min && min<=59 && 0<=sec && sec<=59
}

function check_date(year, mon, day)
{
    if (mon < 0 || mon >= 13)
        return 0
    else if (day == 31 && (mon == 4 ||  mon == 6 || mon == 9 || mon == 11))
        return 0;
    else if (day >= 30 && mon == 2)
        return 0;
    else if (mon == 2 && day == 29 && ! (  year % 4 == 0 && (year % 100 != 0 || year % 400 == 0)))
        return 0;
    else
        return 1;
}

输出

$ awk -f chk_date_time.awk input.txt

\18:00:00 1920/13/29\ KO
\00:61:00 1920/02/29\ KO
\25:00:00 1920/11/29\ KO

score 1 · Accepted Answer

使用 Python 的time.strptime()or datetime.strptime()：

#!/usr/bin/python2.6
from datetime import datetime
import sys
format = sys.argv[1]
file = sys.argv[2]

with open(file, 'r') as f:
    for line in f:
        try:
            datetime.strptime(line.rstrip(), format)
        except:
            print "BAD timecode"
            sys.exit(1)

print "timecode OK"

编辑：

用法：

$ ./checktimecode "%d/%m/%Y %H:%M" input.txt

score 0 · Accepted Answer

您可以根据您想要的任何规则验证 H、M、S、Y、m 和 d...

awk -F'[\\\\:/ ]' '{H=$2; M=$3; S=$4; Y=$5; m=$6; d=$7; print "=" H "=" M "=" S "=" Y "=" m "=" d "="}'

score 0 · Accepted Answer

发射器：

@(do (defvar counter (range 1)))
@(collect :vars ())
@  (bind lineno @(pop counter))
@;;; 
@;;; Here we can add numerous cases for different date formats.
@;;;
@  (try)
@    (cases)
\@hh:@mm:@ss @year/@mo/@da\ @stuff
@    (or)
*@year.@mo.@da @hh.@mm.@ss* @stuff
@    (or)
@line
@    (throw error `line @lineno: unrecognized format`)
@    (end)
@    (filter :tonumber hh mm ss year mo da)
@    (do
       (each ((n (list hh mm ss year mo da)))
         (if (null n)
           (throw 'error `line @lineno: bad number`))
         (if (< n 0)
           (throw 'error `line @lineno: negative number`)))
       (if (> hh 23)
         (throw 'error `line @lineno: hour > 23`))
       (if (> mo 12)
         (throw 'error `line @lineno: month > 13`)))
@  (catch error (err))
@    (do (format t "~a\n" err))
@  (end)
@(end)

数据：

\28:49:25 1920/11/29\ 0.25
*1920.13.29 18.49.25* 0.25
asdf *1920.28
*-1920.11.29 18.49.25* 0.25
*1920.x3.29 18.49.25* 0.25

跑：

$ txr times.txr  times.txt  | sed -e 's/^/    /'
line 1: hour > 23
line 2: month > 13
line 3: unrecognized format
line 4: negative number
line 5: bad number

耸耸肩。也许这种异常方法不是一个好主意，因为每个条目最多只能得到一个错误报告。但如果错误条目相对较少，那可能没关系。

bash - awk - 检查时间码

4 回答 4

样品日期：

awk脚本

输出

Related

Reference

`awk`脚本