bash - Bash script to change the date format in .csv files

Question

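I currently process a batch of 50-plus csv files every week, and their timestamps look like Tue Oct 01 10:59:59 PDT 2013. I need to go through each file line by line and change the format to 10/01/13 10:59:59. Some files have the timestamp as the first string, others have it as the third string. I'm not having much luck...

Here are snippets from two of the csv files.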

1.csv

Tue Oct 01 10:59:59 PDT 2013,data1,1,Databcd,Dataxyz,0,0,431,0

Tue Oct 01 11:59:59 PDT 2013,data1,1,Databcd,Dataxyz,0,0,401,0

2.csv

data1,0,Databcd,0,0,0,Tue Oct 01 11:59:59 PDT 2013,Dataxyz

data1,0,Databcd,0,0,0,Tue Oct 01 12:59:59 PDT 2013,Dataxyz

Thanks in advance -

Here is the script I last ran..

#!/bin/bash

for f in $*
do
echo "Processing [$f]..."

ftemp=$f.TMP
    #echo "ftemp=$ftemp"
#this uses sed to delete the day(word) frm the timestamp.
sed -e 's/Mon //g' <$f >$ftemp
mv $ftemp $f   #copy it back over the original
sed -e 's/Tue //g' <$f >$ftemp
mv $ftemp $f   #copy it back over the original 
sed -e 's/Wed //g' <$f >$ftemp
mv $ftemp $f   #copy it back over the original
sed -e 's/Thu //g' <$f >$ftemp
mv $ftemp $f   #copy it back over the original
sed -e 's/Fri //g' <$f >$ftemp
mv $ftemp $f   #copy it back over the original
sed -e 's/Sat //g' <$f >$ftemp
mv $ftemp $f   #copy it back over the original
sed -e 's/Sun //g' <$f >$ftemp
mv $ftemp $f   #copy it back over the original

#strip out the PDT & Year from end of each line 
sed -e 's/\ PDT / /g' -e 's/\ PST / /g' <$f >$ftemp
mv $ftemp $f   #copy it back over the original
sed --date="Oct 01 00:59:59 2013" +%D <$f >$ftemp
mv $ftemp $f   #copy it back over the original
#echo "10/01/2013" |    sed -E 's/([a-z ]?)\/([0-9][0-9 ]?)\/([0-9][0-9][0-9][0-9]
#/\3-\2-\1/' <$f >$ftemp
#   tr  'Oct' '10/' <$f >$ftemp
#   mv $ftemp $f   #copy it back over the original
done

echo "Done."

As you can see, I have left some of the options I tried commented out.

score 1 · Accepted Answer

Here is an attempt using sed:

sed -i.bak -r -e 's,[[:alpha:]]{3}\s+([[:alpha:]]{3})\s+([0-9]{2})\s+([0-9]{2}:[0-9]{2}:[0-9]{2})\s+[A-Z]{3}\s+[0-9]{2}([0-9]{2}),\1/\2/\4 \3,g' -e 's/Jan/01/; s/Feb/02/; s/Mar/03/; s/Apr/04/; s/May/05/; s/Jun/06/; s/Jul/07/; s/Aug/08/; s/Sep/09/; s/Oct/10/; s/Nov/11/; s/Dec/12/;' *.csv

It works for me on your sample input.
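To try the idea without modifying any files, the same kind of substitution can be run on stdin. The following is only a sketch, assuming GNU sed (for `-r` and `\s`); the function name `convert_timestamps` is made up for illustration. It differs from the one-liner in one respect: a month name is only replaced when it is immediately followed by `/dd/yy `, so a data field that happens to contain "Mar" or "Dec" is left alone.

```shell
#!/usr/bin/env bash
# Sketch only: assumes GNU sed (-r, \s). convert_timestamps is a hypothetical name.
convert_timestamps() {
    # Step 1: "Tue Oct 01 10:59:59 PDT 2013" -> "Oct/01/13 10:59:59"
    local script='s,[[:alpha:]]{3}\s+([[:alpha:]]{3})\s+([0-9]{2})\s+([0-9]{2}:[0-9]{2}:[0-9]{2})\s+[A-Z]{3}\s+[0-9]{2}([0-9]{2}),\1/\2/\4 \3,g'
    local months=(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec)
    local i
    for i in "${!months[@]}"; do
        # Step 2: replace a month name only when it is followed by /dd/yy
        script+=$(printf ';s,%s(/[0-9]{2}/[0-9]{2} ),%02d\\1,g' "${months[i]}" "$((i + 1))")
    done
    sed -r -e "$script"
}

# Reads from stdin, writes to stdout; the input file is never touched.
printf '%s\n' 'data1,0,Databcd,0,0,0,Tue Oct 01 11:59:59 PDT 2013,Dataxyz' | convert_timestamps
# prints: data1,0,Databcd,0,0,0,10/01/13 11:59:59,Dataxyz
```

Once the output looks right, something like `convert_timestamps < 1.csv > 1.csv.new` can be used per file, which keeps the original as a backup.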

score 0 · Accepted Answer

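You probably want awk.

This script looks at every field and tries to convert it to a date in the desired format. It then converts the result back to the original format to verify that it matches the original text. If the original matches the round-tripped value, the field is replaced, and the line is printed.

If TZ is not set to the time zone used in the CSV files, the time zone may fail to match and the field will be left unchanged.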

#!/bin/awk -f
BEGIN { FS = ","; OFS="," }
{
    # print
    for (i=1; i<=NF; i++)
    {
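        cmd = "date -d '" $i "' +'%D %T' 2> /dev/null"
        # print cmd
        ok = ( ( cmd | getline result ) > 0 )
        # close the pipe so a repeated field value re-runs the command
        # and open file descriptors do not pile up on large files
        close(cmd)
        if ( ok )
        {
             # print $i, result
             cmd = "date -d '" result "' +'%a %b %d %T %Z %Y'"
             ok = ( ( cmd | getline revert ) > 0 )
             close(cmd)
             if ( ok )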
             {
                 # print $i, result, revert
                 if ( $i == revert )
                 {
                     # print "Changing " $i " to " result
                     $i = result
                 }
             }
             # print $i
             # print ""
        }
    }
    print
}
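The round-trip check and its dependence on TZ can be tried on a single field from the shell. This is a minimal sketch, assuming GNU date (for `-d`) and a C locale; the helper name `roundtrip_ok` is hypothetical, and the POSIX TZ string is just one way of describing US Pacific time that does not rely on an installed time zone database.

```shell
#!/usr/bin/env bash
# Sketch only: assumes GNU date (-d). roundtrip_ok is a hypothetical helper name.
export LC_ALL=C                       # %a and %b must produce English names
export TZ='PST8PDT,M3.2.0,M11.1.0'    # POSIX TZ string for US Pacific time

roundtrip_ok() {
    local field=$1 converted reverted
    # Convert to the target format; fail if date cannot parse the field.
    converted=$(date -d "$field" +'%D %T' 2>/dev/null) || return 1
    # Convert back to the original layout.
    reverted=$(date -d "$converted" +'%a %b %d %T %Z %Y' 2>/dev/null) || return 1
    # Only accept the conversion if the round trip reproduces the field exactly.
    [ "$field" = "$reverted" ] && printf '%s\n' "$converted"
}

roundtrip_ok 'Tue Oct 01 10:59:59 PDT 2013'   # prints 10/01/13 10:59:59
roundtrip_ok '431' || echo 'not a timestamp'  # prints not a timestamp
```

A field stamped with a different zone, such as `Tue Oct 01 10:59:59 EDT 2013`, does not survive the round trip under this TZ setting, which is why TZ has to match the files.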

score 0 · Accepted Answer

This might work for you (GNU sed):

sed -ri '1{x;s/^/Jan01Feb02Mar03Apr04May05Jun06Jul07Aug08Sep09Oct10Nov11Dec12/;x};G;s/... (...) (..) (..:..:..) PDT ..(..)(.*)\n.*\1(..).*/\6\/\2\/\4 \3\5/;s/\n.*//' file
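The one-liner is dense, so here is the same idea spread out with comments: a month lookup table is kept in the hold space, appended to every line, and a back-reference picks out the two digits that follow the month name. This is a sketch under assumptions, not a replacement: it needs GNU sed, reads stdin rather than editing in place, the name `lookup_convert` is made up, and it accepts any three capital letters as the zone where the one-liner only matches PDT.

```shell
#!/usr/bin/env bash
# Sketch only: assumes GNU sed. lookup_convert is a hypothetical name.
lookup_convert() {
    sed -r '
        # Line 1 only: store the month lookup table in the hold space.
        1{x;s/^/Jan01Feb02Mar03Apr04May05Jun06Jul07Aug08Sep09Oct10Nov11Dec12/;x}
        # Append a newline plus the table to the current line.
        G
        # \1 is the month name, \6 the two digits that follow it in the table.
        s/... (...) (..) (..:..:..) [A-Z]{3} ..(..)(.*)\n.*\1(..).*/\6\/\2\/\4 \3\5/
        # Drop the appended table if the substitution above did not match.
        s/\n.*//
    '
}

printf '%s\n' 'Tue Oct 01 10:59:59 PDT 2013,data1,1,Databcd,Dataxyz,0,0,431,0' | lookup_convert
# prints: 10/01/13 10:59:59,data1,1,Databcd,Dataxyz,0,0,431,0
```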
