
I'm trying to change all the date values in an XML file and then add or subtract a user-specified amount of time from the timestamps.

The timestamps all have the format 2016-06-29T17:03:39.000Z, but they are not all contained in the same tags.
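For reference, the standard datetime module can parse and re-emit this timestamp format. The sketch below is minimal and written for Python 3. The helper names parse_ts and format_ts are hypothetical. It treats the trailing Z as a literal character, so the result is a naive datetime that is only implicitly UTC:

```python
from datetime import datetime

# 'Z' is matched literally; %f accepts the 3 millisecond digits (as microseconds)
TS_FORMAT = '%Y-%m-%dT%H:%M:%S.%fZ'


def parse_ts(text):
    """Parse e.g. '2016-06-29T17:03:39.000Z' into a naive datetime (UTC implied)."""
    return datetime.strptime(text, TS_FORMAT)


def format_ts(dt):
    """Format a datetime back into the same shape, keeping 3 millisecond digits."""
    # %f always yields 6 digits (microseconds); drop the last 3 to get milliseconds
    return dt.strftime('%Y-%m-%dT%H:%M:%S.%f')[:-3] + 'Z'


if __name__ == '__main__':
    dt = parse_ts('2016-06-29T17:03:39.000Z')
    print(dt)             # 2016-06-29 17:03:39
    print(format_ts(dt))  # 2016-06-29T17:03:39.000Z
```

Once a timestamp is a datetime, adding or subtracting a datetime.timedelta handles rollover across minutes, hours, days and months for you.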

My XML looks like this:

<Id>2016-06-29T17:03:37.000Z</Id>
<Lap StartTime="2016-06-29T17:03:37.000Z">
<TotalTimeSeconds>6906</TotalTimeSeconds>
<DistanceMeters>60870.5</DistanceMeters>
<Intensity>Active</Intensity>
<TriggerMethod>Manual</TriggerMethod>
<Track>
<Trackpoint>
<Time>2016-06-29T17:03:37.000Z</Time>

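I want to go through the XML file line by line and search for date/time strings. I want to first find and replace the date, and then add/subtract some time from the timestamp.

This is my code so far: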

import re
import xml.etree.ElementTree as et

name_file = 'test.txt'
with open(name_file, "r") as fh:
    filedata = fh.read()

# Note: split() breaks the text on any whitespace, not on line ends
filedata = filedata.split()
for line in filedata:
    cur_date = re.findall(r'\d{4}[-/]\d{2}[-/]\d{2}', line)
    print(cur_date)

Does anyone know how to do this?


4 Answers


Use this regular expression to find all the full timestamps:

\d{4}[-/]\d{2}[-/]\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z

filedata = filedata.split()
for line in filedata:
    cur_date = re.findall(r'\d{4}[-/]\d{2}[-/]\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z', line)
    print(cur_date)
    for match in cur_date:
        # str.replace() returns a new string; assign it back to keep the change
        line = line.replace(match, updateDate(match))

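You just need to create an updateDate() function that performs the update you want. Inside this function you can use the same regular expression, but this time with capturing groups, i.e. parentheses ( ) around each part you want to extract.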

I think it is easier to split the work into these two parts.
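Here is a minimal Python 3 sketch of such an updateDate() function. It puts a capturing group around each field, as suggested above. The one-hour default offset is only an assumption for illustration. Pass whatever timedelta the user asks for; a negative one subtracts time:

```python
import re
from datetime import datetime, timedelta

# The timestamp pattern with a capturing group around each numeric field
TIMESTAMP = re.compile(r'(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2})\.(\d{3})Z')


def updateDate(stamp, offset=timedelta(hours=1)):
    """Return stamp shifted by offset, in the same YYYY-MM-DDTHH:MM:SS.mmmZ format."""
    m = TIMESTAMP.fullmatch(stamp)
    if m is None:
        return stamp  # not a timestamp: leave it unchanged
    year, month, day, hour, minute, second, millis = map(int, m.groups())
    try:
        dt = datetime(year, month, day, hour, minute, second, millis * 1000)
    except ValueError:
        return stamp  # e.g. month 13: leave it unchanged
    dt += offset
    # Rebuild the string; microsecond // 1000 turns microseconds back into milliseconds
    return dt.strftime('%Y-%m-%dT%H:%M:%S.') + '%03dZ' % (dt.microsecond // 1000)


line = '<Lap StartTime="2016-06-29T17:03:37.000Z">'
# With groups in the pattern, findall() would return tuples, so use sub() and m.group(0)
print(TIMESTAMP.sub(lambda m: updateDate(m.group(0)), line))
# prints <Lap StartTime="2016-06-29T18:03:37.000Z">
```

Once the pattern contains groups, re.findall() returns tuples of the captured groups instead of the whole match. For that reason, the usage line calls re.sub() with a callback. The callback's m.group(0) is the full timestamp.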

answered 2016-06-30T14:15:03.243

I finally solved the problem with the code below (it may not be 100% optimal, but it works):

import re
import datetime

name_file = 'test.gpx'  # raw_input("File name including .txt at the end: ")
nieuwe_datum = '2016-06-30'  # raw_input("New date, format YYYY-MM-DD: ")
new_start_time = '14:45:00'  # raw_input("Start time, format hh:mm:ss: ")
new_start_time = datetime.datetime.strptime(new_start_time, "%H:%M:%S")

with open(name_file, "r") as fh:
    filedata = fh.read()

time_list = list()

# First pass: replace every date with the new date and collect all times.
# The rewritten tokens go into a new list so the date change is kept for the output.
new_filedata = list()
for line in filedata.split():
    cur_date = re.findall(r'\d{4}[-/]\d{2}[-/]\d{2}', line)
    for match1 in cur_date:
        line = line.replace(match1, nieuwe_datum)
    new_filedata.append(line)
    cur_time = re.findall(r'\d{2}:\d{2}:\d{2}\.\d{3}', line)
    for match in cur_time:
        time_list.append(match)
filedata = new_filedata

# Zero-padded HH:MM:SS.mmm strings sort chronologically within one day
cur_start_time = min(time_list)
print('current start time: ')
print(cur_start_time)
print('new start time: ')
print(new_start_time)
cur_start_time = datetime.datetime.strptime(cur_start_time, "%H:%M:%S.%f")
if cur_start_time > new_start_time:
    time_dif = cur_start_time - new_start_time
    shift = -time_dif  # move every time earlier
else:
    time_dif = new_start_time - cur_start_time
    shift = time_dif  # move every time later
print('time difference is: ')
print(time_dif)

# Second pass: shift every time and write one token per line
# (the milliseconds are dropped, as in the original version)
with open('output.gpx', 'w') as outfile:
    for line in filedata:
        cur_time = re.findall(r'\d{2}:\d{2}:\d{2}\.\d{3}', line)
        for match2 in cur_time:
            new_time = datetime.datetime.strptime(match2, "%H:%M:%S.%f") + shift
            line = line.replace(match2, new_time.strftime("%H:%M:%S"))
        outfile.write(line + "\n")

print('New start date is: ')
print(nieuwe_datum)
answered 2016-07-05T07:12:49.140

Assuming that in this case we can ignore the fact that the timestamps are embedded in XML, you can adjust them with re.sub():

#!/usr/bin/env python2
import datetime as DT
import fileinput
import re

timestamp_regex = r'(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2})\.(\d{3})Z'

def add_two_days(m):
    numbers = map(int, m.groups())
    numbers[-1] *= 1000  # milliseconds -> microseconds
    try:
        utc_time = DT.datetime(*numbers)
    except ValueError:
        return m.group(0) # leave an invalid timestamp as is
    else:
        utc_time += DT.timedelta(days=2) # add 2 days
        return utc_time.strftime('%Y-%m-%dT%H:%M:%S.%f')[:-3] + 'Z'

replace_time = re.compile(timestamp_regex).sub
for line in fileinput.input('test.xml', backup='.bak', inplace=1, bufsize=-1):
    print replace_time(add_two_days, line),

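To make the timestamps easier to work with, they are converted to datetime objects. You can then adjust the time with timedelta(), as done here with timedelta(days=2).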
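The code above targets Python 2, where map() returns a list that can be indexed. Here is a minimal Python 3 sketch of the same re.sub() callback, generalized so that the offset is a parameter instead of a fixed two days. make_shifter is a hypothetical helper name. A negative timedelta subtracts time:

```python
import datetime as DT
import re

timestamp_regex = r'(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2})\.(\d{3})Z'
replace_time = re.compile(timestamp_regex).sub


def make_shifter(delta):
    """Return a re.sub() callback that shifts each matched timestamp by delta."""
    def shift(m):
        numbers = [int(g) for g in m.groups()]  # a real list in Python 3
        numbers[-1] *= 1000  # milliseconds -> microseconds
        try:
            utc_time = DT.datetime(*numbers)
        except ValueError:
            return m.group(0)  # leave an invalid timestamp as is
        utc_time += delta
        return utc_time.strftime('%Y-%m-%dT%H:%M:%S.%f')[:-3] + 'Z'
    return shift


line = '<Time>2016-06-29T17:03:37.000Z</Time>'
print(replace_time(make_shifter(DT.timedelta(days=2)), line))     # add 2 days
print(replace_time(make_shifter(-DT.timedelta(hours=18)), line))  # subtract 18 hours
```

Here timedelta does the calendar arithmetic. Adding two days to 29 June gives 1 July, and subtracting 18 hours from 17:03 moves the timestamp back to the previous day.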

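fileinput.input(inplace=1) changes the input file in place (in this case, print writes to the file). A backup copy is kept in a file with the same name plus an additional .bak extension. See How to search and replace text in a file using Python?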
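In Python 3, print is a function, so the Python 2 trailing comma (print x,) becomes print(x, end=''). The sketch below shows the in-place loop in Python 3 under a few assumptions. The file path and the offset are passed in as arguments. shift_file is a hypothetical name. The callback is a compact strptime-based variant of add_two_days, not the answer's exact code:

```python
import datetime as DT
import fileinput
import re

TIMESTAMP = re.compile(r'\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z')


def shift_file(path, delta):
    """Shift every timestamp in path by delta, editing it in place (keeps path + '.bak')."""
    def shift(m):
        try:
            t = DT.datetime.strptime(m.group(0), '%Y-%m-%dT%H:%M:%S.%fZ')
        except ValueError:
            return m.group(0)  # leave an invalid timestamp as is
        return (t + delta).strftime('%Y-%m-%dT%H:%M:%S.%f')[:-3] + 'Z'

    # While the loop runs, standard output is redirected into the file being rewritten
    with fileinput.input(path, inplace=True, backup='.bak') as f:
        for line in f:
            # Each line still ends with its newline, so print must not add another
            print(TIMESTAMP.sub(shift, line), end='')
```

For example, shift_file('test.xml', DT.timedelta(days=2)) would rewrite test.xml and leave the untouched original in test.xml.bak.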

answered 2016-07-01T11:50:41.620

You could use this:

(?P<YEAR>[\d]{4})-(?P<MONTH>([0][1-9])|([1][0-2]))-(?P<DAY>([0][1-9])|([12][0-9])|([3][01]))T(?P<HOUR>([01][0-9])|([2][0-3])):(?P<MINUTES>([0-5][0-9])):(?P<SECONDS>([0-5][0-9]))\.(?P<MILLIS>[0-9]{3})Z

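You can then access the named groups on a match object (for example, one returned by re.search() or re.finditer(); re.findall() does not return match objects) like this:

cur_date.group('YEAR')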

PS: You can see a live demo here: https://regex101.com/r/mA1rY4/1
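The following minimal Python 3 sketch loops over all timestamps with re.finditer() and reads the named groups from each match object. The pattern is the one above with its redundant inner parentheses removed, which leaves what it matches unchanged. The sample text comes from the question's XML:

```python
import re

PATTERN = re.compile(
    r'(?P<YEAR>\d{4})-(?P<MONTH>0[1-9]|1[0-2])-(?P<DAY>0[1-9]|[12][0-9]|3[01])'
    r'T(?P<HOUR>[01][0-9]|2[0-3]):(?P<MINUTES>[0-5][0-9]):(?P<SECONDS>[0-5][0-9])'
    r'\.(?P<MILLIS>[0-9]{3})Z'
)

text = ('<Lap StartTime="2016-06-29T17:03:37.000Z">\n'
        '<Time>2016-06-29T17:03:39.000Z</Time>')

# finditer() yields one match object per timestamp, in element text and attributes alike
for cur_date in PATTERN.finditer(text):
    print(cur_date.group('YEAR'), cur_date.group('HOUR'), cur_date.group('SECONDS'))
    print(cur_date.groupdict())  # all named groups at once, as a dict of strings
```

The group values are strings, so convert them with int() before doing any date arithmetic.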

answered 2016-06-30T14:12:08.833