python - Parsing OFX datetime in Python

Question

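I'm trying to parse a datetime in Python as specified in the OFX 2.3 specification. I believe it's a custom format, but feel free to let me know if it has a name. The spec states the following:

There is one format for representing dates, times, and time zones. The complete form is: YYYYMMDDHHMMSS.XXX [gmt offset[:tz name]]

For example, "19961005132200.124[-5:EST]" represents October 5, 1996, at 1:22 p.m. and 124 milliseconds, in Eastern Standard Time. This is the same as 6:22 p.m. Greenwich Mean Time (GMT).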
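The arithmetic in the spec's example can be checked with a fixed-offset timezone from the standard library. This is only a sketch to confirm the conversion; the variable names are illustrative and not part of the spec.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC-5 offset, labelled "EST" as in the spec's example.
est = timezone(timedelta(hours=-5), "EST")

# 1996-10-05 13:22:00.124 at UTC-5
local = datetime(1996, 10, 5, 13, 22, 0, 124000, tzinfo=est)
as_utc = local.astimezone(timezone.utc)

print(local.isoformat())   # 1996-10-05T13:22:00.124000-05:00
print(as_utc.isoformat())  # 1996-10-05T18:22:00.124000+00:00
```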

Here is my attempt so far:

from datetime import datetime

date_str = "19961005132200.124[EST]"
date = datetime.strptime(date_str, "%Y%m%d%H%M%S.%f[%Z]")

This partial example works so far, but it lacks the GMT offset portion (the -5 in [-5:EST]). I'm not sure how to specify a time zone offset of at most two digits.

score 1 · Accepted Answer

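Some things to note first (as mentioned in the comments):

Python's built-in strptime will have a hard time here: %z won't parse a single-digit offset hour, and %Z won't parse some (potentially) ambiguous time zone abbreviations.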
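The %z limitation is easy to reproduce. The sketch below uses a small helper, try_offset, which is a name made up for this illustration; it returns None when strptime rejects the input.

```python
from datetime import datetime, timedelta

def try_offset(text):
    """Return the parsed UTC offset, or None if %z rejects the text."""
    try:
        return datetime.strptime(text, "%z").utcoffset()
    except ValueError:
        return None

print(try_offset("-0500"))  # the four-digit form is accepted
print(try_offset("-5"))     # a single-digit hour is rejected, so None
```

The %Z directive is not shown here because which abbreviations it accepts depends on the local time zone settings of the machine running the code.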

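Then, the OFX Banking Version 2.3 documentation (section 3.2.8.2, Date and Datetime) leaves me with some questions:

Is the UTC offset optional?
Why is EST called a time zone when it is just an abbreviation?
Why is the UTC offset -5 hours in the example, when on 1996-10-05 US/Eastern was at UTC-4?
What about offsets that include minutes, e.g. +5:30 for Asia/Calcutta?
(opinionated) Why reinvent the wheel instead of using a common standard such as ISO 8601?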
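The point about the UTC offset on 1996-10-05 can be checked with zoneinfo. This sketch assumes IANA time zone data is available on the system (or via the tzdata package) and uses the key "America/New_York", the canonical name for the zone that "US/Eastern" refers to.

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

# Assumption: IANA tz data is installed; "America/New_York" is US Eastern time.
eastern = ZoneInfo("America/New_York")

# Wall-clock time from the spec's example, interpreted in the real time zone.
dt = datetime(1996, 10, 5, 13, 22, tzinfo=eastern)

# Daylight saving time was still in effect on that date.
print(dt.tzname(), dt.utcoffset())
```

On that date the zone reports EDT with an offset of four hours behind UTC, not the EST offset of five hours given in the spec's example.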

Anyway, here's an attempt at a custom parser:

from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

def parseOFXdatetime(s, tzinfos=None, _tz=None):
    """
    Parse an OFX datetime string to an aware Python datetime object.
    Strings without a UTC offset are treated as UTC.
    """
    # first, treat formats that have no UTC offset specified.
    if '[' not in s:
        # pad date-only or minute-resolution strings with zeros to the full format
        s = s.ljust(14, '0') + '.000' if '.' not in s else s
        return datetime.strptime(s, "%Y%m%d%H%M%S.%f").replace(tzinfo=timezone.utc)

    # an offset is specified, so first get the date/time, offset and tzname components
    s, off = s.strip(']').split('[')
    off, _, name = off.partition(':')  # the ":tz name" part is optional
    s = s.ljust(14, '0') + '.000' if '.' not in s else s
    # if tzinfos are specified, map the tz name:
    if tzinfos:
        _tz = tzinfos.get(name) # this might still leave _tz as None...
    if not _tz: # ...so we derive a tz from a timedelta
        delta = timedelta(hours=int(off))
        # timezone() rejects name=None, so only pass a name if there is one
        _tz = timezone(delta, name) if name else timezone(delta)
    return datetime.strptime(s, "%Y%m%d%H%M%S.%f").replace(tzinfo=_tz)


# some test strings

t = ["19961005132200.124[-5:EST]", "19961005132200.124", "199610051322", "19961005",
     "199610051322[-5:EST]", "19961005[-5:EST]"]

for s in t:
    print(# normal parsing
          f'{s}\n {repr(parseOFXdatetime(s))}\n'
          # parsing with tzinfo mapping supplied; abbreviation -> timezone object
          f' {repr(parseOFXdatetime(s, tzinfos={"EST": ZoneInfo("US/Eastern")}))}\n\n')
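The parser above only handles whole-hour offsets because it converts the offset with int(off). If offsets with a fractional part had to be supported, and assuming they are written as decimal hours such as "+5.5" (an assumption to verify against the OFX specification), the offset conversion could be sketched as follows. offset_to_tz is a hypothetical helper, not part of the code above.

```python
from datetime import timedelta, timezone

def offset_to_tz(off, name=None):
    """Hypothetical helper: build a fixed-offset tz from a decimal-hour string."""
    # Assumption: fractional offsets are decimal hours, e.g. "+5.5" for +05:30.
    delta = timedelta(hours=float(off))
    # timezone() does not accept name=None, so pass the name only if present.
    return timezone(delta, name) if name else timezone(delta)

print(offset_to_tz("-5", "EST").utcoffset(None))
print(offset_to_tz("+5.5", "IST").utcoffset(None))
```

An offset written with a colon, like the +5:30 asked about earlier, would collide with the ":tz name" separator, so it would need a different splitting strategy than this one.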
