I am trying to parse multiple dates from a string in Python with the help of this code:

from dateutil.parser import _timelex, parser
a = "Approve my leave from first half of 12/10/2012 to second half of 20/10/2012 "
p = parser()
info = p.info
def timetoken(token):
  try:
    float(token)
    return True
  except ValueError:
    pass
  return any(f(token) for f in (info.jump,info.weekday,info.month,info.hms,info.ampm,info.pertain,info.utczone,info.tzoffset))

def timesplit(input_string):
  batch = []
  for token in _timelex(input_string):
    if timetoken(token):
      if info.jump(token):
        continue
      batch.append(token)
    else:
      if batch:
        yield " ".join(batch)
        batch = []
  if batch:
    yield " ".join(batch)

for item in timesplit(a):
  print "Found:", item
  print "Parsed:", p.parse(item)

The code picks up the word "second" from the string as a second date and gives me this error:

raise ValueError, "unknown string format"

ValueError: unknown string format

When I change "second half" to something else, such as "first half", everything works fine.

Can anyone help me parse this string?

2 Answers

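Your parser can't handle the "second" token that timesplit finds. If you set the fuzzy parameter to True it doesn't break, but it doesn't produce anything meaningful either.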

from cStringIO import StringIO
for item in timesplit(StringIO(a)):
    print "Found:", item
    print "Parsed:", p.parse(StringIO(item),fuzzy=True)

Out:

Found: 12 10 2012
Parsed: 2012-12-10 00:00:00
Found: second
Parsed: 2013-01-11 00:00:00
Found: 20 10 2012
Parsed: 2012-10-20 00:00:00

You have to either fix timesplit or handle the errors:

Option 1:

Drop info.hms from timetoken, so that words like "second" are no longer treated as time tokens.
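A minimal sketch of what Option 1 could look like, written in Python 3 syntax rather than the Python 2 of the question. The only intended change is that info.hms is left out of the checks; the name CHECKS is introduced here for illustration and is not part of the original code. Note that _timelex is a private dateutil class, so this assumes it is still importable in your version (newer releases may warn about it).

```python
from dateutil.parser import _timelex, parser  # _timelex is private API

p = parser()
info = p.info

# Same lookups as the question's timetoken, with info.hms removed.
# CHECKS is a hypothetical name used only for this sketch.
CHECKS = (info.jump, info.weekday, info.month, info.ampm,
          info.pertain, info.utczone, info.tzoffset)

def timetoken(token):
    # Plain numbers always count as date/time tokens.
    try:
        float(token)
        return True
    except ValueError:
        pass
    return any(f(token) for f in CHECKS)

def timesplit(input_string):
    # Group consecutive date/time tokens into candidate date strings.
    batch = []
    for token in _timelex(input_string):
        if timetoken(token):
            if info.jump(token):
                continue  # separators such as " ", "/" or "of"
            batch.append(token)
        elif batch:
            yield " ".join(batch)
            batch = []
    if batch:
        yield " ".join(batch)

a = "Approve my leave from first half of 12/10/2012 to second half of 20/10/2012 "
found = list(timesplit(a))
for item in found:
    print("Found:", item)
    print("Parsed:", p.parse(item))
```

The reason this helps is that info.hms("second") returns a match (it is the unit name for seconds), so the original timetoken accepts it. The trade-off is that unit words such as "hour" or "minute" are no longer recognised either, which only matters if your input contains times written that way.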

Option 2:

from cStringIO import StringIO
for item in timesplit(StringIO(a)):
    print "Found:", item
    try:
        print "Parsed:", p.parse(StringIO(item))
    except ValueError:
        print 'Not Parsed!'

Out:

Found: 12 10 2012
Parsed: 2012-12-10 00:00:00
Found: second
Not Parsed!
Parsed: Found: 20 10 2012
Parsed: 2012-10-20 00:00:00
answered 2013-01-11T13:36:29.857

If you only need the dates, you can extract them with a regular expression and work with the dates from there.

a = "Approve my leave from first half of 12/10/2012 to second half of 20/10/2012 "

import re
pattern = re.compile(r'\d{2}/\d{2}/\d{4}')
pattern.findall(a)
['12/10/2012', '20/10/2012']
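To go from the matched strings to actual date objects, something like the following should work. It is a sketch in Python 3 syntax using only the standard library, and it assumes the dates are written day first (dd/mm/yyyy), which the question does not state explicitly but which 20/10/2012 suggests.

```python
import re
from datetime import datetime

a = "Approve my leave from first half of 12/10/2012 to second half of 20/10/2012 "

# Raw string so the backslashes reach the regex engine unchanged.
pattern = re.compile(r"\d{2}/\d{2}/\d{4}")

# Assumption: day/month/year order. Use "%m/%d/%Y" for month-first input.
dates = [datetime.strptime(m, "%d/%m/%Y").date() for m in pattern.findall(a)]
print(dates)
```

strptime raises ValueError for a match that is not a real date (for example 31/02/2012), so wrap the call in try/except if the input is not trusted.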
answered 2013-01-11T13:43:31.527