I'm scratching my head over how to fully parse this line. The "(4801)" part is giving me trouble; every other element is captured.
# MAIN_PROG ( 4801) Generated at 2010-01-25 06:55:00
Here is what I have so far:
from pyparsing import nums, Word, Optional, Suppress, OneOrMore, Group, Combine, ParseException, Literal, White
unparsed_log_data = "# MAIN_PROG ( 4801) Generated at 2010-01-25 06:55:00.007 Type: Periodic"
# program name that starts every report header
binary_name = "# MAIN_PROG"
# intended to match the "( 4801)" process id
pid = Literal("(" + nums + ")")
report_id = Combine(Suppress(binary_name) + pid)
# date as YYYY-MM-DD
year = Word(nums, max=4)
month = Word(nums, max=2)
day = Word(nums, max=2)
yearly_day = Combine(year + "-" + month + "-" + day)
# time as HH:MM:SS; the "." before the milliseconds is suppressed
clock24h = Combine(Word(nums, max=2) + ":" + Word(nums, max=2) + ":" + Word(nums, max=2) + Suppress("."))
timestamp = Combine(yearly_day + White(' ') + clock24h).setResultsName("timestamp")
time_bnf = report_id + Suppress("Generated at") + timestamp
time_bnf.searchString(unparsed_log_data)
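A minimal sketch of one way to match the whole header, assuming pyparsing is installed (only camelCase names that both 2.x and 3.x provide are used). Two details in the code above look like the cause: nums is just the string "0123456789", so Literal("(" + nums + ")") looks for the literal text "(0123456789)"; and Combine turns off whitespace skipping between its parts, so it cannot span "# MAIN_PROG ( 4801)". Below, the parentheses and the digits are separate tokens, and Combine is used only for the date and time. The "pid" results name and the two_digits and fraction helpers are illustrative additions, not part of the original code.

```python
from pyparsing import Combine, Literal, Optional, Suppress, Word, nums

# Sample header line from the question
unparsed_log_data = "# MAIN_PROG ( 4801) Generated at 2010-01-25 06:55:00.007 Type: Periodic"

# Program marker, matched literally and dropped from the results
binary_name = Suppress(Literal("# MAIN_PROG"))

# "(", the digits and ")" are separate tokens, so default whitespace
# skipping lets this match "( 4801)" as well as "(4801)"
pid = Suppress("(") + Word(nums).setResultsName("pid") + Suppress(")")
report_id = binary_name + pid

# Combine requires its pieces to be adjacent, which suits dates and times
year = Word(nums, exact=4)
month = Word(nums, exact=2)
day = Word(nums, exact=2)
yearly_day = Combine(year + "-" + month + "-" + day)
two_digits = Word(nums, exact=2)
clock24h = Combine(two_digits + ":" + two_digits + ":" + two_digits)
# Optional fractional seconds (".007") are consumed but not kept
fraction = Optional(Suppress("." + Word(nums)))
timestamp = Combine(yearly_day + " " + clock24h + fraction).setResultsName("timestamp")

time_bnf = report_id + Suppress("Generated at") + timestamp

result = time_bnf.parseString(unparsed_log_data)
print(result["pid"], result["timestamp"])
```

Keeping the whitespace-tolerant parts (marker, parentheses, pid) outside Combine is the design choice here; Combine is reserved for text that really has no spaces in it, apart from the single space that is matched explicitly between date and time.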
Edit: Paul, if you have the patience, how would I filter this:
unparsed_log_data = """
# MAIN_PROG ( 4801) Generated at 2010-01-25 06:55:00
bla bla bla
multi line garbage
bla bla
Efficiency | 38 38 100 | 3497061 3497081 99 |
more garbage
"""
from pyparsing import SkipTo
integer = Word(nums)
report_bnf = report_id + Suppress("Generated at") + timestamp
partial_report_ignore = Suppress(SkipTo("Efficiency"))
efficiency_bnf = Suppress("|") + integer.setResultsName("Efficiency") + Suppress(integer) + integer.setResultsName("EfficiencyPercent")
efficiency_bnf.searchString(unparsed_log_data) and report_bnf.searchString(unparsed_log_data) both return the expected data, but if I try
report_and_effic = report_bnf + partial_report_ignore + efficiency_bnf
report_and_effic.searchString(unparsed_log_data) returns ([], {})
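The empty result is consistent with how SkipTo behaves by default: it stops in front of its target, so after partial_report_ignore the parser is still looking at the word "Efficiency", while efficiency_bnf expects "|". A minimal sketch on a made-up one-line input (the text string and the skip_only and skip_incl names are hypothetical) shows what include=True changes, assuming pyparsing is installed:

```python
from pyparsing import SkipTo, Suppress, Word, nums

# Hypothetical input: some garbage, then the Efficiency row
text = "garbage Efficiency | 38 38 100 |"
integer = Word(nums)
efficiency_bnf = Suppress("|") + integer

# Default: SkipTo stops just before "Efficiency", so "|" is not the next token
skip_only = Suppress(SkipTo("Efficiency")) + efficiency_bnf
# include=True: "Efficiency" itself is consumed too, so "|" comes next
skip_incl = Suppress(SkipTo("Efficiency", include=True)) + efficiency_bnf

print(skip_only.searchString(text))
print(skip_incl.searchString(text))
```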
EDIT2:
The code should read:
partial_report_ignore = Suppress(SkipTo("Efficiency", include=True ))
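With that change, the combined grammar should match the multi-line sample. A self-contained sketch, assuming pyparsing is installed; the header grammar is restated compactly, and the "pid" results name and the two_digits helper are illustrative additions rather than part of the original code:

```python
from pyparsing import Combine, Literal, SkipTo, Suppress, Word, nums

unparsed_log_data = """
# MAIN_PROG ( 4801) Generated at 2010-01-25 06:55:00
bla bla bla
multi line garbage
bla bla
Efficiency | 38 38 100 | 3497061 3497081 99 |
more garbage
"""

integer = Word(nums)

# Header: program marker, pid in parentheses, "Generated at", timestamp
pid = Suppress("(") + integer.setResultsName("pid") + Suppress(")")
report_id = Suppress(Literal("# MAIN_PROG")) + pid
two_digits = Word(nums, exact=2)
timestamp = Combine(
    Word(nums, exact=4) + "-" + two_digits + "-" + two_digits + " "
    + two_digits + ":" + two_digits + ":" + two_digits
).setResultsName("timestamp")
report_bnf = report_id + Suppress("Generated at") + timestamp

# Skip the garbage lines and the word "Efficiency" itself
partial_report_ignore = Suppress(SkipTo("Efficiency", include=True))
# "| 38 38 100": keep the first and third numbers, drop the middle one
efficiency_bnf = (
    Suppress("|")
    + integer.setResultsName("Efficiency")
    + Suppress(integer)
    + integer.setResultsName("EfficiencyPercent")
)

report_and_effic = report_bnf + partial_report_ignore + efficiency_bnf
matches = report_and_effic.searchString(unparsed_log_data)
for m in matches:
    print(m["pid"], m["timestamp"], m["Efficiency"], m["EfficiencyPercent"])
```

Each item returned by searchString is a ParseResults, so the named fields can be read by key as in the loop above.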