python - Parsing a text file with unusual delimiters using Python

Question

While supporting a legacy system, I am faced with a field data collector that stores its data in the following format:

# This is a comment <- because it starts at the beginning of the file
# This is a comment <- see above
# 1. Item one <- not a comment because it starts with 1.
# Description of Item 1 <- not a comment as it is after a line that starts with a number
data point 1
data point 2
data point etc
3 <-- represents number of data points under Item one

# 2. Item two <-- not a comment
# Description of item 2 <-- not a comment
data point 1
data point ..
data point 100
100
#3. Item three <--- not a comment
# Item three description
0

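I am not sure what the right way is to parse this file so that each item ends up in its own list. Note that sometimes, but not always, the data has random whitespace added between two different items.

What is the right way to parse a file like this?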

score 1 · Accepted Answer

I would do this in three steps:

1. Remove all comments from the start of the file
2. Split on a regular expression to find all the other comments in the file (see here for an example of how to split using a regular expression)
3. Parse the remaining lines
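
The three steps could be sketched roughly as follows. This is only an illustration under some assumptions: the "<-" annotations in the question are not part of the real file, every item header looks like "# 1. Name" or "#3. Name", and the last non-comment line of each item is its count. The names HEADER, SAMPLE and parse_items are made up for this example.

```python
import re

# Assumed header form: "# 1. Item one" or "#3. Item three".
# The two groups capture the item number and the item name.
HEADER = re.compile(r"^# ?(\d+)\.[ \t]*(.*)$", re.MULTILINE)

# Hypothetical sample, modelled on the format shown in the question.
SAMPLE = """# This is a comment
# This is a comment
# 1. Item one
# Description of Item 1
data point 1
data point 2
data point etc
3

# 2. Item two
# Description of item 2
data point 1
data point 2
2
#3. Item three
# Item three description
0
"""

def parse_items(text):
    # Step 1: drop the real comments, i.e. everything before the first header.
    first = HEADER.search(text)
    if first is None:
        return []
    body = text[first.start():]

    # Step 2: split on the header pattern. Because the pattern has two
    # capturing groups, the result is ['', number, name, chunk, number, ...].
    parts = HEADER.split(body)

    # Step 3: parse the lines that follow each header.
    items = []
    for i in range(1, len(parts), 3):
        number, name, chunk = parts[i], parts[i + 1], parts[i + 2]
        lines = [ln.strip() for ln in chunk.splitlines() if ln.strip()]
        description = [ln.lstrip("# ") for ln in lines if ln.startswith("#")]
        values = [ln for ln in lines if not ln.startswith("#")]
        # Assumption: the trailing numeric line is the data point count.
        count = int(values.pop()) if values and values[-1].isdigit() else None
        items.append({
            "number": int(number),
            "name": name.strip(),
            "description": " ".join(description),
            "data": values,
            "count": count,
        })
    return items
```

Calling parse_items(SAMPLE) should give one dict per item, with blank lines between items ignored. Comparing len(item["data"]) with item["count"] is a cheap sanity check on each item.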

score 1 · Accepted Answer

You could use a regular expression, with multiline mode enabled so that ^ matches at the start of every line, and split on: ^(?=\# ?\d+\.)

Explained example here: http://regex101.com/r/gB3xD1
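
In Python, that split might look like the sketch below. The lookahead matches an empty string just before each header, so the header line stays inside its chunk. Splitting on an empty match needs Python 3.7 or newer. The names ITEM_START and split_items are made up for this example.

```python
import re

# Zero-width lookahead: matches the position right before "# 1." or "#3."
# at the start of a line, without consuming any text.
ITEM_START = re.compile(r"^(?=\# ?\d+\.)", re.MULTILINE)

def split_items(text):
    chunks = ITEM_START.split(text)
    # chunks[0] is whatever precedes the first header (the leading comments,
    # or an empty string), so it is skipped here.
    return [
        [ln.strip() for ln in chunk.splitlines() if ln.strip()]
        for chunk in chunks[1:]
    ]
```

Each inner list then starts with the item's header line, followed by its description, data points and count, with blank lines dropped.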
