python - Designing a reusable parser in Python

Question

I'm writing a file parser, and I'd like to be able to specify which "data fields" it returns to me.

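I've just started learning Python and still tend to think like a Java programmer, so this question is more about how to design my module than about how to parse the file itself.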

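For context: every line of the file has a fixed number of characters, and each piece of information sits between specific indices. For example: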

XX20120101NAME1CITYA
XY20120101NAME2CITYB

In this made-up example, indices 0 to 2 hold one piece of information, 2 to 10 hold another, and so on.
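As a minimal illustration of that layout, plain Python slicing pulls each piece out of a line. The variable names are hypothetical, since the real fields are not named here:

```python
# Sample line from the example above; the variable names are illustrative only.
line = "XX20120101NAME1CITYA"

code = line[0:2]     # indices 0 up to (not including) 2
date = line[2:10]    # indices 2 up to 10
name = line[10:15]   # indices 10 up to 15
city = line[15:20]   # indices 15 up to 20

print(code, date, name, city)
```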

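In Java, I would usually create an enum representing the different pieces of information, each one "storing" its start and end index. In my parser class, I would then design a method that accepts n different enum values. For example: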

enum FileInformation {
    INFO01(0,2), INFO02(2,10), INFO03(10,15), INFO04(15,20);

    private final int startIndex;
    private final int endIndex;

    // Enum constructors cannot be public in Java.
    private FileInformation(int si, int ei)  {
        this.startIndex = si;
        this.endIndex = ei;
    }

    public int getStartIndex() { return startIndex; }
    public int getEndIndex() { return endIndex; }
}

public Whatever parse(FileInformation... infos) {
    // Here I would iterate through infos[], 
    // using its start and end index to retrieve only what I need.
}

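I know I probably shouldn't follow the same approach in Python, especially since the language doesn't allow it (there are no enums in Python) and because I think Python can be less verbose, but I don't know a good design practice for achieving the same result.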

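It's worth mentioning that I don't want to expose the module's user to unnecessary complexity, or force them to know the index of each piece of information. Ideally the user would simply specify which pieces of information they want, and in what order.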

So, do you have any insight into solving this requirement in an elegant way? Thanks in advance.

score 2 · Accepted Answer

Python already has a built-in type that does what your FileInformation does: check out slice.
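As a quick sketch of why slice fits here: a slice object carries a start and stop index, much like the enum constants in the question, and can be passed directly as a string index. The name info02 is only illustrative:

```python
# A slice object stores a start/stop pair and can be reused as an index.
info02 = slice(2, 10)

line = "XX20120101NAME1CITYA"

print(info02.start, info02.stop)   # the stored indices
print(line[info02])                # equivalent to line[2:10]
```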

Here is how your module might look:

# module dataparser.py

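# Build one slice per field. Each (start, end) pair must be unpacked,
# because map(slice, pairs) would pass the whole tuple as a single argument.
INFO01, INFO02, INFO03, INFO04 = (
    slice(start, end) for start, end in ((0, 2), (2, 10), (10, 15), (15, 20))
)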

def parse(infos, data):
    return [data[info] for info in infos]

And here is how a calling module would use it:

# module dataparser_user.py

import dataparser as dp

data = """\
XX20120101NAME1CITYA
XY20120101NAME2CITYB""".splitlines()

for d in data:
    print(d, dp.parse((dp.INFO01, dp.INFO03), d))

# or use partial to define a function object that takes your 
# subset number of slices
from functools import partial
specific_parse = partial(dp.parse, (dp.INFO01, dp.INFO03))

for d in data:
    print(d, specific_parse(d))
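Since the goal is a file parser, the same idea could be applied to every line of a file. The following is only a sketch: parse_lines is a hypothetical helper that is not part of the answer, the slice constants and parse are repeated so the block runs on its own, and io.StringIO stands in for a real file object.

```python
import io
from functools import partial

# Stand-ins for the dataparser module, repeated so this sketch is self-contained.
INFO01, INFO02, INFO03, INFO04 = (
    slice(start, end) for start, end in ((0, 2), (2, 10), (10, 15), (15, 20))
)

def parse(infos, data):
    return [data[info] for info in infos]

def parse_lines(lines, infos):
    """Hypothetical helper: parse every non-empty line with one field selection."""
    parse_line = partial(parse, infos)
    return [parse_line(line.rstrip("\n")) for line in lines if line.strip()]

# io.StringIO stands in for a file opened with open(...).
fake_file = io.StringIO("XX20120101NAME1CITYA\nXY20120101NAME2CITYB\n")

# The caller picks the fields and their order without knowing any indices.
rows = parse_lines(fake_file, (INFO03, INFO01))
print(rows)
```

The caller only names the fields it wants, in the order it wants them, which matches the requirement in the question.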

If you were to implement your own enum analogue in Python, I think namedtuple would be the closest (since your Java enum has getters but no setters, and namedtuples are likewise immutable):

from collections import namedtuple
FileInformation = namedtuple("FileInformation", "start end")
# Unpack each (start, end) pair into the two namedtuple fields.
INFO01, INFO02, INFO03, INFO04 = (
    FileInformation(start, end)
    for start, end in ((0, 2), (2, 10), (10, 15), (15, 20))
)
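With the namedtuple variant, parse would need to slice explicitly, because a namedtuple cannot be used as a string index the way a slice can. A minimal sketch, assuming the same parse(infos, data) signature as the slice version:

```python
from collections import namedtuple

FileInformation = namedtuple("FileInformation", "start end")

INFO01, INFO02, INFO03, INFO04 = (
    FileInformation(start, end)
    for start, end in ((0, 2), (2, 10), (10, 15), (15, 20))
)

def parse(infos, data):
    # Use the named fields to build the slice for each requested field.
    return [data[info.start:info.end] for info in infos]

result = parse((INFO01, INFO04), "XX20120101NAME1CITYA")
print(result)
```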
