
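I have an application that validates a CSV file against a set of rules. The application checks whether certain "columns/fields" in the CSV are marked as mandatory; for others, it checks whether their mandatory status depends on another field. For example, column 2 has a conditional check against column 5: if column 5 has a value, then column 2 must also have a value.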

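I have already implemented this using VB and Python. The problem is that the logic is hard-coded in the application. What I want is to move these rules into an XML file, which the application reads and then uses to process the CSV. If the processing rules change (and they change often), the application stays the same and only the XML changes.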

Here are two sample rules in Python:

Sample one

current_column_data = 5  # data from the current position in the CSV
if not validate_data_type(current_column_data, expected_data_type):
    return error_message
index_to_check_against = 10  # column against which there is a "logical" test
text_to_check = get_text(index_to_check_against)
if not validate_data_type(text_to_check, expected_data_type):
    return error_message
# This test could end up comparing string vs. string, since the current
# column data could be a string value; keep that in mind to avoid errors.
if current_column_data > 10:
    if text_to_check <= 0:
        return "Text to check should be greater than 0 if current column data is greater than 10"

Sample two

current_column_data = "Self Employed"  # data from the current position in the CSV
if not validate_data_type(current_column_data, expected_data_type):
    return error_message
index_to_check_against = 10  # column against which there is a "logical" test
text_to_check = get_text(index_to_check_against)
if not validate_data_type(text_to_check, expected_data_type):
    return error_message
# If "A" is provided in the column to check, the current column must have a
# value; otherwise we return an error message.
if text_to_check == "A":
    if len(current_column_data) == 0:
        return 'Current column is mandatory given that "A" is provided in Column_to_check'

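Note: for every column in the CSV, we already know the expected data type, the expected length of the field, whether it is mandatory, optional, or conditional, and, if it is conditional, the other column the condition is based on.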

Now I just need some guidance on how to express this in XML, so that the application reads the XML and knows what to do with each column.

Someone elsewhere suggested the following example, but I still can't get my head around the concept:

<check left="" right="9" operation="GTE" value="3" error_message="logical failure for something" /> 
# Meaning: column 9 should be "GTE", i.e. greater than or equal to, the value 3

Is there a different way to implement this kind of logic, or even a way to improve what I have here?

Suggestions and pointers are welcome.


2 Answers


This concept is called a Domain Specific Language (DSL) - you are effectively creating a mini-programming language for validating your CSV files. Your DSL allows you to express succinctly the rules for a valid CSV file.

This DSL could be expressed in XML. Alternatively, you could develop a library of functions in Python and express your DSL as a mini Python program that is a sequence of calls to these functions. This approach is called an in-language or "internal" DSL, and it has the benefit that you have the full power of Python at your disposal within your language.

Looking at your samples - you're very close to this already. When I read them, they're almost like an English description of the CSV validation rules.

Don't feel you have to go down the XML route - there's nothing wrong with keeping everything in Python:

  • You can split your code, so you have a Python file with the "CSV validation rules" expressed in your DSL, which you need to update/redistribute frequently, and separate files which define your DSL functions, which will change less frequently
  • In some cases it's even possible to develop the DSL to the point where non-programmers can update/maintain "programs" written in it
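
As a rough illustration of that split, here is a minimal sketch of an internal DSL. The helper names (`mandatory`, `required_if_equals`, `validate`) and the zero-based column numbering are assumptions made up for this example, not part of any existing library. Each helper returns a function that inspects one row and returns an error message or `None`; the `RULES` list is the part you would edit frequently.

```python
import csv
import io

# Hypothetical DSL building blocks. Each one returns a function that takes
# a row (a list of strings) and returns an error message or None.
def mandatory(col):
    def check(row):
        if not row[col].strip():
            return f"Column {col} is mandatory"
        return None
    return check

def required_if_equals(col, other_col, value):
    def check(row):
        if row[other_col].strip() == value and not row[col].strip():
            return f"Column {col} is mandatory when column {other_col} is {value!r}"
        return None
    return check

def validate(rows, rules):
    """Apply every rule to every row and collect (line number, message) pairs."""
    errors = []
    for line_no, row in enumerate(rows, start=1):
        for rule in rules:
            message = rule(row)
            if message:
                errors.append((line_no, message))
    return errors

# The "program" written in the DSL: this is the file that changes often.
RULES = [
    mandatory(0),
    required_if_equals(1, 2, "A"),
]

# Made-up sample data, zero-based column indexes.
SAMPLE = "id1,Self Employed,A\nid2,,A\n,,B\n"
errors = validate(csv.reader(io.StringIO(SAMPLE)), RULES)
for line_no, message in errors:
    print(f"line {line_no}: {message}")
```

The functions above would live in the rarely changing library file, while `RULES` could sit in its own small module that is redistributed whenever the validation rules change.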
answered 2013-07-18T10:30:25.577

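The problem you are solving is not necessarily related to XML. OK, you can validate XML with XSD, but that means your data would need to be XML, and I'm not sure you could take it as far as "if A > 3, the following rule applies".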

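A slightly less elegant than Ross's answer, but perhaps simpler, approach is to define the rule set as data and have a specific function process it. That is basically what your XML example does, using XML to store (i.e. serialize) the data, but you could use any other serialization format, such as JSON, YAML, INI, or even CSV (not that I would recommend that).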
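
To make "rules as data" concrete, here is a minimal sketch that stores flat rules as JSON and lets one function interpret them. The field names (`column`, `op`, `value`, `error_message`), the operator codes, and the zero-based column indexes are assumptions for illustration only; they loosely mirror the `<check>` element from the question.

```python
import json
import operator

# Hypothetical rule file content; in practice this would be read from disk.
RULES_JSON = """
[
  {"column": 9, "op": "ge", "value": 3,
   "error_message": "column 9 must be >= 3"},
  {"column": 1, "op": "ne", "value": "",
   "error_message": "column 1 is mandatory"}
]
"""

# Map the operator codes used in the rule file to Python functions.
OPERATORS = {
    "ge": operator.ge, "gt": operator.gt,
    "le": operator.le, "lt": operator.lt,
    "eq": operator.eq, "ne": operator.ne,
}

def check_row(row, rules):
    """Return the list of error messages for one CSV row (list of strings)."""
    errors = []
    for rule in rules:
        cell = row[rule["column"]]
        expected = rule["value"]
        # Compare numerically only when the rule's value is a number,
        # which avoids accidental string-vs-number comparisons.
        if isinstance(expected, (int, float)):
            try:
                cell = float(cell)
            except ValueError:
                errors.append(f"column {rule['column']} is not numeric: {cell!r}")
                continue
        if not OPERATORS[rule["op"]](cell, expected):
            errors.append(rule["error_message"])
    return errors

rules = json.loads(RULES_JSON)
good_row = ["x", "name", "", "", "", "", "", "", "", "5"]
bad_row = ["x", "", "", "", "", "", "", "", "", "2"]
print(check_row(good_row, rules))
print(check_row(bad_row, rules))
```

Swapping JSON for another format only changes the loading step; `check_row` works on plain dictionaries either way.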

So you can concentrate on the data model of your rules. I'll try to illustrate that with XML (but without using attributes):

<cond name="some explanatory name">
    <if><expr>...</expr></if>
    <and>
        <expr>
            <left><column>9</column></left>
            <op>ge</op>
            <right>3</right>
        </expr>
        <expr>
            <left><column>1</column></left>
            <op>true</op>
            <right></right>
        </expr>
    </and>
</cond>

You can then load it into Python and traverse it for each row, raising nice explanatory exceptions as appropriate.
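
A minimal sketch of that traversal, using the standard library's `xml.etree.ElementTree`. Several things here are assumptions rather than part of the rule format: the `<rules>` wrapper element, reading `<if>` as a precondition and `<and>` as the expressions that must then hold, treating the `true` operator as "cell is non-empty", the zero-based column indexes, and the `RuleViolation` exception name.

```python
import operator
import xml.etree.ElementTree as ET

# Hypothetical rule file; the <rules> wrapper is added for this example.
RULES_XML = """
<rules>
  <cond name="column 1 required when column 9 is at least 3">
    <if>
      <expr><left><column>9</column></left><op>ge</op><right>3</right></expr>
    </if>
    <and>
      <expr><left><column>1</column></left><op>true</op><right></right></expr>
    </and>
  </cond>
</rules>
"""

class RuleViolation(Exception):
    """Raised with the rule's explanatory name when a row breaks it."""

BINARY_OPS = {"ge": operator.ge, "gt": operator.gt,
              "le": operator.le, "lt": operator.lt}

def eval_expr(expr, row):
    column = int(expr.find("left/column").text)
    op = expr.find("op").text
    cell = row[column].strip()
    if op == "true":  # assumed meaning: the cell is non-empty
        return bool(cell)
    try:
        return BINARY_OPS[op](float(cell), float(expr.find("right").text))
    except ValueError:
        return False  # a non-numeric cell cannot satisfy a numeric comparison

def check_row(row, root):
    for cond in root.iter("cond"):
        # The rule applies only when every <if> expression holds.
        applies = all(eval_expr(e, row) for e in cond.findall("if/expr"))
        if applies and not all(eval_expr(e, row) for e in cond.findall("and/expr")):
            raise RuleViolation(cond.get("name"))

root = ET.fromstring(RULES_XML)
try:
    check_row(["a", "", "", "", "", "", "", "", "", "5"], root)
except RuleViolation as exc:
    print(f"rule violated: {exc}")
```

Raising on the first violation keeps the sketch short; collecting all violations per row into a list would work just as well.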

Edit: You mentioned that the file may need to be human-writable. Note that YAML was designed for exactly this.

A similar structure (not identical, in order to better illustrate the language):

# comments, explanations...
conds:
    -   name: some explanatory name
        # seen that? no quotes needed (unless you include some of
        # quite limited set of special chars)
        if:
            expr:
                # "..."
            and:
                # a list, because a mapping cannot repeat the key "expr"
                -   expr:
                        left:
                            column: 9
                        op:     ge
                        right:  3
                -   expr:
                        left:
                            column: 1
                        # note: a bare true is read as a boolean, not a string
                        op:     true
    -   name: some other explanatory name
        # i'm using alternative notation for columns below just to have
        # them better indented (not sure about support in libs)
        if:
            expr:
                # "..."
            and:
                -   expr:
                        left:   { column: 9 }
                        op:     ge
                        right:  3
                -   expr:
                        left:   { column: 1 }
                        op:     true
answered 2013-07-18T10:50:00.163