python - Deep-checking two Python dictionaries and getting the differences in report form

Question

Say there are two dictionaries in Python:

Dictionary 1

mydict1 = { 
        "Person" :
            {
                "FName"    : "Rakesh",
                "LName"    : "Roshan",
                "Gender"   : "Male",
                "Status"   : "Married",
                "Age"      : "60",
                "Children" :
                    [
                        {
                            "Fname"    : "Hrithik",
                            "Lname"    : "Roshan",
                            "Gender"   : "Male",
                            "Status"   : "Married",
                            "Children" : ["Akram", "Kamal"],
                        },
                        {
                            "Fname"    : "Pinky",
                            "Lname"    : "Roshan",
                            "Gender"   : "Female",
                            "Status"   : "Married",
                            "Children" : ["Suzan", "Tina", "Parveen"]
                        }
                    ],
                "Movies" : 
                    {
                        "The Last Day" :
                            {
                                "Year" : 1990,
                                "Director" : "Mr. Kapoor"
                            },
                        "Monster" :
                            {
                                "Year" : 1991,
                                "Director" : "Mr. Khanna"
                            }
                    }
             }
    }

Dictionary 2

mydict2 = {
        "Person" :
            {
                "FName"    : "Rakesh",
                "LName"    : "Roshan",
                "Gender"   : "Male",
                "Status"   : "Married",
                "Children" :
                    [
                        {
                            "Fname"    : "Hrithik",
                            "Lname"    : "Losan",
                            "Gender"   : "Male",
                            "Status"   : "Married",
                            "Children" : ["Akram", "Ajamal"],
                        },
                        {
                            "Fname"    : "Pinky",
                            "Lname"    : "Roshan",
                            "Gender"   : "Female",
                            "Status"   : "Married",
                            "Children" : ["Suzan", "Tina"]
                        }
                    ]
             }
    }

I want to compare the two dictionaries and print the differences in a report format like this:

MISMATCH 1
==========
MATCH DICT KEY : Person >> Children >> LName
EXPECTED  : Roshan
ACTUAL    : Losan


MISMATCH 2
==========
MATCH LIST ITEM : Person >> Children >> Children
EXPECTED        : Kamal
ACTUAL          : Ajamal


MISMATCH 3
==========
MATCH LIST ITEM : Person >> Children >> Children
EXPECTED        : Parveen
ACTUAL          : NOT_FOUND

MISMATCH 4
==========
MATCH DICT KEY  : Person >> Age
EXPECTED        : 60
ACTUAL          : NOT_FOUND 

MISMATCH 5
==========
MATCH DICT KEY  : Person >> Movies
EXPECTED        : { Movies : {<COMPLETE DICT>} } 
ACTUAL          : NOT_FOUND

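I tried a Python module called datadiff, but it does not give me nicely formatted output for dictionaries. To generate the report I have to walk the resulting dictionary and look for the '+' and '-' keys, and that traversal becomes difficult when the dictionaries are complex.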

score 6 · Accepted Answer

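Update: I have updated the code to handle lists in a more appropriate way. I have also commented the code to make it clearer in case you need to modify it.

This answer is not 100% general as it stands, but it can easily be extended to fit what you need.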

def print_error(exp, act, path=None):
    # Print one mismatch; path is the list of keys leading to the value
    if path:
        print('MATCH LIST ITEM: %s' % '>>'.join(path))
    print('EXPECTED: %s' % str(exp))
    print('ACTUAL: %s' % str(act))
    print('')

def copy_append(lst, item):
    # Return a new list so the caller's path is never mutated
    new_lst = lst[:]
    new_lst.append(str(item))
    return new_lst

def deep_check(comp, compto, path=None, print_errors=True):
    # comp is the expected structure, compto the actual one
    if path is None:
        path = []

    # Total number of errors found, is needed for when
    # testing the similarity of dicts
    errors = 0

    if isinstance(comp, list):
        # If the types are not the same then it is probably a critical error
        # return a number to represent how important this is
        if not isinstance(compto, list):
            if print_errors:
                print_error(comp, 'NOT_LIST', path)
            return 1

        # We don't want to destroy the original lists
        comp_copy = comp[:]
        compto_copy = compto[:]

        # Remove items that are both in comp and compto
        # and find items that are only in comp
        for item in comp_copy[:]:
            try:
                compto_copy.remove(item)
                # Only is removed if the item is in compto_copy
                comp_copy.remove(item)
            except ValueError:
                # dicts need to be handled differently
                if isinstance(item, dict):
                    continue
                # Drop the reported item so that only dicts remain
                comp_copy.remove(item)
                if print_errors:
                    print_error(item, 'NOT_FOUND', path)
                errors += 1

        # Find non-dicts that are only in compto
        for item in compto_copy[:]:
            if isinstance(item, dict):
                continue
            compto_copy.remove(item)
            if print_errors:
                print_error('NOT_FOUND', item, path)
            errors += 1

        # Now both copies only have dicts

        # This is the part that compares dicts with the minimum
        # errors between them, it is expensive since each dict in comp_copy
        # has to be compared against each dict in compto_copy
        for c in comp_copy:
            lowest_errors = None
            lowest_value = None
            for ct in compto_copy:
                errors_in = deep_check(c, ct, path, print_errors=False)

                # Get and store the minimum errors
                # (test for None first: int < None is a TypeError in Python 3)
                if lowest_errors is None or errors_in < lowest_errors:
                    lowest_errors = errors_in
                    lowest_value = ct
            if lowest_errors is None:
                # No dict is left in compto to pair this one with
                if print_errors:
                    print_error(c, 'NOT_FOUND', path)
                errors += 1
                continue
            errors += lowest_errors
            # Run the comparison again with printing enabled, in case the
            # list of dicts contains a list of dicts
            if print_errors:
                deep_check(c, lowest_value, path, print_errors)
            compto_copy.remove(lowest_value)

        # Dicts still left in compto have no counterpart in comp
        for ct in compto_copy:
            if print_errors:
                print_error('NOT_FOUND', ct, path)
            errors += 1

        return errors

    if not isinstance(compto, dict):
        # If the types are not the same then it is probably a critical error
        # return a number to represent how important this is
        if print_errors:
            print_error(comp, 'NOT_DICT', path)
        return 1

    # Keys that exist only in compto
    for key in compto:
        if key not in comp:
            if print_errors:
                print_error('NO_KEY', key, copy_append(path, key))
            errors += 1

    # Keys that exist only in comp, or whose values differ
    for key, value in comp.items():
        if key not in compto:
            if print_errors:
                print_error(value, 'NOT_FOUND', copy_append(path, key))
            errors += 1
            continue
        tovalue = compto[key]

        if isinstance(value, (list, dict)):
            errors += deep_check(value, tovalue, copy_append(path, key), print_errors)
        elif value != tovalue:
            if print_errors:
                print_error(value, tovalue, copy_append(path, key))
            errors += 1

    return errors

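With your dicts as input, I get the following (the order of the entries, and of the keys inside a printed dict, follows dict iteration order, so it can differ between Python versions):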

MATCH LIST ITEM: Person>>Age
EXPECTED: 60
ACTUAL: NOT_FOUND

MATCH LIST ITEM: Person>>Movies
EXPECTED: {'The Last Day': {'Director': 'Mr. Kapoor', 'Year': 1990}, 'Monster': {'Director': 'Mr. Khanna', 'Year': 1991}}
ACTUAL: NOT_FOUND

MATCH LIST ITEM: Person>>Children>>Lname
EXPECTED: Roshan
ACTUAL: Losan

MATCH LIST ITEM: Person>>Children>>Children
EXPECTED: Kamal
ACTUAL: NOT_FOUND

MATCH LIST ITEM: Person>>Children>>Children
EXPECTED: NOT_FOUND
ACTUAL: Ajamal

MATCH LIST ITEM: Person>>Children>>Children
EXPECTED: Parveen
ACTUAL: NOT_FOUND
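
The output above does not carry the numbered MISMATCH headers that the question asked for. As a minimal sketch of that last step, assuming the mismatches have been collected into a list of (kind, path, expected, actual) tuples instead of being printed directly, a small formatter could look like this. The name format_report and the tuple layout are hypothetical and not part of the code above.

```python
def format_report(mismatches):
    """Render (kind, path, expected, actual) tuples as numbered blocks."""
    blocks = []
    for number, (kind, path, expected, actual) in enumerate(mismatches, start=1):
        title = f"MISMATCH {number}"
        blocks.append("\n".join([
            title,
            "=" * len(title),                      # underline as wide as the title
            f"MATCH {kind} : {' >> '.join(path)}",
            f"EXPECTED : {expected}",
            f"ACTUAL   : {actual}",
        ]))
    return "\n\n".join(blocks)


# Two of the mismatches listed above, written out by hand for illustration.
sample = [
    ("DICT KEY", ["Person", "Age"], "60", "NOT_FOUND"),
    ("DICT KEY", ["Person", "Children", "Lname"], "Roshan", "Losan"),
]
print(format_report(sample))
```

To use it with deep_check, print_error would have to append to such a list rather than print, which is a change left to the reader.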

The way lists are compared has been updated, so that these two lists:

['foo', 'bar']
['foo', 'bing', 'bar']

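will only raise an error about 'bing' not being in the first list. With string values, a value is either in the list or it is not, but a problem arises when you compare lists of dicts. You end up with dicts in the lists that mismatch to varying degrees, and knowing which dicts to compare with which is not straightforward.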
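
For plain values such as strings, the order-insensitive comparison described here can be sketched on its own. This is a simplified illustration rather than the exact code from deep_check; the function name diff_scalar_lists is made up for this example.

```python
def diff_scalar_lists(expected, actual):
    """Return (missing, unexpected) for two lists, ignoring item order."""
    remaining = list(actual)        # copy so the caller's list is untouched
    missing = []
    for item in expected:
        if item in remaining:
            remaining.remove(item)  # consume one matching occurrence
        else:
            missing.append(item)    # expected item with no counterpart
    # Whatever was never consumed exists only in the actual list
    return missing, remaining


missing, unexpected = diff_scalar_lists(['foo', 'bar'], ['foo', 'bing', 'bar'])
print(missing)      # []
print(unexpected)   # ['bing']
```

Because each match consumes exactly one occurrence, duplicates are counted rather than collapsed, which a set-based comparison would not do.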

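My implementation handles lists of dicts by assuming that the pairs of dicts that produce the fewest errors are the ones meant to be compared with each other. For example: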

test1 = {
        "Name": "Org Name",
        "Members":
        [
            {
                "Fname": "foo",
                "Lname": "bar",
                "Gender": "Neuter",
                "Roles": ["President", "Vice President"]
                },
            {
                "Fname": "bing",
                "Lname": "bang",
                "Gender": "Neuter",
                "Roles": ["President", "Vice President"]
                }
            ]
        }

test2 = {
        "Name": "Org Name",
        "Members":
        [
            {
                "Fname": "bing",
                "Lname": "bang",
                "Gender": "Male",
                "Roles": ["President", "Vice President"]
                },
            {
                "Fname": "foo",
                "Lname": "bar",
                "Gender": "Female",
                "Roles": ["President", "Vice President"]
                }
            ]
        }

produces this output:

MATCH LIST ITEM: Members>>Gender
EXPECTED: Neuter
ACTUAL: Female

MATCH LIST ITEM: Members>>Gender
EXPECTED: Neuter
ACTUAL: Male
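
The pairing rule behind this output can be shown in isolation. The sketch below is a simplified stand-in, assuming flat dicts and counting only top-level key differences, whereas deep_check counts errors recursively. The helper names count_differences and pair_by_fewest_differences are hypothetical.

```python
def count_differences(expected, actual):
    """Count keys whose values differ or that exist on only one side."""
    absent = object()  # sentinel, so a stored None still compares correctly
    keys = set(expected) | set(actual)
    return sum(
        1 for key in keys
        if expected.get(key, absent) != actual.get(key, absent)
    )


def pair_by_fewest_differences(expected_dicts, actual_dicts):
    """Greedily pair each expected dict with its closest remaining actual dict."""
    remaining = list(actual_dicts)
    pairs = []
    for exp in expected_dicts:
        if not remaining:
            break  # nothing left to pair with
        best = min(remaining, key=lambda act: count_differences(exp, act))
        remaining.remove(best)
        pairs.append((exp, best))
    return pairs


members1 = [{"Fname": "foo", "Gender": "Neuter"},
            {"Fname": "bing", "Gender": "Neuter"}]
members2 = [{"Fname": "bing", "Gender": "Male"},
            {"Fname": "foo", "Gender": "Female"}]

for exp, act in pair_by_fewest_differences(members1, members2):
    print(exp["Fname"], "<->", act["Fname"])
```

The matching is greedy: each expected dict takes the best candidate still available, so an early pairing can take a dict that a later one would have matched more closely. Every expected dict is scored against every remaining candidate, which is what makes this step expensive on long lists.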
