I have a program that recursively goes through 2 directories and puts the filename:sha256hash into 2 dicts, folder1 and folder2.

What I want to do is compare the hashes and, if the hashes match but the key is different, put the key into a new list called "renamed". I have the logic in place to account for deleted files, new files, and files where the key is the same but the value (hash) is different (a modified file), but I can't for the life of me get my head around doing the opposite.

    # Put filename:hash into 2 dictionaries from the folders to compare

    for root, dirs, files in os.walk(folder_1):
        for file in files:
            files1[file] = get_hash(os.path.join(root,file))

    for root, dirs, files in os.walk(folder_2):
        for file in files:
            files2[file] = get_hash(os.path.join(root, file))

    # Set up the operations to do for the comparison 

    set_files2, set_files1 = set(files2.keys()), set(files1.keys())
    intersect = set_files2.intersection(set_files1)

    # Compare and add to list for display

    created.extend(set_files2 - intersect)
    deleted.extend(set_files1 - intersect)
    modified.extend(set(k for k in intersect if files1[k] != files2[k]))
    unchanged.extend(set(k for k in intersect if files1[k] == files2[k]))

The issue with this is that (1) it doesn't account for renamed files, and (2) it puts renamed files into created, so once I have the renamed files I have to do created = created - renamed to filter them out of the actual new files.

Any/all help is appreciated. I've come this far but for some reason my mind is on strike.

3 Answers

You can invert your files1 and files2 dicts:

    name_from_hash1 = {v: k for k, v in files1.items()}
    name_from_hash2 = {v: k for k, v in files2.items()}

(I found the inversion idiom in this SO answer.)

Then,

    renamed = []
    for h in name_from_hash1:
        if h in name_from_hash2 and name_from_hash1[h] != name_from_hash2[h]:
            renamed.append(name_from_hash2[h])

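renamed is then a list of the renamed files, by their current names. You can get a list of the original names of the renamed files instead by changing name_from_hash2 to name_from_hash1 in the final line.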
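
Building on the inverted dicts, here is a minimal sketch of how the renames could be paired up and then filtered out of created and deleted, as the question asks. The helper name find_renamed and the sample data are hypothetical, and the sketch assumes each hash occurs at most once per folder (inverting a dict silently drops duplicate hashes):

```python
def find_renamed(files1, files2):
    """Return (old_name, new_name) pairs for files whose hash is unchanged
    but whose name differs. Hypothetical helper, not from the original code."""
    # Invert name -> hash into hash -> name (assumes hashes are unique per folder).
    name_from_hash1 = {h: name for name, h in files1.items()}
    name_from_hash2 = {h: name for name, h in files2.items()}

    renamed = []
    for h, old_name in name_from_hash1.items():
        new_name = name_from_hash2.get(h)
        # Count it as a rename only if the old name is gone from folder 2
        # and the new name did not exist in folder 1.
        if (new_name is not None and new_name != old_name
                and old_name not in files2 and new_name not in files1):
            renamed.append((old_name, new_name))
    return renamed


# Made-up sample data standing in for the real filename -> sha256 dicts.
files1 = {"a.txt": "h1", "b.txt": "h2", "c.txt": "h3"}
files2 = {"a.txt": "h1", "b_new.txt": "h2", "d.txt": "h4"}

renamed = find_renamed(files1, files2)
old_names = {old for old, new in renamed}
new_names = {new for old, new in renamed}

# Remove the renamed files from the plain set differences.
created = sorted(set(files2) - set(files1) - new_names)
deleted = sorted(set(files1) - set(files2) - old_names)
```

Returning pairs rather than a flat list keeps both the old and the new name available, so the same result can be used to clean up created and deleted.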

answered 2013-06-20T17:51:21.600

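I have a simple solution for you: rather than using the filenames as keys and the hashes as values, use the hashes as keys and the filenames as values (after all, it is the keys you want to be unique, not the values). You would just need to adjust the rest of your program to account for this. (Oops, it looks like Bitwise already mentioned this in the comments. Oh well.)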
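
A minimal sketch of the hash-as-key idea, starting from the existing name -> hash dicts. Because identical files share a hash, this version maps each hash to a list of names rather than to a single name. The helper name group_by_hash and the sample data are hypothetical:

```python
from collections import defaultdict


def group_by_hash(files):
    """Turn a name -> hash dict into a hash -> sorted list of names dict.
    Hypothetical helper for illustration."""
    names_by_hash = defaultdict(list)
    for name, h in files.items():
        names_by_hash[h].append(name)
    return {h: sorted(names) for h, names in names_by_hash.items()}


# Made-up sample data; copy.txt duplicates the content of a.txt.
files1 = {"a.txt": "h1", "copy.txt": "h1", "b.txt": "h2"}
files2 = {"a.txt": "h1", "copy.txt": "h1", "b_renamed.txt": "h2"}

by_hash1 = group_by_hash(files1)
by_hash2 = group_by_hash(files2)

# Hashes present in both folders whose list of names changed.
moved = {h: (by_hash1[h], by_hash2[h])
         for h in by_hash1.keys() & by_hash2.keys()
         if by_hash1[h] != by_hash2[h]}
```

With this layout, duplicate files do not overwrite each other, at the cost of having to decide how to pair names when a hash maps to several files.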

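If you don't want to change the rest of your code and you are using Python 2.7+, here is a nice one-liner that builds a set of the renamed files by their old names (the k not in hashes2 check keeps out files that kept their name):

    renamedfiles = {k for k, v in hashes1.items() if v in hashes2.values() and k not in hashes2}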

For slightly better efficiency in Python 2.7, use iteritems() and itervalues() instead (in Python 3, keys(), items(), and values() already return lazy view objects by default).

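Addendum: you could also do renamedfiles = filter(lambda item: item[1] in hashes2.values() and item[0] not in hashes2, hashes1.items()), although this gives you an iterable of the qualifying key/value pairs rather than a set or dict. Also, I believe comprehensions are generally preferred in Python, even though filter() is one of the built-in functions.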
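
Note that a test of the form v in hashes2.values() scans all of the values on every lookup, so the set comprehension above is quadratic overall. A sketch of a variant that builds a set of the hashes once, assuming the same hashes1 and hashes2 name -> hash dicts (the sample data and the name hash_set2 are made up):

```python
# Made-up sample data standing in for the name -> hash dicts.
hashes1 = {"a.txt": "h1", "b.txt": "h2", "c.txt": "h3"}
hashes2 = {"a.txt": "h1", "b_new.txt": "h2"}

# Build the set once so each membership test is O(1) on average.
hash_set2 = set(hashes2.values())

# Old names whose content still exists in folder 2 under a different name.
renamedfiles = {name for name, h in hashes1.items()
                if h in hash_set2 and name not in hashes2}
```

For a handful of files the difference is negligible, but it may matter when walking large directory trees.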

answered 2013-06-20T17:53:33.583

Here is a crude quadratic-time solution I came up with quickly, but:

    >>> d1 = {'a': 1, 'b': 2, 'c': 3}
    >>> d2 = {'a': 1, 'b': 3, 'c': 2}
    >>> for key1 in d1:
    ...     for key2 in d2:
    ...         if d1[key1] == d2[key2] and key1 != key2:
    ...             print(key1, key2)
    ...
    b c
    c b

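This code prints each key in d1 alongside the key in d2 that holds the same value, but only when the two keys differ. Adapt it to however you put the changed keys into your modified list.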

answered 2013-06-20T17:40:37.480