python - Saving pickled results to disk gives files of different sizes when multiprocessing with different numbers of cores

Question

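I've looked around but couldn't find an answer to this question. I'm fairly sure my code is fine, but I noticed that when I run it and save the pickled result (a dictionary) to disk, the file size differs depending on how many cores I use.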

Using 4 cores results in a 48,418 KB file
Using 8 cores (hyperthreading) results in a 59,880 KB file

The results should be (and appear to be) the same, so I'm just curious what is causing the size difference.

I ran a quick summary of the two pickled objects, and both report the same number of items in each dictionary:

4 cores has 683 keys and 6,015,648 values
8 cores has 683 keys and 6,015,648 values

I suppose I could check that the values for each key are exactly the same, but I think that might take quite a while to run.

The only code that could be causing this must be where the data is split into chunks for processing, which is:

import math
import multiprocessing
import pickle
import time


def split_list_multi(listOfLetterCombos,threads=8):
    """Split a list into N parts for use with multiprocessing module. Takes a list
    (a set must be converted to a list first, because the input is sliced)
    which should be the various letter combinations created using make_letter_combinations().
    Divides the list into N (where N is the number of threads) roughly equal parts; the last
    part also takes any remainder. Returns a dict where the key is the zero-based thread
    number (as a string) and the value is a slice of the list, plus the number of threads.
    With 4 threads and a list of 2000 items, the results dict would be {'0': [0:500],
    '1': [500:1000], '2': [1000:1500], '3': [1500:2000]}."""
    fullLength = len(listOfLetterCombos)
    single = math.floor(fullLength/threads)
    results = {}
    counter = 0
    while counter < threads:
        if counter == (threads-1):
            results[str(counter)] = listOfLetterCombos[single*counter::]
        else:
            results[str(counter)] = listOfLetterCombos[single*counter:single*(counter+1)]
        counter += 1
    return results,threads


def main(numOfLetters,numThreads):
    # Use a context manager so the file handle is closed after loading.
    with open(r'd:\download\allwords.pickle', 'rb') as wordFile:
        wordList = pickle.load(wordFile)
    combos = make_letter_combinations(numOfLetters)
    split = split_list_multi(combos,numThreads)
    doneQueue = multiprocessing.Queue()
    jobs = []
    startTime = time.time()
    for num in range(split[1]):
        listLetters = split[0][str(num)] 
        thread = multiprocessing.Process(target=worker, args=(listLetters,wordList,doneQueue))
        jobs.append(thread)
        thread.start()

    resultdict = {}
    for i in range(split[1]):
        resultdict.update(doneQueue.get())

    for j in jobs:
        j.join()

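    # Use a context manager so the output is flushed and closed before returning.
    with open('d:\\download\\results{}letters.pickle'.format(numOfLetters), 'wb') as resultFile:
        pickle.dump(resultdict, resultFile)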
    endTime = time.time()
    totalTime = (endTime-startTime)/60
    print("Took {} minutes".format(totalTime))
    return resultdict

score 2 · Accepted Answer

From: cPickle - different results pickling the same object

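"There is no guarantee that seemingly identical objects will produce identical pickle strings.

The pickle protocol is a virtual machine, and a pickle string is a program for that virtual machine. For a given object there exist multiple pickle strings (=programs) that will reconstruct that object exactly."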
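
The quoted point can be reproduced in a few lines. The sketch below is illustrative only and shows one possible mechanism, not a diagnosis of the code in the question: pickle writes an object in full the first time it meets it and only a short back-reference on later encounters, so two dictionaries that compare equal can serialize to different sizes depending on how many of their values are the same object in memory. The names `values`, `shared`, and `separate` are made up for the example.

```python
import pickle

values = list(range(1000))

# Both keys point at the very same list object.
shared = {"a": values, "b": values}

# Same contents, but "b" holds an independent copy of the list.
separate = {"a": values, "b": list(values)}

# The two dictionaries compare equal...
print(shared == separate)

# ...but pickle memoizes the repeated object in `shared` and writes it once,
# while the copy in `separate` has to be written out in full a second time.
blob_shared = pickle.dumps(shared, protocol=pickle.HIGHEST_PROTOCOL)
blob_separate = pickle.dumps(separate, protocol=pickle.HIGHEST_PROTOCOL)
print(len(blob_shared), len(blob_separate))

# Both byte strings still reconstruct equal objects.
print(pickle.loads(blob_shared) == pickle.loads(blob_separate))
```

Results that come back from worker processes through a `multiprocessing.Queue` are themselves unpickled copies, so how much sharing survives may depend on how the work was chunked. Whether that accounts for the 48,418 KB versus 59,880 KB gap here is not established.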

Talk about being in a pickle!
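
Since file size is not a reliable signal, one way to check that the two runs agree is to load both files and compare the objects with `==`, which walks every key and value. A minimal sketch: the helper names `load_pickle` and `same_contents` and the temporary file names are hypothetical, and the paths of the real result files would be passed in instead.

```python
import os
import pickle
import tempfile


def load_pickle(path):
    """Load one pickled object from `path`, closing the file afterwards."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


def same_contents(path_a, path_b):
    """Return True if the objects stored in the two pickle files compare equal."""
    return load_pickle(path_a) == load_pickle(path_b)


# Demo with two small stand-in files that hold equal dictionaries.
values = list(range(1000))
with tempfile.TemporaryDirectory() as tmp:
    path_a = os.path.join(tmp, "run_a.pickle")
    path_b = os.path.join(tmp, "run_b.pickle")
    with open(path_a, "wb") as handle:
        pickle.dump({"x": values, "y": values}, handle)
    with open(path_b, "wb") as handle:
        pickle.dump({"x": values, "y": list(values)}, handle)

    sizes_differ = os.path.getsize(path_a) != os.path.getsize(path_b)
    contents_match = same_contents(path_a, path_b)

print("sizes differ:", sizes_differ)
print("contents match:", contents_match)
```

If the comparison returns True, the size difference comes from how the data was serialized rather than from the data itself.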
