
I have several sorted lists, and I want to combine them into one big sorted list. What is the most efficient way to do this?

Here is what I would do, but it is too inefficient:

big_list=[]
for slist in sorted_lists: # sorted_lists is a generator, so lists have to be added one by one
    big_list.extend(slist)
big_list.sort()

Here is an example of sorted_lists:

Size of sorted_lists = 200

Size of the first element of sorted_lists = 1668

sorted_lists=[
['000008.htm_181_0040_0009', '000008.htm_181_0040_0037', '000008.htm_201_0041_0031', '000008.htm_213_0029_0004', '000008.htm_263_0015_0011', '000018.htm_116_0071_0002', '000018.htm_147_0046_0002', '000018.htm_153_0038_0015', '000018.htm_160_0060_0001', '000018.htm_205_0016_0002', '000031.htm_4_0003_0001', '000032.htm_4_0003_0001', '000065.htm_5_0013_0005', '000065.htm_8_0008_0006', '000065.htm_14_0038_0036', '000065.htm_127_0016_0006', '000065.htm_168_0111_0056', '000072.htm_97_0016_0012', '000072.htm_175_0028_0020', '000072.htm_188_0035_0004'….],
['000018.htm_68_0039_0030', '000018.htm_173_0038_0029', '000018.htm_179_0042_0040', '000018.htm_180_0054_0021', '000018.htm_180_0054_0031', '000018.htm_182_0025_0023', '000018.htm_191_0041_0010', '000065.htm_5_0013_0007', '000072.htm_11_0008_0002', '000072.htm_14_0015_0002', '000072.htm_75_0040_0021', '000079.htm_11_0005_0000', '000079.htm_14_0006_0000', '000079.htm_16_0054_0006', '000079.htm_61_0018_0012', '000079.htm_154_0027_0011', '000086.htm_8_0003_0000', '000086.htm_9_0030_0005', '000086.htm_11_0038_0004', '000086.htm_34_0031_0024'….],
['000001.htm_13_0037_0004', '000008.htm_48_0025_0006', '000008.htm_68_0025_0008', '000008.htm_73_0024_0014', '000008.htm_122_0034_0026', '000008.htm_124_0016_0005', '000008.htm_144_0046_0030', '000059.htm_99_0022_0012', '000065.htm_69_0045_0017', '000065.htm_383_0026_0020', '000072.htm_164_0030_0002', '000079.htm_122_0030_0009', '000079.htm_123_0049_0015', '000086.htm_13_0037_0004', '000109.htm_71_0054_0029', '000109.htm_73_0035_0005', '000109.htm_75_0018_0004', '000109.htm_76_0027_0013', '000109.htm_101_0030_0008', '000109.htm_134_0036_0030']]

EDIT

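Thanks for your answers. I should have made it clearer that I don't have the sorted lists available up front; I get them by iterating over some large files. So I have to add them one at a time, as shown in my rough code above.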


3 Answers


The standard library provides heapq.merge for exactly this purpose:

>>> import heapq
>>> a=[1,3,5,6]
>>> b=[2,4,6,8]
>>> c=[2.5,4.5]
>>> list(heapq.merge(a,b,c))
[1, 2, 2.5, 3, 4, 4.5, 5, 6, 6, 8]
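Since the lists in the question come from a generator, here is a minimal sketch of how that fits, assuming a hypothetical generator read_sorted_lists() that stands in for the file-reading code. The * unpacking passes each yielded list to heapq.merge as a separate input; note that it consumes the whole generator up front, so all the lists are held in memory at once, although the merge itself stays lazy.

```python
import heapq

def read_sorted_lists():
    """Hypothetical stand-in for the code that reads sorted lists from large files."""
    yield [1, 3, 5, 6]
    yield [2, 4, 6, 8]
    yield [2.5, 4.5]

# The * unpacking exhausts the generator and hands each list to merge as its own input
big_list = list(heapq.merge(*read_sorted_lists()))
print(big_list)
```

If each sorted input could itself be streamed (for example, an iterator over a file's sorted lines), passing those iterators instead of lists would avoid materializing each one.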

Or, in your case:

big_list = list(heapq.merge(*sorted_lists))
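One caveat about the sample data: the strings appear to be ordered by their numeric fields rather than as plain strings (for example '000065.htm_5_0013_0005' comes before '000065.htm_14_0038_0036', even though '14' < '5' as strings). heapq.merge assumes every input is already sorted under the ordering it uses for comparison, so if that observation holds, a key function is needed; the key parameter is available since Python 3.5. A sketch, assuming a hypothetical helper natural_key that compares the digit runs as integers:

```python
import heapq
import re

def natural_key(s):
    """Hypothetical key: compare the runs of digits in s as integers."""
    return tuple(int(n) for n in re.findall(r"\d+", s))

a = ['000065.htm_5_0013_0005', '000065.htm_14_0038_0036']
b = ['000065.htm_8_0008_0006']

# Plain string comparison puts '..._14_...' before '..._8_...'
plain = list(heapq.merge(a, b))
# With the key, merge compares items the same way the inputs were sorted
natural = list(heapq.merge(a, b, key=natural_key))
print(natural)
```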

Note that you don't have to build a list at all, because heapq.merge returns an iterator:

for item in heapq.merge(*sorted_lists):
    print(item)  # process each item in sorted order
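Because the merge is consumed lazily, it even works on inputs that never end. A small sketch using itertools.count as two infinite sorted streams and itertools.islice to stop after a few items; this would never finish if merge tried to read everything first.

```python
import heapq
import itertools

evens = itertools.count(0, 2)  # 0, 2, 4, ... (infinite, sorted)
odds = itertools.count(1, 2)   # 1, 3, 5, ... (infinite, sorted)

# islice stops pulling from the merge iterator after 5 items
first_five = list(itertools.islice(heapq.merge(evens, odds), 5))
print(first_five)
```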

Quoting the documentation:

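Similar to sorted(itertools.chain(*iterables)) but returns an iterable, does not pull the data into memory all at once, and assumes that each of the input streams is already sorted (smallest to largest).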
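To illustrate the equivalence the documentation describes, here is a quick check on made-up data (five independently sorted lists of random integers): both approaches produce the same sequence, but merge streams the values instead of concatenating and re-sorting.

```python
import heapq
import itertools
import random

random.seed(0)
# Made-up test data: 5 independently sorted lists of 20 random integers each
lists = [sorted(random.randint(0, 100) for _ in range(20)) for _ in range(5)]

merged = list(heapq.merge(*lists))
baseline = sorted(itertools.chain(*lists))
print(merged == baseline)
```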

answered 2013-10-30T15:56:53.983

Use the heapq module to keep track of which list to take the next sorted value from:

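import heapq

def merge(*iterables):
    """Lazily merge already-sorted iterables into one sorted stream."""
    h = []
    for order, it in enumerate(map(iter, iterables)):
        try:
            next_ = it.__next__  # Python 3 spelling of Python 2's it.next
            # [current value, tiebreaker, method that fetches the next value];
            # the unique tiebreaker keeps equal values from comparing the methods
            h.append([next_(), order, next_])
        except StopIteration:
            pass  # skip empty inputs
    heapq.heapify(h)

    while True:
        try:
            while True:
                v, _, next_ = s = h[0]  # entry with the smallest current value
                yield v
                s[0] = next_()  # raises StopIteration when this input runs out
                heapq.heapreplace(h, s)  # restore the heap order for the updated entry
        except StopIteration:
            heapq.heappop(h)  # drop the exhausted input
        except IndexError:
            return  # heap is empty: every input has been consumed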

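This pushes all the lists onto a heap, ordered by their next value. Each time the lowest value is yielded, that iterator's heap entry is updated with its next value and the heap is reordered.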

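In essence, the heap is a list of small lists, one per input iterator, each holding that iterator's next value together with the means to fetch the following one, kept in heap order by that next value.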
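The same idea can be written with only public heapq functions when the inputs are plain lists. This is a sketch of a list-specific variant, not the code above: the heap holds (value, list_index, element_index) tuples, and the list index breaks ties so equal values never cause anything else to be compared. The name merge_sorted_lists is just illustrative.

```python
import heapq

def merge_sorted_lists(lists):
    """Sketch: k-way merge of already-sorted lists using a heap of index tuples."""
    # One entry per non-empty list: (head value, which list, position in that list)
    heap = [(lst[0], i, 0) for i, lst in enumerate(lists) if lst]
    heapq.heapify(heap)
    result = []
    while heap:
        value, i, j = heapq.heappop(heap)  # smallest current head
        result.append(value)
        if j + 1 < len(lists[i]):
            # Advance within list i and push its next head
            heapq.heappush(heap, (lists[i][j + 1], i, j + 1))
    return result

merged = merge_sorted_lists([[1, 3, 5, 6], [2, 4, 6, 8], [2.5, 4.5], []])
print(merged)
```

Each pop and push costs O(log k) for k lists, so merging n values costs O(n log k) overall.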

Usage:

for value in merge(*sorted_lists):
    # loops over all values in `sorted_lists` in sorted order
    print(value)

or

big_list = list(merge(*sorted_lists))

to efficiently create a new big list with all values in sorted order.

This exact implementation was added to the heapq module as the heapq.merge() function, so you can simply do:

from heapq import merge

big_list = list(merge(*sorted_lists))
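In Python 3.5 and later, heapq.merge also accepts reverse=True for inputs that are each sorted from largest to smallest. A short sketch with made-up descending lists:

```python
import heapq

a = [9, 7, 3]
b = [8, 4, 1]
# Both inputs are sorted in descending order, so reverse=True is required
desc = list(heapq.merge(a, b, reverse=True))
print(desc)
```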
answered 2013-10-30T15:34:10.570
import heapq

def merge_lists(*args):
    # heapq.merge already yields items in sorted order, so no extra sort is needed
    new_list = list(heapq.merge(*args))
    print(new_list)
answered 2022-02-14T05:27:48.330