python - What is the most efficient way to return a subset of a list

Question

I am parsing a large amount of binary data that has been placed into a list of lists:

row = [1,2,3...]                # list of many numbers
data = [row1,row2,row3...]      # a list of many rows

list_of_indices = [1,5,13,7...] # random list of indices. Always shorter than row
                                #This list won't change after creation

I want to return a row that contains only the elements at the positions in list_of_indices:

subset_row = [row(index) for index in list_of_indices]

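My question:

Will subset_row contain a copy of each returned element (i.e., will subset_row be a brand-new list in memory), or will subset_row contain references to the original data? Note that the data will not be modified, so I think it might not even matter.

Also, is there a more efficient way to do this? I will have to iterate over thousands of rows.

This is covered in the question "What is the simplest and most efficient function to return a sublist based on an index list?", but that one is not specific enough about what is returned.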

score 1 · Accepted Answer

First, it should be

[row[index] for index in list_of_indexes]

(or just map(row.__getitem__, list_of_indexes), which in Python 3 returns a lazy iterator rather than a list)
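
As a minimal sketch of the two spellings side by side, assuming small made-up values for row and list_of_indexes (they are not taken from the question's data):

```python
# Hypothetical sample data, chosen only for illustration.
row = [10, 20, 30, 40, 50, 60]
list_of_indexes = [1, 5, 3]

# List comprehension: index into the row once per requested position.
subset_row = [row[index] for index in list_of_indexes]

# Equivalent map() form: the bound method belongs to the row,
# and the indexes are the values being mapped over.
# In Python 3, map() is lazy, so list() materializes the result.
subset_via_map = list(map(row.__getitem__, list_of_indexes))

print(subset_row)      # [20, 60, 40]
print(subset_via_map)  # [20, 60, 40]
```

Note that the order of the result follows the order of list_of_indexes, not the order of the elements in row.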

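Second, Python has no explicit reference or pointer type; put another way, everything is already a reference. The comprehension builds a new list object, but its elements are references to the same objects that row holds. In practice this means that for ints there is basically no difference, and for more "heavyweight" objects you automatically get references, because nothing is copied implicitly in Python.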
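
The reference behaviour can be checked directly with the is operator. This is only a sketch: the nested lists below are hypothetical stand-ins for "heavyweight" objects, not the question's data.

```python
# Hypothetical row whose elements are mutable objects (small lists).
row = [[0, 1], [2, 3], [4, 5], [6, 7]]
list_of_indexes = [2, 0]

subset_row = [row[index] for index in list_of_indexes]

# The subset is a new list object...
print(subset_row is row)        # False
# ...but each element is the very same object the row refers to.
print(subset_row[0] is row[2])  # True

# Consequence: mutating a shared element is visible through both lists.
subset_row[0].append(99)
print(row[2])                   # [4, 5, 99]
```

Since the question states that the data is never modified, this sharing is harmless here and simply avoids copying.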

Note: if your row contains a lot of data and list_of_indexes is also long, you may want to consider lazy evaluation (in Python, generators and generator expressions):

subset_row = (row[index] for index in list_of_indexes)

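Now you can iterate over subset_row without having to evaluate and hold all of the values in memory at once, or you can consume the sequence one item at a time using: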

first = next(subset_row)
second = next(subset_row)
# etc
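
A small runnable version of the same idea, assuming made-up sample values, also shows one property worth keeping in mind: a generator can be consumed only once.

```python
# Hypothetical sample data, chosen only for illustration.
row = [10, 20, 30, 40, 50, 60]
list_of_indexes = [1, 5, 3]

# Generator expression: nothing is looked up until an item is requested.
subset_row = (row[index] for index in list_of_indexes)

first = next(subset_row)   # 20
second = next(subset_row)  # 60

# The remaining items are still pending; list() (or a for loop) drains them.
remaining = list(subset_row)  # [40]

# The generator is now exhausted, so a second pass yields nothing.
leftover = list(subset_row)   # []
```

If the subset has to be traversed more than once, the list comprehension is the simpler choice.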

Also, since you mentioned a "list of lists" and have data = [row1, row2, ...] in your code example, I suspect you may want to apply the operation to multiple lists at once:

indices = [3, 7, 123, ...]
data = [<row1>, <row2>, ...]
rows = [[row[i] for i in indices] for row in data]

Or, with laziness for the outer list only:

rows = ([row[i] for i in indices] for row in data)

Or with laziness for both levels:

rows = ((row[i] for i in indices) for row in data)
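
To compare the three variants, here is a sketch on a tiny made-up data set (the values of indices and data are assumptions for illustration only). All three produce the same rows; they differ in when the work happens.

```python
# Hypothetical sample data, chosen only for illustration.
indices = [0, 2]
data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Fully eager: every subset row is built immediately.
rows_eager = [[row[i] for i in indices] for row in data]

# Lazy outer level: each subset row is built only when requested.
rows_lazy_outer = ([row[i] for i in indices] for row in data)

# Lazy at both levels: each element is looked up only when requested.
rows_lazy_both = ((row[i] for i in indices) for row in data)

result_outer = list(rows_lazy_outer)

# Consume each inner generator before advancing the outer one,
# because the inner generators read `row` lazily.
result_both = [list(inner) for inner in rows_lazy_both]

print(rows_eager)    # [[1, 3], [4, 6], [7, 9]]
print(result_outer)  # [[1, 3], [4, 6], [7, 9]]
print(result_both)   # [[1, 3], [4, 6], [7, 9]]
```

For thousands of rows that are processed one at a time, the lazy outer form is probably the most practical middle ground: it keeps only one subset row in memory while still giving an ordinary list per row.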
