
I have a Python list that contains 80,000 lists. Each of these inner lists has roughly the following format:

["012345", "MYNAME" "Mon", "A", 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20]

Can you tell roughly how much memory this list of 80,000 lists will consume?

Is it common, or acceptable, to use and manipulate lists this large in Python? Most of what I do is extract data from this list with list comprehensions.

What I really want to know is whether Python is fast enough to extract data from a large list using list comprehensions. I want my script to be fast.


5 Answers

In [39]: lis=["012345", "MYNAME" "Mon", "A", 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20,
     20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20]

In [40]: k=[lis[:] for _ in xrange(80000)]

In [41]: k.__sizeof__()
Out[41]: 325664

In [42]: sys.getsizeof(k)  #after gc_head
Out[42]: 325676

Judging from the code in sysmodule.c, it calls the object's __sizeof__ method to get its size:

    method = _PyObject_LookupSpecial(o, &PyId___sizeof__);
    if (method == NULL) {
        if (!PyErr_Occurred())
            PyErr_Format(PyExc_TypeError,
                         "Type %.100s doesn't define __sizeof__",
                         Py_TYPE(o)->tp_name);
    }
    else {
        res = PyObject_CallFunctionObjArgs(method, NULL);
        Py_DECREF(method);
    }

It then adds the garbage collector (GC) header overhead:

    /* add gc_head size */
    if (PyObject_IS_GC(o)) {
        PyObject *tmp = res;
        res = PyNumber_Add(tmp, gc_head_size);
        Py_DECREF(tmp);
    }
    return res;
}
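You can observe that GC adjustment from Python itself. The sketch below assumes CPython 3. It shows no byte counts because they vary by version and platform. It compares `sys.getsizeof()` with the object's own `__sizeof__()`. For a GC-managed container such as a list, the difference should be positive. For an `int` or `str`, which are not GC types, the two should agree. The helper name `gc_overhead` is hypothetical.

```python
import sys

def gc_overhead(obj):
    # Hypothetical helper: sys.getsizeof() is obj.__sizeof__() plus any
    # header CPython adds for garbage-collected types (gc_head_size above).
    return sys.getsizeof(obj) - obj.__sizeof__()

# Row in the question's shape.
row = ["012345", "MYNAME" "Mon", "A"] + [20] * 29

print("list overhead:", gc_overhead(row))      # positive: list is a GC type
print("int overhead:", gc_overhead(20))        # 0: int is not a GC type
print("str overhead:", gc_overhead("012345"))  # 0: str is not a GC type
```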

We can also compute the size recursively, descending into each container, using the recursive sizeof recipe suggested in the docs:

In [17]: total_size(k)  #from recursive sizeof recipe
Out[17]: 13125767

In [18]: sum(y.__sizeof__() for x in k for y in x)
Out[18]: 34160000
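The two figures differ because `lis[:]` is a shallow copy. Every row is a new list object, but all rows reference the same string and int objects. Summing `__sizeof__()` over every element counts those shared objects once per reference (80,000 × 32 times). The recursive recipe instead counts each distinct object once, plus the lists themselves. The sketch below is written for Python 3, and its variable names are illustrative. It checks the sharing and compares the two ways of summing:

```python
import sys

# Row in the question's shape ("MYNAME" "Mon" is implicit string
# concatenation, so it becomes the single element "MYNAMEMon").
row = ["012345", "MYNAME" "Mon", "A"] + [20] * 29
k = [row[:] for _ in range(80000)]

# Every row is a distinct list object...
assert k[0] is not k[1]
# ...but slicing copies only references, so the elements are shared.
distinct = {id(y): y for x in k for y in x}
print(len(distinct))  # 4: "012345", "MYNAMEMon", "A" and the int 20

# Per-reference sum, like sum(y.__sizeof__() for x in k for y in x).
per_reference = sum(sys.getsizeof(y) for x in k for y in x)

# Count the outer list, each row, and each distinct element exactly once.
deduplicated = (sys.getsizeof(k)
                + sum(sys.getsizeof(x) for x in k)
                + sum(sys.getsizeof(y) for y in distinct.values()))
print(per_reference, deduplicated)
```

With real data, where rows hold different strings, fewer objects are shared. The deduplicated total then moves closer to the per-reference figure.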
answered 2013-01-21T21:18:02.230

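On my machine, with 32-bit Python 2.7.3, a list containing 80K copies of the exact list from the question takes about 10 MB. I measured this by comparing the memory footprint of two otherwise identical interpreters, one holding the list and one not.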
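You can do a similar before/after comparison inside a single Python 3 process with the standard-library `tracemalloc` module (Python 3.4+). This is a swapped-in technique, not what the answer above did. It counts only memory allocated through Python's allocators, not the whole process footprint, so its result will not match an OS-level measurement exactly. `build_rows` is a hypothetical helper.

```python
import tracemalloc

def build_rows(copies):
    # Hypothetical helper: the question's row, shallow-copied `copies` times.
    row = ["012345", "MYNAME" "Mon", "A"] + [20] * 29
    return [row[:] for _ in range(copies)]

tracemalloc.start()
before, _ = tracemalloc.get_traced_memory()  # (current, peak) in bytes
rows = build_rows(80000)
after, _ = tracemalloc.get_traced_memory()
tracemalloc.stop()

allocated = after - before  # bytes still held by `rows` and its contents
print(f"{allocated / 2**20:.1f} MiB for {len(rows):,d} rows")
```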

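I tried measuring the size with sys.getsizeof(), but the result it returns is clearly wrong. It counts only the outer list's own structure (its header and array of pointers), not the inner lists that structure points to: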

>>> l=[["012345", "MYNAME" "Mon", "A", 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20] for i in range(80000)]
>>> sys.getsizeof(l)
325680
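The number is not so much wrong as shallow. According to the Python documentation, `sys.getsizeof()` accounts only for the memory directly attributed to the object, not for the objects it refers to. The check below shows that the outer list's reported size stays the same while its inner lists grow:

```python
import sys

outer = [[] for _ in range(3)]
before = sys.getsizeof(outer)

# Grow every inner list; the outer list still holds the same 3 pointers.
for inner in outer:
    inner.extend(range(10_000))

after = sys.getsizeof(outer)
print(before == after)  # True: getsizeof does not follow references
```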
answered 2013-01-21T21:18:14.257

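I took the current code (revision 13) of the Python Object Size (revised) recipe and saved it in a module named sizeof. Applying it to your example list gives the following results (with 32-bit Python 2.7.3):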

from sizeof import asizeof  # from http://code.activestate.com/recipes/546530

MB = 1024*1024
COPIES = 80000
lis=["012345", "MYNAME" "Mon", "A", 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20,
     20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20]

lis_size = asizeof(lis)
print 'asizeof(lis): {} bytes'.format(lis_size)
list_of_lis_size = asizeof([lis[:] for _ in xrange(COPIES)])
print 'asizeof(list of {:,d} copies of lis): {:,d} bytes ({:.2f} MB)'.format(
                         COPIES, list_of_lis_size, list_of_lis_size/float(MB))
asizeof(lis): 272 bytes
asizeof(list of 80,000 copies of lis): 13,765,784 bytes (13.13 MB)
answered 2013-01-21T22:55:07.603

sys.getsizeof(object, default):

    getsizeof(object, default) -> int

    Return the size of object in bytes.

Code:

>>> import sys
>>> sys.getsizeof(["012345", "MYNAME" "Mon", "A", 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20])
160

It returns 160 bytes for your list. Multiplied by roughly 80,000, that comes to about 12.8 MB. (32-bit machine, Python 2.7.2 and Python 3.2)
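Spelled out, the 12.8 figure is in decimal megabytes. In the binary units used by the `asizeof` answer (MB = 1024 × 1024 bytes), the same estimate is about 12.2. The estimate covers only the 80,000 inner list objects, not the outer list or the objects the rows refer to:

```latex
\begin{aligned}
160\ \text{B} \times 80\,000 &= 12\,800\,000\ \text{B} \approx 12.8\ \text{MB} \\
\frac{12\,800\,000\ \text{B}}{1024^{2}} &\approx 12.21\ \text{MiB}
\end{aligned}
```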

answered 2013-01-21T21:20:55.967

Note the following interaction with the interpreter:

>>> import sys
>>> array = ['this', 'is', 'a', 'string', 'array']
>>> sys.getsizeof(array)
56
>>> list(map(sys.getsizeof, array))
[29, 27, 26, 31, 30]
>>> sys.getsizeof(array) + sum(map(sys.getsizeof, array))
199
>>> 

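In this particular case, the answer is to use sys.getsizeof(array) + sum(map(sys.getsizeof, array)) to find the size of a list of strings. However, the following is a more complete implementation that accounts for containers of objects, class instances, and the use of __slots__.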

import sys

def sizeof(obj):
    return _sizeof(obj, set())

def _sizeof(obj, memo):
    # Add this object's size just once.
    location = id(obj)
    if location in memo:
        return 0
    memo.add(location)
    total = sys.getsizeof(obj)
    # Look for any class instance data.
    try:
        obj = vars(obj)
    except TypeError:
        pass
    # Handle containers holding objects.
    if isinstance(obj, (tuple, list, frozenset, set)):
        for item in obj:
            total += _sizeof(item, memo)
    # Handle the two-sided nature of dicts.
    elif isinstance(obj, dict):
        for key, value in obj.items():
            total += _sizeof(key, memo) + _sizeof(value, memo)
    # Handle class instances using __slots__.
    elif hasattr(obj, '__slots__'):
        for key, value in ((name, getattr(obj, name))
            for name in obj.__slots__ if hasattr(obj, name)):
            total += _sizeof(key, memo) + _sizeof(value, memo)
    return total
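The `memo` set is what separates this implementation from the one-liner above. `sys.getsizeof(array) + sum(map(sys.getsizeof, array))` counts an object once per reference, so a list holding the same string several times is over-counted. The one-level sketch below shows the difference; the helper names `naive_size` and `dedup_size` are hypothetical:

```python
import sys

def naive_size(seq):
    # Hypothetical helper: one level deep, no deduplication, so a shared
    # item is counted once per reference.
    return sys.getsizeof(seq) + sum(map(sys.getsizeof, seq))

def dedup_size(seq):
    # Hypothetical helper: one level deep, each distinct item (by id)
    # counted once, like the memo set in _sizeof.
    unique = {id(item): item for item in seq}
    return sys.getsizeof(seq) + sum(map(sys.getsizeof, unique.values()))

s = "x" * 1000
shared = [s, s, s]
# The naive total includes two extra copies of sys.getsizeof(s).
print(naive_size(shared) - dedup_size(shared))
```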

Edit:

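After coming back to this problem a while later, I came up with the following alternative. Note that it does not work with infinite iterators. It will also consume any finite iterator or generator it walks through, because iterating one exhausts it. The code is best suited to analyzing static data structures.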

import sys

def sizeof(obj):
    return sum(map(sys.getsizeof, explore(obj, set())))

def explore(obj, memo):
    loc = id(obj)
    if loc not in memo:
        memo.add(loc)
        yield obj
        # Handle instances with slots.
        try:
            slots = obj.__slots__
        except AttributeError:
            pass
        else:
            for name in slots:
                try:
                    attr = getattr(obj, name)
                except AttributeError:
                    pass
                else:
                    yield from explore(attr, memo)
        # Handle instances with dict.
        try:
            attrs = obj.__dict__
        except AttributeError:
            pass
        else:
            yield from explore(attrs, memo)
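        # Strings and bytes are iterable, but iterating them yields
        # one-character strings (or ints, for bytes) that the container
        # does not actually store, so don't descend into them.
        if isinstance(obj, (str, bytes, bytearray)):
            return
        # Handle dicts or iterables.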
        for name in 'keys', 'values', '__iter__':
            try:
                attr = getattr(obj, name)
            except AttributeError:
                pass
            else:
                for item in attr():
                    yield from explore(item, memo)
answered 2013-01-21T22:48:26.297