python - Efficiency of using a large data structure in a Python function

Question

I need to use a large data structure, more specifically a large dictionary, to do lookups.

At first my code looked like this:

# build the dictionary
blablabla
# look up some information in the dictionary
blablabla

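Since I need to do many lookups, I realized it would be a good idea to implement this as a function, say lookup(info).

That raises the question: what should I do with the large dictionary?

Should I pass it as a parameter with lookup(info, dictionary), or should I initialize the dictionary in main() and use it as a global variable?

The first seems more elegant, because I think maintaining global variables is troublesome. On the other hand, I am not sure how efficient it is to pass a large dictionary to a function. It will be called many times, and it would certainly be a nightmare if argument passing were inefficient.

Thanks.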

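Edit 1:

I just ran an experiment with the two approaches above.

Here is the code snippet. lookup1 implements the lookup with parameter passing, while lookup2 uses the global data structure "big_dict".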

import random
import time

class CityDict(object):
    def __init__(self):
        self.code_dict = get_code_dict()

    def get_city(self, city):
        try:
            return self.code_dict[city]
        except Exception:
            return None

def get_code_dict():
    # initialize the code dictionary from a file (loading code omitted in the post)
    return code_dict

def lookup1(city, city_code_dict):
    try:
        return city_code_dict[city]
    except Exception:
        return None

def lookup2(city):
    try:
        return big_dict[city]
    except Exception:
        return None


# time the parameter-passing version
t = time.time()
d = get_code_dict()
for i in range(0, 1000000):
    lookup1(random.randint(0, 10000), d)

print("lookup1 is %f" % (time.time() - t))


# time the global-variable version
t = time.time()
big_dict = get_code_dict()
for i in range(0, 1000000):
    lookup2(random.randint(0, 1000))
print("lookup2 is %f" % (time.time() - t))


# time the class version
t = time.time()
cd = CityDict()
for i in range(0, 1000000):
    cd.get_city(str(i))
print("class is %f" % (time.time() - t))

Here is the output:

lookup1 is 8.410885
lookup2 is 8.157661
class is 4.525721

So the two approaches appear to be almost the same, and yes, the global-variable approach is slightly more efficient.

Edit 2:

Added the class version suggested by Amber and tested the efficiency again. The results suggest that Amber is right and that we should use the class version.
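
One caveat when reading these numbers: the three loops do not do the same work. The first two call random.randint on every iteration (with different ranges), while the class loop only calls str(i), so the timings partly measure key generation rather than lookup. Below is a minimal sketch of a like-for-like comparison, assuming Python 3 and a small synthetic dictionary standing in for the real one. build_code_dict, KEYS, and the run_* helpers are hypothetical names, not from the original post, and dict.get is swapped in for the try/except.

```python
import timeit

def build_code_dict(n=10000):
    # Hypothetical stand-in for the file-backed dictionary in the question.
    return {str(i): i for i in range(n)}

def lookup1(city, city_code_dict):
    # Dictionary passed as a parameter.
    return city_code_dict.get(city)

big_dict = build_code_dict()

def lookup2(city):
    # Dictionary read from a module-level global.
    return big_dict.get(city)

class CityDict(object):
    def __init__(self):
        self.code_dict = build_code_dict()

    def get_city(self, city):
        # Dictionary stored on the instance.
        return self.code_dict.get(city)

# Precompute the keys so key generation is not part of what is timed.
KEYS = [str(i % 20000) for i in range(100000)]  # half hits, half misses

def run_param():
    d = big_dict
    return [lookup1(k, d) for k in KEYS]

def run_global():
    return [lookup2(k) for k in KEYS]

cd = CityDict()

def run_class():
    return [cd.get_city(k) for k in KEYS]

if __name__ == "__main__":
    for name, fn in [("param", run_param), ("global", run_global), ("class", run_class)]:
        print("%s: %f" % (name, timeit.timeit(fn, number=5)))
```

With identical keys fed to all three variants, any remaining difference reflects the lookup mechanism itself rather than the loop around it.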

score 8 · Accepted Answer

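To answer the central question: parameter passing is not inefficient, and your values are not copied around. Python passes references around, which is not to say that its way of passing parameters fits the well-known "pass by value" or "pass by reference" schemes.

It is best imagined as initializing a variable local to the called function with a reference value provided by the caller, and that reference value is itself passed by value.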
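
A small sketch of what this means in practice, assuming Python 3 (the function and variable names are illustrative only): the parameter is bound to the very same dictionary object, so nothing is copied, mutations are visible to the caller, and rebinding the local name is not.

```python
def inspect(d):
    # The local name d refers to the caller's object; no copy is made.
    return id(d)

def mutate(d):
    # Mutating through the reference changes the caller's dictionary.
    d["added"] = True

def rebind(d):
    # Rebinding only changes the local name, not the caller's variable.
    d = {}
    return d

big = {i: str(i) for i in range(100000)}

same_object = inspect(big) == id(big)
mutate(big)
fresh = rebind(big)

print(same_object)        # True
print("added" in big)     # True
print(fresh is big)       # False
print(len(big))           # 100001
```

The cost of the call is therefore independent of how many entries the dictionary holds.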

Nevertheless, the suggestion to use a class is probably a good idea.

score 5 · Accepted Answer

Neither. Use a class, which is designed for grouping functions (methods) with data (members):

class BigDictLookup(object):
    def __init__(self):
        self.bigdict = build_big_dict() # or some other means of generating it
    def lookup(self):
        # do something with self.bigdict
        pass

def main():
    my_bigdict = BigDictLookup()
    # ...
    my_bigdict.lookup()
    # ...
    my_bigdict.lookup()
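
To make the outline above concrete, here is a minimal runnable sketch, assuming Python 3. The body of build_big_dict and the key argument to lookup are illustrative assumptions; the answer leaves both unspecified.

```python
def build_big_dict():
    # Hypothetical builder; the real one would load from a file or database.
    return {"NYC": "New York", "LON": "London", "TYO": "Tokyo"}

class BigDictLookup(object):
    def __init__(self):
        # Built once and reused by every lookup call.
        self.bigdict = build_big_dict()

    def lookup(self, key, default=None):
        # dict.get avoids a try/except for missing keys.
        return self.bigdict.get(key, default)

def main():
    my_bigdict = BigDictLookup()
    return my_bigdict.lookup("LON"), my_bigdict.lookup("XXX")

if __name__ == "__main__":
    print(main())  # ('London', None)
```

The dictionary lives on the instance, so callers neither pass it around nor rely on a module-level global.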
